CN112712015A - Human body key point identification method and device, intelligent terminal and storage medium - Google Patents


Info

Publication number
CN112712015A
CN112712015A (application CN202011595172.XA)
Authority
CN
China
Prior art keywords
network model
human body
module
convolution
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011595172.XA
Other languages
Chinese (zh)
Other versions
CN112712015B (en)
Inventor
曹晟
言宏亮
伍广彬
卢瑶
钟浩
于波
张华�
杨波
梁兴伟
杨卫国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Geling Institute Of Artificial Intelligence And Robotics Co ltd
Shenzhen Hit Technology Innovation Industry Development Co ltd
Konka Group Co Ltd
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Geling Institute Of Artificial Intelligence And Robotics Co ltd
Shenzhen Hit Technology Innovation Industry Development Co ltd
Konka Group Co Ltd
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Geling Institute Of Artificial Intelligence And Robotics Co ltd, Shenzhen Hit Technology Innovation Industry Development Co ltd, Konka Group Co Ltd, Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Geling Institute Of Artificial Intelligence And Robotics Co ltd
Priority to CN202011595172.XA priority Critical patent/CN112712015B/en
Priority claimed from CN202011595172.XA external-priority patent/CN112712015B/en
Publication of CN112712015A publication Critical patent/CN112712015A/en
Application granted granted Critical
Publication of CN112712015B publication Critical patent/CN112712015B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body key point identification method and device, an intelligent terminal and a storage medium, wherein the human body key point identification method comprises the following steps: acquiring a human body image to be detected; acquiring a compressed high-resolution feature network model; detecting the human body image to be detected based on the compressed high-resolution feature network model, and acquiring the position information of the human body key points in the human body image to be detected; and outputting the position information of the human body key points. Compared with the prior art, the method compresses the existing high-resolution feature network model and detects the positions of the human body key points based on the compressed model, thereby reducing the parameter quantity and the amount of calculation, shortening the prediction time, and facilitating transplantation of the model to embedded platforms such as mobile terminals.

Description

Human body key point identification method and device, intelligent terminal and storage medium
Technical Field
The invention relates to the technical field of deep neural networks, in particular to a human body key point identification method, a human body key point identification device, an intelligent terminal and a storage medium.
Background
With the development of computer technology, key points of a human body can now be detected from an image containing the human body. Human body key point detection, also known as human body pose estimation, requires locating key parts of the human body, such as the head, neck, shoulders and hands, in a given picture. Moreover, as application demands grow, the human body key point detection task requires not only acquiring high-level semantic information containing human body texture information but also capturing the degree of association between the detection target and its surrounding environment, so as to achieve a better identification effect.
In the prior art, human body key point detection is generally performed based on a deep neural network model, for example based on a High-Resolution feature network (HRNet) model. HRNet has attracted wide attention since its introduction; on the human body key point detection task, its high-resolution features help handle problems such as person occlusion and hard-to-recognize motions. The problem in the prior art is that deep neural network models represented by the HRNet model generally have a large number of parameters and a large amount of calculation, so that prediction is time-consuming and the models are difficult to transplant to embedded platforms such as mobile terminals.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The invention mainly aims to provide a human body key point identification method and device, an intelligent terminal and a storage medium, so as to solve the problems in the prior art that a deep neural network model represented by the HRNet model has a large number of parameters and a large amount of calculation, making prediction time-consuming and transplantation to embedded platforms such as mobile terminals difficult.
In order to achieve the above object, a first aspect of the present invention provides a method for identifying key points of a human body, wherein the method comprises:
acquiring a human body image to be detected;
acquiring a compressed high-resolution characteristic network model;
detecting the human body image to be detected based on the compressed high-resolution characteristic network model, and acquiring the position information of the human body key points in the human body image to be detected;
and outputting the position information of the human body key points.
Optionally, the obtaining of the compressed high-resolution feature network model includes:
acquiring a high-resolution characteristic network model;
compressing the basic module of the high-resolution characteristic network model;
compressing the bottleneck module of the high-resolution characteristic network model;
and taking the high-resolution feature network model obtained after the compression as the compressed high-resolution feature network model.
Optionally, the compressing the basic module of the high-resolution feature network model includes:
decomposing the 3 × 3 convolutional layers in the basic module into 1 × 1 convolutional layers, 3 × 3 convolutional layers and 1 × 1 convolutional layers which are connected in sequence;
carrying out convolution kernel parameter sharing on the 3 x 3 convolution layer obtained by decomposition;
the first layer of the 1 × 1 buildup layer and the 3 × 3 buildup layer are shared between two adjacent base modules.
Optionally, the compressing the basic module of the high-resolution feature network model further includes:
the convolution kernels of the 3 × 3 convolution layers shared by the convolution kernel intrinsic parameters and the grouped 1 × 1 convolution kernels not shared by the convolution kernel intrinsic parameters are added, and then convolution operation is performed.
Optionally, the compressing the basic module of the high-resolution feature network model further includes:
in the even branches of the high-resolution characteristic network model, all channels in the same group are connected with each other during convolution;
in the odd branches of the high-resolution feature network model, the convolution is performed without connection among channels in the same group.
Optionally, the compressing the bottleneck module of the high-resolution feature network model includes:
decomposing the 3 × 3 convolutional layer in the bottleneck module into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer which are connected in sequence;
carrying out convolution kernel parameter sharing on the 3 x 3 convolution layer obtained by decomposition;
the first layer of the 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent bottleneck modules.
A second aspect of the present invention provides a human body key point identification device, wherein the device includes:
the image acquisition module is used for acquiring a human body image to be detected;
the network model acquisition module is used for acquiring a compressed high-resolution characteristic network model;
the detection module is used for detecting the human body image to be detected based on the compressed high-resolution characteristic network model and acquiring the position information of the human body key points in the human body image to be detected;
and the output module is used for outputting the position information of the human body key points.
Optionally, the network model obtaining module includes:
the high-resolution characteristic network model acquisition unit is used for acquiring a high-resolution characteristic network model;
a basic module compression unit, configured to compress a basic module of the high-resolution feature network model;
a bottleneck module compression unit, configured to compress a bottleneck module of the high-resolution feature network model;
and the network model acquisition unit is used for taking the high-resolution feature network model obtained after the compression as the compressed high-resolution feature network model.
A third aspect of the present invention provides an intelligent terminal, where the intelligent terminal includes a memory, a processor, and a human key point identification program stored in the memory and executable on the processor, and the human key point identification program implements any one of the steps of the human key point identification method when executed by the processor.
A fourth aspect of the present invention provides a computer-readable storage medium, in which a human key point identification program is stored, and the human key point identification program, when executed by a processor, implements any one of the steps of the human key point identification method.
According to the scheme, a human body image to be detected is acquired; a compressed high-resolution feature network model is acquired; the human body image to be detected is detected based on the compressed high-resolution feature network model to acquire the position information of the human body key points in it; and the position information of the human body key points is output. Compared with the prior art, the method compresses the existing high-resolution feature network model and detects the positions of the human body key points based on the compressed model, thereby reducing the parameter quantity and the amount of calculation, shortening the prediction time, and facilitating transplantation of the model to embedded platforms such as mobile terminals.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a method for identifying key points of a human body according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a detailed process of step S200 in FIG. 1 according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a network overall framework of an HRNet according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the step S202 in FIG. 2 according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of compression provided by an embodiment of the present invention;
FIG. 6 is a diagram illustrating the addition of a shared convolution and packet convolution kernel according to an embodiment of the present invention;
fig. 7 is a diagram of a branch network structure according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating the step S203 in FIG. 2 according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a human body key point identification device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of the network model obtaining module 920 in fig. 9 according to an embodiment of the present invention;
fig. 11 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when …" or "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted depending on the context to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
The human body key point identification technology is an important image information processing technology in the security field and the smart home field, and has attracted more and more attention in recent years. In the process of carrying out the human body key point identification task, not only must high-level semantic information containing human body texture information be obtained, but the degree of association between the detection target and the surrounding environment must also be captured, so as to achieve a better identification effect. In the prior art, human body key point detection is usually performed based on a deep neural network model. In the technical field of deep neural networks, existing manually labeled data can be used for training to obtain a deep neural network with good prediction performance and to complete various tasks. However, a deep neural network usually has a large number of parameters and a large amount of calculation, and a model obtained by directly training an existing network is difficult to use directly on embedded platforms such as mobile terminals, so a model compression task needs to be performed for the specific deep neural network model. In the prior art, human body key point detection is usually performed based on the HRNet model, whose high-resolution features can handle problems such as person occlusion and hard-to-recognize motions, but the feature maps of different scales it introduces make the parameter quantity and the amount of calculation of the whole network huge. That is, the HRNet model has a large number of parameters and a large amount of calculation, so that prediction is time-consuming and the model is difficult to transplant to embedded platforms such as mobile terminals.
Therefore, a better method for identifying human body key points is needed.
In order to solve the problems in the prior art, an embodiment of the invention provides a method for identifying human body key points. In the embodiment, a human body image to be detected is acquired; a compressed high-resolution feature network model is acquired; the human body image to be detected is detected based on the compressed high-resolution feature network model to acquire the position information of the human body key points in it; and the position information of the human body key points is output. Compared with the prior art, the method compresses the existing high-resolution feature network model and detects the positions of the human body key points based on the compressed model, thereby reducing the parameter quantity and the amount of calculation, shortening the prediction time, and facilitating transplantation of the model to embedded platforms such as mobile terminals.
Exemplary method
As shown in fig. 1, an embodiment of the present invention provides a method for identifying key points of a human body, and specifically, the method includes the following steps:
and S100, acquiring a human body image to be detected.
The human body image to be detected is an image needing human body key point identification. Optionally, the human body image to be detected may be acquired by a camera in real time, or may be an image acquired in advance, which depends on actual requirements and specific application scenarios, and is not limited herein.
And step S200, acquiring a compressed high-resolution characteristic network model.
In this embodiment, the compressed high-resolution feature network model is obtained by compressing the HRNet model in the prior art, and specifically, the HRNet model may be compressed by using a scheme of combining local compression and global compression.
And step S300, detecting the human body image to be detected based on the compressed high-resolution characteristic network model, and acquiring the position information of the human body key points in the human body image to be detected.
Specifically, after the compressed high-resolution feature network model is obtained, the model is trained based on a pre-obtained training set to obtain a trained compressed high-resolution feature network model, and then the human body image to be detected is input into the trained compressed high-resolution feature network model for detection, so as to identify and obtain the position information of the human body key point in the human body image to be detected.
And step S400, outputting the position information of the human body key points.
Optionally, the corresponding human body key point position may be marked in the human body image to be detected and output, so as to observe the corresponding human body key point. Of course, other output modes are also possible, and are not specifically limited herein.
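The flow of steps S100 to S400 can be sketched as a minimal inference pipeline. All names below (`load_compressed_hrnet`, `identify_keypoints`, the fixed keypoint positions) are illustrative placeholders, not part of the disclosed implementation; a trivial stub stands in for the compressed high-resolution feature network model:

```python
# Minimal sketch of steps S100-S400; every name here is a hypothetical placeholder.
def load_compressed_hrnet():
    # S200: in practice this would load the compressed HRNet weights;
    # here a stub that reports fixed relative positions stands in for the model.
    def model(image):
        h, w = len(image), len(image[0])
        # S300: pretend the head and neck are found at fixed relative positions.
        return {"head": (w // 2, h // 8), "neck": (w // 2, h // 4)}
    return model

def identify_keypoints(image):
    model = load_compressed_hrnet()   # S200: acquire the compressed model
    keypoints = model(image)          # S300: detect keypoint position information
    return keypoints                  # S400: output the position information

image = [[0] * 64 for _ in range(128)]   # S100: a 128 x 64 "image" to be detected
print(identify_keypoints(image))
```

In a real system, S400 would typically draw the returned coordinates onto the input image, as the embodiment suggests.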
As can be seen from the above, the human body key point identification method provided by the embodiment of the invention acquires a human body image to be detected; acquires a compressed high-resolution feature network model; detects the human body image to be detected based on the compressed high-resolution feature network model to acquire the position information of the human body key points in it; and outputs the position information of the human body key points. Compared with the prior art, the method compresses the existing high-resolution feature network model and detects the positions of the human body key points based on the compressed model, thereby reducing the parameter quantity and the amount of calculation, shortening the prediction time, and facilitating transplantation of the model to embedded platforms such as mobile terminals.
Specifically, in this embodiment, as shown in fig. 2, the step S200 includes:
step S201, obtaining a high-resolution characteristic network model.
Step S202, compressing the basic module of the high-resolution characteristic network model.
Step S203, compress the bottleneck module of the high resolution feature network model.
And step S204, taking the high-resolution feature network model obtained after the compression as the compressed high-resolution feature network model.
The high-resolution feature network model is the HRNet model of the prior art, and fig. 3 is a schematic diagram of the overall network framework of the corresponding HRNet. As shown in fig. 3, the blocks represent the feature maps processed by the network, that is, the intermediate products obtained after operations are performed on the input picture, and a horizontal arrow represents a convolution operation (conv.). Each horizontal, left-to-right chain of feature passes forms a branch. The HRNet model of the prior art is therefore compressed so that, while the original prediction accuracy is basically preserved, the parameters and the amount of calculation of the network are reduced as much as possible, the network inference time is shortened, and model transplantation is facilitated.
Specifically, in this embodiment, as shown in fig. 4, the step S202 includes:
in step S2021, the 3 × 3 convolutional layers in the basic module are decomposed into 1 × 1 convolutional layers, 3 × 3 convolutional layers, and 1 × 1 convolutional layers connected in this order.
In step S2022, convolution kernel parameters are shared with the decomposed 3 × 3 convolutional layer.
In step S2023, the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent basic modules.
The basic module (BasicBlock) is the basic building unit of HRNet and generally consists of two 3 × 3 convolutions together with the corresponding pooling networks. Fig. 5 is a schematic compression diagram provided in this embodiment. As shown in fig. 5, the basic module is redesigned using the bottleneck structure of deep neural networks, decomposing the 3 × 3 convolutional layer into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer. A 1 × 1 point-wise convolution first expands the feature-map channels, convolution kernel parameters are then shared in the channel-by-channel 3 × 3 convolutional layer, and finally the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent basic modules. By sharing the parameters inside the convolution kernels, the same prediction effect can thus be achieved with less computation, reducing the amount of calculation and shortening the prediction time. In fig. 5, Shared 1 × 1conv and Shared 3 × 3conv are a 1 × 1 convolutional layer and a 3 × 3 convolutional layer whose convolution kernel parameters are shared, while the convolution kernel parameters of the remaining modules in fig. 5 are not shared. In fig. 5, K is the number of channels of the feature map in the convolution; M is the number of groups, that is, the K channels are uniformly divided into M groups; N is the size of a shared sub-group, that is, parameters are shared among every N channels within a group; and P is the number of channels of the input picture (or of the feature map output by the preceding network).
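The saving from this decomposition can be illustrated with a rough parameter count under the notation of fig. 5 (K channels, shared groups of N channels in the depthwise 3 × 3 layer). The counts below are a simplified sketch of my own, not figures from the patent: biases and batch-norm parameters are ignored, and the middle 3 × 3 layer is assumed to keep one kernel per group of N channels:

```python
# Rough parameter-count sketch (biases/BN ignored; assumptions as stated above).
K = 256   # feature-map channels (illustrative value)
k = 3     # spatial kernel size
N = 4     # channels per shared sub-group in the channel-by-channel 3x3 layer

# Original basic module: two dense 3x3 convolutions, K -> K channels each.
original = 2 * (K * K * k * k)

# Decomposed module: 1x1 (K -> K), plus a channel-by-channel 3x3 layer with one
# kernel shared among every N channels, plus another 1x1 (K -> K).
decomposed = K * K + (K // N) * k * k + K * K

print(original, decomposed, decomposed / original)
```

With these illustrative values the decomposed module needs roughly a ninth of the parameters, which is the sense in which the scheme "reduces the parameter quantity".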
Here, groups is the number of groups, that is, the number of groups into which the feature-map channels are divided during convolution; its value equals M. basic groups is the same grouping number as groups, named basic groups only to distinguish it from tiny groups; its value also equals M. tiny groups is the number of channels in a shared sub-group, that is, parameters are shared among every tiny group of channels; its value equals N.
Further, in this embodiment, the step S202 further includes: adding the convolution kernels of the 3 × 3 convolutional layer whose kernel parameters are shared to the grouped 1 × 1 convolution kernels whose parameters are not shared, and then performing the convolution operation. Specifically, the branch operation may be further optimized by adding the shared 3 × 3 convolutional layer and the grouped 1 × 1 convolutional layer (Group 1 × 1conv in fig. 5) and then performing the convolution operation. In this way, the number of floating-point calculations can be reduced and the computation time shortened. Fig. 6 is a schematic diagram of the addition of shared and grouped convolution kernels according to an embodiment of the present invention. In fig. 6, boxes in the same horizontal row belong to the same data channel (different networks use different numbers of data channels; the figure is only illustrative and not limiting); the elongated boxes with different gray levels represent shared convolution kernels, each elongated box indicating that all of its convolution kernel parameters are shared, that is, that the same parameters are used; and the short, thick boxes represent grouped convolution kernels. By dividing every N channels into one group, only the feature maps within the same group are connected with each other, greatly reducing the amount of calculation. Among the grouped convolution kernels, thick short boxes of the same gray level only indicate the same group and do not indicate that the same parameters are used.
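The kernel-addition trick rests on the linearity of convolution: convolving once with the sum of two kernels gives the same result as convolving with each kernel separately and summing, at roughly half the multiply count. A minimal one-dimensional pure-Python check of this identity (a sketch of the principle, not the patent's 2-D implementation; the kernel values are arbitrary stand-ins):

```python
# 1-D valid convolution, enough to check conv(x, a + b) == conv(x, a) + conv(x, b).
def conv1d(x, kernel):
    n = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(n))
            for i in range(len(x) - n + 1)]

x = [1.0, 2.0, -1.0, 3.0, 0.5]
shared = [0.5, -1.0, 0.25]    # stands in for the shared 3x3 kernel
grouped = [1.0, 0.0, -0.5]    # stands in for the grouped 1x1 kernel

# One pass with the pre-added kernel...
summed_kernel = [a + b for a, b in zip(shared, grouped)]
once = conv1d(x, summed_kernel)

# ...versus two passes followed by an element-wise sum.
twice = [u + v for u, v in zip(conv1d(x, shared), conv1d(x, grouped))]
print(once, twice)
```

Adding the kernels first therefore halves the number of convolution passes over the feature map, which is why the scheme reduces floating-point calculations.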
Further, in this embodiment, the step S202 further includes: in the even branches of the high-resolution characteristic network model, all channels in the same group are connected with each other during convolution; in the odd branches of the high-resolution feature network model, the convolution is performed without connection among channels in the same group.
Specifically, in this embodiment, as an improvement of the basic module (BasicBlock), the BasicBlock is optimized into BasicBlock_compress (compression module) and BasicBlock_compress_inverse (inverse compression module). The difference between them is that BasicBlock_compress_inverse adopts channel-by-channel convolution, that is, the channels are not connected to each other, which reduces the amount of computation, whereas in the ordinary convolution adopted by BasicBlock_compress the channels within the same group are connected to each other. In this embodiment, the HRNet model is compressed globally: the BasicBlock_compress scheme is used in the even branches of the network (e.g., branches 2 and 4) and the BasicBlock_compress_inverse scheme is used in the odd branches (e.g., branches 1 and 3). Of course, the two compression schemes may also be used in combination, and this is not specifically limited. Fig. 7 is a diagram of a branch network structure according to an embodiment of the present invention; each branch of the HRNet model shown in fig. 3 uses the network structure shown in fig. 7 to form the neural network. In fig. 7, the dashed boxes corresponding to the odd branch and the even branch represent the network modules of the different compression schemes selected for the odd and even branches.
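The difference between the two schemes can be seen in which input channels each output channel is allowed to depend on. The sketch below builds that dependence pattern for an ordinary grouped convolution (the BasicBlock_compress case) and for a channel-by-channel convolution (the BasicBlock_compress_inverse case); the helper name and the 8-channel example are illustrative assumptions, not part of the patent.

```python
import numpy as np

def channel_mixing(channels, groups):
    """Boolean matrix: entry [i, j] is True when output channel i of a
    grouped convolution may depend on input channel j."""
    g = channels // groups
    M = np.zeros((channels, channels), dtype=bool)
    for b in range(groups):
        M[b * g:(b + 1) * g, b * g:(b + 1) * g] = True
    return M

# BasicBlock_compress: ordinary grouped conv, channels inside a group mix
grouped = channel_mixing(8, groups=2)
assert grouped[0, 3] and not grouped[0, 4]  # ch 0 sees ch 3 (same group), not ch 4

# BasicBlock_compress_inverse: channel-by-channel conv, no mixing at all
depthwise = channel_mixing(8, groups=8)
assert depthwise.sum() == 8                 # only the diagonal survives
```

The denser the mixing matrix, the more computation the convolution costs, which is why the channel-by-channel variant is the cheaper of the two.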
Specifically, in this embodiment, as shown in fig. 8, the step S203 includes:
step S2031, the 3 × 3 convolutional layer in the bottleneck module is decomposed into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer, and a 1 × 1 convolutional layer, which are connected in sequence.
Step S2032, performing convolution kernel parameter sharing on the 3 × 3 convolutional layer obtained by decomposition.
Step S2033, the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent bottleneck modules.
The Bottleneck module is a bottleneck module commonly used in HRNet; it scales and increases the number of channels of the input image in order to extract high-level semantic information. In this embodiment, the Bottleneck module is compressed into a bottleneck compression module (Bottleneck_compress). The bottleneck module also contains a 3 × 3 convolution, and the compression diagram shown in fig. 5 is an optimization of the 3 × 3 convolution, so the compression improvement of the bottleneck module is likewise shown in fig. 5: the 3 × 3 convolutional layer is decomposed into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer. The feature-map channels are first expanded with a 1 × 1 point-by-point convolution, convolution kernel parameter sharing is then applied to the channel-by-channel 3 × 3 convolutional layer, and finally the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent bottleneck modules. By sharing the internal parameters of the convolution kernels in this way, the same prediction effect can be achieved with less computation, reducing the amount of calculation and shortening the prediction time. In this embodiment, Shared 1 × 1 conv and Shared 3 × 3 conv in fig. 5 are a 1 × 1 convolutional layer and a 3 × 3 convolutional layer whose convolution kernel parameters are shared, while the convolution kernel parameters of the remaining modules in fig. 5 are not shared.
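A rough weight count for the decomposed stack illustrates why the 1 × 1 / 3 × 3 / 1 × 1 factorization pays off. The function below is a sketch under our own assumptions (no bias terms, and "parameter sharing" read as one 3 × 3 kernel reused by all channels of the channel-by-channel layer); the patent does not fix these details.

```python
def decomposed_bottleneck_weights(c, c_mid, share_depthwise=False):
    """Weights of the 1x1 -> channel-by-channel 3x3 -> 1x1 stack
    (c -> c channels, no bias).

    share_depthwise=True models convolution kernel parameter sharing on
    the 3x3 layer: one 3x3 kernel reused by all c_mid channels (one
    possible reading; the sharing granularity is an assumption here).
    """
    expand = c * c_mid                          # 1x1 point-by-point expansion
    depthwise = 9 if share_depthwise else c_mid * 9  # channel-by-channel 3x3
    project = c_mid * c                         # 1x1 projection back to c
    return expand + depthwise + project

plain = 64 * 64 * 9  # the original dense 3x3 conv, 64 -> 64 channels
decomposed = decomposed_bottleneck_weights(64, 256)
shared = decomposed_bottleneck_weights(64, 256, share_depthwise=True)

assert shared < decomposed < plain  # each step removes parameters
```

Even with a 4x channel expansion in the middle, the channel-by-channel 3 × 3 layer keeps the stack smaller than the dense 3 × 3 convolution it replaces, and kernel sharing shrinks it further.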
Further, in this embodiment, the step S203 further includes: adding the convolution kernels of the 3 × 3 convolutional layer whose kernel parameters are shared to the grouped 1 × 1 convolution kernels whose parameters are not shared, and then performing a single convolution operation. Specifically, the branch operation may be further optimized by adding the shared 3 × 3 convolutional layer and the grouped 1 × 1 convolutional layer (Group 1 × 1 conv in fig. 5) and then performing one convolution, which reduces the number of floating-point operations and shortens the computation time. The specific flow of adding the shared and grouped convolution kernels is similar to the method adopted in the compression improvement of the basic module, and the corresponding schematic diagram is likewise fig. 6, so it is not repeated here.
In this embodiment, the local compression of the HRNet model comprises the compression improvements of the basic module and of the bottleneck module, and the global compression of the HRNet model consists in adopting different compression schemes in the even and odd branches of the overall network structure of HRNet: the BasicBlock_compress scheme in the even branches and the BasicBlock_compress_inverse scheme in the odd branches. A conventional lightweight convolutional network can hardly retain the high-resolution characteristics of the image; the ordinary HRNet, while able to fuse features of multiple resolutions during feature processing, has a large floating-point computation cost and is difficult to deploy effectively on mobile terminals and embedded platforms. The scheme of the present invention, based on methods such as group-shared convolution, depthwise separable convolution and inverse bottleneck structure design, redesigns the two basic modules of HRNet, BasicBlock (basic module) and Bottleneck (bottleneck module), and adopts different compression schemes in the even and odd branches of the overall HRNet structure. The group-wise sharing of convolutions greatly reduces the amount of computation while maintaining accuracy; depthwise separable convolution enlarges the receptive field of the convolution, so that each convolution kernel extracts information more effectively and the depth of the network is ultimately reduced; and the inverse bottleneck structure design speeds up the whole module by scaling the data channels. The scheme essentially preserves the original prediction accuracy while reducing the parameters and computation of the network as much as possible and shortening the network inference time.
Exemplary device
As shown in fig. 9, an embodiment of the present invention further provides a human key point identification device corresponding to the human key point identification method, where the human key point identification device includes:
an image obtaining module 910, configured to obtain an image of a human body to be detected.
The human body image to be detected is an image needing human body key point identification. Optionally, the human body image to be detected may be acquired by a camera in real time, or may be an image acquired in advance, which depends on actual requirements and specific application scenarios, and is not limited herein.
A network model obtaining module 920, configured to obtain a compressed high-resolution feature network model.
In this embodiment, the compressed high-resolution feature network model is obtained by compressing the HRNet model in the prior art, and specifically, the HRNet model may be compressed by using a scheme of combining local compression and global compression.
The detecting module 930 is configured to detect the human body image to be detected based on the compressed high-resolution feature network model, and obtain position information of a human body key point in the human body image to be detected.
Specifically, after the compressed high-resolution feature network model is obtained, the model is trained based on a pre-obtained training set to obtain a trained compressed high-resolution feature network model, and then the human body image to be detected is input into the trained compressed high-resolution feature network model for detection, so as to identify and obtain the position information of the human body key point in the human body image to be detected.
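The detection step is commonly realized by decoding the network's per-keypoint heatmaps into image coordinates; the patent does not spell out the decoding, so the argmax-based sketch below, including its function name and stride value, is only one plausible illustration.

```python
import numpy as np

def decode_keypoints(heatmaps, stride=4):
    """Turn per-keypoint heatmaps of shape (K, H, W) into (x, y) image
    coordinates by taking each map's argmax and scaling by the network
    stride (the heatmap-to-image downsampling factor)."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)
    ys, xs = np.divmod(flat, W)
    return np.stack([xs, ys], axis=1) * stride

# Stand-in for the compressed model's output: one peak per keypoint map.
hm = np.zeros((2, 64, 48))
hm[0, 10, 20] = 1.0   # keypoint 0 peaks at heatmap cell (x=20, y=10)
hm[1, 30, 5] = 1.0    # keypoint 1 peaks at heatmap cell (x=5, y=30)
pts = decode_keypoints(hm)
assert pts.tolist() == [[80, 40], [20, 120]]
```

In a real pipeline the `hm` array would come from running the trained compressed model on the human body image to be detected, and the stride would match that model's output resolution.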
And an output module 940 for outputting the position information of the human body key points.
Optionally, the corresponding human body key point position may be marked in the human body image to be detected and output, so as to observe the corresponding human body key point. Of course, other output modes are also possible, and are not specifically limited herein.
As can be seen from the above, the human body key point identification device provided in the embodiment of the present invention obtains the human body image to be detected through the image obtaining module 910; obtains the compressed high-resolution feature network model through the network model obtaining module 920; detects the human body image to be detected based on the compressed high-resolution feature network model through the detection module 930 to obtain the position information of the human body key points in the human body image to be detected; and outputs the position information of the human body key points through the output module 940. Compared with the prior art, the human body key point identification device provided by the scheme of the present invention helps reduce the parameters and the amount of calculation, shorten the prediction time, and port the model to embedded platforms such as mobile terminals.
Specifically, in this embodiment, as shown in fig. 10, the network model obtaining module 920 includes:
and a high-resolution feature network model obtaining unit 921, configured to obtain a high-resolution feature network model.
And a basic module compressing unit 922, configured to compress the basic module of the high-resolution feature network model.
A bottleneck module compressing unit 923, configured to compress the bottleneck module of the high resolution feature network model.
A network model obtaining unit 924, configured to obtain the compressed high-resolution feature network model, and use the obtained high-resolution feature network model as the compressed high-resolution feature network model.
The high-resolution feature network model is the HRNet model of the prior art, and fig. 3 is a schematic diagram of the corresponding overall network framework of HRNet. As shown in fig. 3, the blocks in fig. 3 represent the feature maps processed by the network, that is, the intermediate products obtained after operations are performed on the input picture, and a horizontal arrow represents a convolution operation (conv.). The horizontal, left-to-right passes of features constitute the branches. Compressing the prior-art HRNet model in this way essentially preserves the original prediction accuracy, reduces the parameters and the amount of computation of the network as much as possible, shortens the network inference time, and facilitates porting the model.
Optionally, the basic module compressing unit 922 is specifically configured to: decompose the 3 × 3 convolutional layer in the basic module into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer connected in sequence; perform convolution kernel parameter sharing on the decomposed 3 × 3 convolutional layer; and share the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer between two adjacent basic modules.
The basic module (BasicBlock) is the basic building unit of HRNet and generally consists of two 3 × 3 convolutions together with corresponding pooling networks. Fig. 5 is a compression schematic provided in this embodiment. As shown in fig. 5, in this embodiment the basic module is redesigned with a bottleneck structure of the deep neural network, decomposing the 3 × 3 convolutional layer into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer. The feature-map channels are first expanded with a 1 × 1 point-by-point convolution, convolution kernel parameter sharing is then applied to the channel-by-channel 3 × 3 convolutional layer, and finally the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent basic modules. By sharing the internal parameters of the convolution kernels in this way, the same prediction effect can be achieved with less computation, reducing the amount of calculation and shortening the prediction time. In this embodiment, Shared 1 × 1 conv and Shared 3 × 3 conv in fig. 5 are a 1 × 1 convolutional layer and a 3 × 3 convolutional layer whose convolution kernel parameters are shared, while the convolution kernel parameters of the remaining modules in fig. 5 are not shared.
Further, the basic module compressing unit 922 is further configured to: add the convolution kernels of the 3 × 3 convolutional layer whose kernel parameters are shared to the grouped 1 × 1 convolution kernels whose parameters are not shared, and then perform a single convolution operation. Specifically, the branch operation may be further optimized by adding the shared 3 × 3 convolutional layer and the grouped 1 × 1 convolutional layer (Group 1 × 1 conv in fig. 5) and then performing one convolution. This reduces the number of floating-point operations and shortens the computation time. Fig. 6 is a schematic diagram of the addition of shared and grouped convolution kernels according to an embodiment of the present invention. In fig. 6, boxes in the same horizontal row belong to the same data channel (different networks use different numbers of data channels; the figure is only an illustration and is not limiting), the elongated boxes with different gray levels represent shared convolution kernels, each elongated box indicating that all of its convolution kernel parameters are shared, that is, the same parameters are used, and the short, thick boxes represent grouped convolution kernels. By dividing every N channels into one group, only the feature maps within a group are connected to each other, which greatly reduces the amount of computation. Among the grouped convolution kernels, thick, short boxes of the same gray level merely belong to the same group and do not use the same parameters.
Further, the basic module compressing unit 922 is further configured to: in the even branches of the high-resolution feature network model, connect the channels within the same group to each other during convolution; in the odd branches of the high-resolution feature network model, perform convolution without connections between the channels within the same group.
Specifically, in this embodiment, as an improvement of the basic module (BasicBlock), the BasicBlock is optimized into BasicBlock_compress (compression module) and BasicBlock_compress_inverse (inverse compression module). The difference between them is that BasicBlock_compress_inverse adopts channel-by-channel convolution, that is, the channels are not connected to each other, which reduces the amount of computation, whereas in the ordinary convolution adopted by BasicBlock_compress the channels within the same group are connected to each other. In this embodiment, the HRNet model is compressed globally: the BasicBlock_compress scheme is used in the even branches of the network (e.g., branches 2 and 4) and the BasicBlock_compress_inverse scheme is used in the odd branches (e.g., branches 1 and 3). Of course, the two compression schemes may also be used in combination, and this is not specifically limited. Fig. 7 is a diagram of a branch network structure according to an embodiment of the present invention; each branch of the HRNet model shown in fig. 3 uses the network structure shown in fig. 7 to form the neural network. In fig. 7, the dashed boxes corresponding to the odd branch and the even branch represent the network modules of the different compression schemes selected for the odd and even branches.
Specifically, in this embodiment, the bottleneck module compressing unit 923 is specifically configured to: decompose the 3 × 3 convolutional layer in the bottleneck module into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer connected in sequence; perform convolution kernel parameter sharing on the decomposed 3 × 3 convolutional layer; and share the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer between two adjacent bottleneck modules.
The Bottleneck module is a bottleneck module commonly used in HRNet; it scales and increases the number of channels of the input image in order to extract high-level semantic information. In this embodiment, the Bottleneck module is compressed into a bottleneck compression module (Bottleneck_compress). The bottleneck module also contains a 3 × 3 convolution, and the compression diagram shown in fig. 5 is an optimization of the 3 × 3 convolution, so the compression improvement of the bottleneck module is likewise shown in fig. 5: the 3 × 3 convolutional layer is decomposed into a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer. The feature-map channels are first expanded with a 1 × 1 point-by-point convolution, convolution kernel parameter sharing is then applied to the channel-by-channel 3 × 3 convolutional layer, and finally the first 1 × 1 convolutional layer and the 3 × 3 convolutional layer are shared between two adjacent bottleneck modules. By sharing the internal parameters of the convolution kernels in this way, the same prediction effect can be achieved with less computation, reducing the amount of calculation and shortening the prediction time. In this embodiment, Shared 1 × 1 conv and Shared 3 × 3 conv in fig. 5 are a 1 × 1 convolutional layer and a 3 × 3 convolutional layer whose convolution kernel parameters are shared, while the convolution kernel parameters of the remaining modules in fig. 5 are not shared.
Further, the bottleneck module compressing unit 923 is further configured to: add the convolution kernels of the 3 × 3 convolutional layer whose kernel parameters are shared to the grouped 1 × 1 convolution kernels whose parameters are not shared, and then perform a single convolution operation. Specifically, the branch operation may be further optimized by adding the shared 3 × 3 convolutional layer and the grouped 1 × 1 convolutional layer (Group 1 × 1 conv in fig. 5) and then performing one convolution, which reduces the number of floating-point operations and shortens the computation time. The specific flow of adding the shared and grouped convolution kernels is similar to the method adopted in the compression improvement of the basic module, and the corresponding schematic diagram is likewise fig. 6, so it is not repeated here.
Based on the above embodiments, the present invention further provides an intelligent terminal, a schematic block diagram of which may be as shown in fig. 11. The intelligent terminal comprises a processor, a memory, a network interface and a display screen connected through a system bus. The processor of the intelligent terminal provides calculation and control capability. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a human body key point identification program. The internal memory provides an environment for the operation of the operating system and the human body key point identification program in the non-volatile storage medium. The network interface of the intelligent terminal is used for connecting and communicating with an external terminal through a network. The human body key point identification program, when executed by a processor, implements the steps of any one of the human body key point identification methods. The display screen of the intelligent terminal may be a liquid crystal display screen or an electronic ink display screen.
It will be understood by those skilled in the art that the block diagram of fig. 11 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the intelligent terminal to which the solution of the present invention is applied, and a specific intelligent terminal may include more or less components than those shown in the figure, or combine some components, or have different arrangements of components.
In one embodiment, an intelligent terminal is provided, where the intelligent terminal includes a memory, a processor, and a human body key point identification program stored in the memory and executable on the processor, and the human body key point identification program, when executed by the processor, implements the following operations:
acquiring a human body image to be detected;
acquiring a compressed high-resolution characteristic network model;
detecting the human body image to be detected based on the compressed high-resolution characteristic network model, and acquiring the position information of the human body key points in the human body image to be detected;
and outputting the position information of the human body key points.
The embodiment of the invention also provides a computer-readable storage medium, wherein a human body key point identification program is stored on the computer-readable storage medium, and when being executed by a processor, the human body key point identification program realizes the steps of any human body key point identification method provided by the embodiment of the invention.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art would appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the above modules or units is only one logical division, and the actual implementation may be implemented by another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and can implement the steps of the embodiments of the method when the computer program is executed by a processor. The computer program includes computer program code, and the computer program code may be in a source code form, an object code form, an executable file or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the above-mentioned computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signal, telecommunication signal, software distribution medium, etc. It should be noted that the contents contained in the computer-readable storage medium can be increased or decreased as required by legislation and patent practice in the jurisdiction.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention and not for limiting them; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein.

Claims (10)

1. A human body key point identification method is characterized by comprising the following steps:
acquiring a human body image to be detected;
acquiring a compressed high-resolution characteristic network model;
detecting the human body image to be detected based on the compressed high-resolution characteristic network model, and acquiring the position information of human body key points in the human body image to be detected;
and outputting the position information of the human body key points.
2. The human keypoint identification method of claim 1, wherein said obtaining a compressed high resolution feature network model comprises:
acquiring a high-resolution characteristic network model;
compressing a base module of the high-resolution feature network model;
compressing a bottleneck module of the high-resolution feature network model;
and acquiring the compressed high-resolution characteristic network model and taking the high-resolution characteristic network model as the compressed high-resolution characteristic network model.
3. The human keypoint identification method of claim 2, wherein said compressing the fundamental module of said high resolution feature network model comprises:
decomposing the 3 × 3 convolutional layers in the base module into 1 × 1 convolutional layers, 3 × 3 convolutional layers and 1 × 1 convolutional layers which are connected in sequence;
carrying out convolution kernel parameter sharing on the 3 x 3 convolution layer obtained by decomposition;
the first layer of the 1 x 1 convolutional layer and the 3 x 3 convolutional layer is shared between two adjacent base modules.
4. The human keypoint identification method of claim 3, wherein said compressing the fundamental module of said high resolution feature network model further comprises:
adding the convolution kernels of the 3 x 3 convolution layers shared by the convolution kernel intrinsic parameters and the grouped 1 x 1 convolution kernels not shared by the convolution kernel intrinsic parameters, and then performing convolution operation.
5. The human keypoint identification method of claim 4, wherein said compressing the fundamental module of said high resolution feature network model, further comprises:
in even branches of the high-resolution feature network model, all channels in the same group are connected with each other during convolution;
in the odd branches of the high-resolution feature network model, the channels in the same group are not connected during convolution.
6. The human keypoint identification method of claim 2, wherein said compressing the bottleneck module of the high resolution feature network model comprises:
decomposing the 3 × 3 convolutional layers in the bottleneck module into 1 × 1 convolutional layers, 3 × 3 convolutional layers and 1 × 1 convolutional layers which are connected in sequence;
carrying out convolution kernel parameter sharing on the 3 x 3 convolution layer obtained by decomposition;
the first layer of the 1 x 1 convolutional layer and the 3 x 3 convolutional layer is shared between two adjacent bottleneck modules.
7. A human keypoint identification device, characterized in that it comprises:
the image acquisition module is used for acquiring a human body image to be detected;
the network model acquisition module is used for acquiring a compressed high-resolution characteristic network model;
the detection module is used for detecting the human body image to be detected based on the compressed high-resolution characteristic network model and acquiring the position information of human body key points in the human body image to be detected;
and the output module is used for outputting the position information of the human body key points.
8. The apparatus of claim 7, wherein the network model obtaining module comprises:
the high-resolution characteristic network model acquisition unit is used for acquiring a high-resolution characteristic network model;
a basic module compression unit, configured to compress a basic module of the high-resolution feature network model;
a bottleneck module compression unit, configured to compress a bottleneck module of the high-resolution feature network model;
and the network model acquisition unit is used for acquiring the compressed high-resolution characteristic network model and taking the high-resolution characteristic network model as the compressed high-resolution characteristic network model.
9. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor and a human body key point identification program stored on the memory and operable on the processor, the human body key point identification program, when executed by the processor, implementing the steps of the human body key point identification method according to any one of claims 1-6.
10. A computer-readable storage medium, having a human key point identification program stored thereon, which when executed by a processor, performs the steps of the human key point identification method according to any one of claims 1-6.
CN202011595172.XA 2020-12-28 Human body key point identification method and device, intelligent terminal and storage medium Active CN112712015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011595172.XA CN112712015B (en) 2020-12-28 Human body key point identification method and device, intelligent terminal and storage medium


Publications (2)

Publication Number Publication Date
CN112712015A true CN112712015A (en) 2021-04-27
CN112712015B CN112712015B (en) 2024-05-28

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240107A (en) * 2021-05-08 2021-08-10 北京字跳网络技术有限公司 Image processing method and device and electronic equipment

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111476184A (en) * 2020-04-13 2020-07-31 河南理工大学 Human body key point detection method based on double-attention machine system

Non-Patent Citations (1)

Title
LI Siqi: "Comparison of compression and acceleration algorithms for convolutional neural network models", Information & Computer (信息与电脑), no. 11, pp. 21-23 *


Similar Documents

Publication Publication Date Title
CN111898696B (en) Pseudo tag and tag prediction model generation method, device, medium and equipment
CN103649999B (en) For determining the content-adaptive system of light stream, method and apparatus
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN109714526B (en) Intelligent camera and control system
CN112052837A (en) Target detection method and device based on artificial intelligence
CN113159143A (en) Infrared and visible light image fusion method and device based on jump connection convolution layer
US20230334893A1 (en) Method for optimizing human body posture recognition model, device and computer-readable storage medium
US20210064955A1 (en) Methods, apparatuses, and computer program products using a repeated convolution-based attention module for improved neural network implementations
CN114359048A (en) Image data enhancement method and device, terminal equipment and storage medium
CN112686314B (en) Target detection method and device based on long-distance shooting scene and storage medium
CN114462486A (en) Training method of image processing model, image processing method and related device
CN112712015A (en) Human body key point identification method and device, intelligent terminal and storage medium
CN112712015B (en) Human body key point identification method and device, intelligent terminal and storage medium
CN109447911A (en) Method, apparatus, storage medium and the terminal device of image restoration
CN115223181A (en) Text detection-based method and device for recognizing characters of seal of report material
CN116109531A (en) Image processing method, device, computer equipment and storage medium
CN111611926A (en) Method and device for detecting and identifying landmark building
CN113128253B (en) Reconstruction method and device of three-dimensional face model
CN117314756B (en) Verification and protection method and device based on remote sensing image, computer equipment and storage medium
CN115578753B (en) Human body key point detection method and device, electronic equipment and storage medium
CN115731588A (en) Model processing method and device
CN117877097A (en) Facial expression recognition model training method and device based on category consistency constraint
Cao A MobileNetV2 model of transfer learning is employed for remote sensing image classification
CN117011665A (en) Image processing model training method and device, electronic equipment and storage medium
CN112949651A (en) Feature extraction method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant