CN112949506A

CN112949506A - Low-cost real-time bone key point identification method and device

Info

Publication number: CN112949506A
Application number: CN202110246577.0A
Authority: CN
Inventors: 程煜钧; 张哲为; 丁博文; 顾友良
Original assignee: Guangzhou Ziweiyun Technology Co ltd
Current assignee: Guangzhou Ziweiyun Technology Co ltd
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2021-06-11

Abstract

The invention discloses a low-cost real-time bone key point identification device comprising an image acquisition module, a core computing unit, a lightweight neural network algorithm module, a neural network acceleration engine and a bone key point output module. The image acquisition module acquires an image and sends the acquired image information to the core computing unit, which processes the acquired image. The lightweight neural network algorithm module uses an improved SqueezeNet as its backbone network and performs multi-scale feature extraction on the image in combination with a feature pyramid network. The neural network acceleration module accelerates the network, and the bone key point output module outputs the bone key points. The image acquisition module may be any monocular camera, and the core computing unit is a mobile-terminal CPU.

Description

Low-cost real-time bone key point identification method and device

Technical Field

The invention relates to the technical field of computer vision, in particular to a low-cost real-time bone key point identification method and device.

Background

Bone key point identification is one of the basic technologies of computer vision. It detects the joints and facial features of a human body in image or video data captured by a sensor (a camera, an infrared device or similar equipment), and describes the human skeleton through key points.

However, most existing deep-learning-based bone key point identification algorithms are difficult to run in real time on a low-cost hardware platform and need high-cost hardware (such as a GPU or a high-end camera) to reach real-time performance. The present method relies on a series of software and hardware optimization techniques and can run on a low-cost hardware platform to identify bone key points in real time. While maintaining accuracy, the technique of the invention can reach an identification latency as low as 3 milliseconds, compared with the latency of about 50 milliseconds introduced by mainstream methods.

Traditional bone key point algorithms are based on template matching with geometric priors, and their accuracy is poor. Because of limited hardware performance, existing deep-learning-based bone key point identification algorithms run slowly on low-cost hardware platforms (such as mobile phones and tablets), and applications that depend on them can stutter or drop frames, which greatly degrades the user experience.

Disclosure of Invention

In view of these defects, the invention mainly targets bone key point identification on mobile-terminal and embedded devices. It adopts a lightweight deep learning algorithm and applies neural network acceleration techniques for optimization. The hardware only needs a CPU and a monocular camera, so low-cost real-time identification of bone key points can be achieved without a GPU or a high-end camera (such as a Kinect).

The present invention is directed to solving at least one of the problems of the prior art. To that end, the invention discloses a low-cost real-time bone key point identification device comprising an image acquisition module, a core computing unit, a lightweight neural network algorithm module, a neural network acceleration engine and a bone key point output module. The image acquisition module acquires an image and sends the acquired image information to the core computing unit, which processes the acquired image. The lightweight neural network algorithm module uses an improved SqueezeNet as its backbone network and performs multi-scale feature extraction on the image in combination with a feature pyramid network. The neural network acceleration module accelerates the network, and the bone key point output module outputs the bone key points. The image acquisition module may be any monocular camera, and the core computing unit is a mobile-terminal CPU.

Further, the neural network acceleration module further comprises the following: the input image first enters the improved SqueezeNet backbone network for computation. The SqueezeNet backbone network consists of two convolutional layers, eight Fire layers and three pooling layers. Convolutional layer conv1 applies 96 convolution kernels of size 7x7, and convolutional layer conv10 applies 1000 convolution kernels of size 1x1; the pooling layers maxpool1, maxpool2 and maxpool3 all have size 3x3. All Fire layers share the same structure, in which the Squeeze part consists of a 1x1 convolution layer and the Expand part consists of 1x1 and 3x3 convolutions whose outputs are concatenated.

Further, if the number of channels of the Squeeze part is denoted $S_{1\times1}$, and the numbers of 1x1 and 3x3 convolutions in the Expand part are denoted $e_{1\times1}$ and $e_{3\times3}$ respectively, the three satisfy the following formula: $S_{1\times1} < e_{1\times1} + e_{3\times3}$.

Further, the neural network acceleration module adds network shortcuts (short-cut connections) among the fire3, fire5, fire7 and fire9 layers to improve robustness during training and prevent overfitting, and the improved SqueezeNet backbone network outputs a series of convolutional feature maps to the feature pyramid network for further computation.

Further, the outputs of fire2, fire6, fire5 and conv10 in the SqueezeNet backbone network are extracted at four scales to construct the feature pyramid network; after processing by the feature pyramid network, a prediction result is output at each scale.

Further, the feature pyramid network further comprises the following: the prediction result contains two parts, a heat map (heatmap) and a vector field (paf). The heatmap and paf predictions at multiple scales are combined, and the model finally obtains the bone key point output. If predictions at n scales are obtained, the final heatmap and paf predictions $F_{heatmap}$ and $F_{paf}$ are expressed as follows:

where $vh_i$ and $vp_i$ are the heatmap and paf values output at the i-th scale, respectively.
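The fusion formulas themselves are not reproduced in this text, so the exact combination rule is not known from the source. Purely as an illustration, the sketch below assumes the simplest possible rule, an element-wise mean over the n scales after all maps have been brought to a common resolution. The function name `fuse_scales`, the sample values and the choice of a mean are all assumptions, not the patent's stated formula.

```python
from typing import List

Grid = List[List[float]]  # one 2-D map stored as rows of values


def fuse_scales(maps: List[Grid]) -> Grid:
    """Element-wise mean of n same-sized 2-D maps (ASSUMED fusion rule)."""
    if not maps:
        raise ValueError("need at least one scale")
    rows, cols = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all maps must share one resolution")
    n = len(maps)
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]


# Hypothetical heatmap values vh_i for one key point at n = 2 scales,
# already resized to the same 2x2 resolution.
vh = [
    [[0.0, 0.2], [0.4, 1.0]],
    [[0.0, 0.4], [0.0, 0.5]],
]
f_heatmap = fuse_scales(vh)
```

The same helper could be applied to the paf values $vp_i$ channel by channel; a weighted sum or another rule would fit the surrounding description equally well.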

Further, the neural network acceleration module further comprises quantization and an acceleration engine. Quantization means that the model is trained with data in FP32 format and the FP32 data are converted to INT8 format during inference; the acceleration engine accelerates model inference based on the OpenCL framework, further improving the real-time performance of bone key point identification.

The application also discloses a low-cost real-time bone key point identification method that uses the above device: image data are collected and the acquired image information is processed; multi-scale feature extraction is performed on the image using the improved SqueezeNet as the backbone network in combination with a feature pyramid network; after processing by the feature pyramid network, a prediction result is output at each scale; finally, by quantizing the data and accelerating inference, the bone key point information of the collected image is output in real time.

Compared with the prior art, the invention has the following beneficial effects. It mainly targets bone key point identification on mobile-terminal and embedded devices, adopts a lightweight deep learning algorithm and applies neural network acceleration techniques for optimization. The hardware only needs a CPU and a monocular camera to achieve low-cost real-time identification of bone key points, with no GPU or high-end camera required. By using quantization and an inference acceleration engine, and while maintaining accuracy, the technique can reach an identification latency as low as 3 milliseconds, compared with the latency of about 50 milliseconds introduced by mainstream methods. The technique thus reduces hardware cost while keeping identification real-time.

Drawings

The invention will be further understood from the following description in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the drawings, like reference numerals designate corresponding parts throughout the different views.

FIG. 1 is a block diagram of the core modules of the real-time bone identification method and apparatus of the present invention;

FIG. 2 is a block diagram of the overall architecture of a lightweight network in one embodiment of the invention;

FIG. 3 is a block diagram of a SqueezeNet backbone network in accordance with an embodiment of the present invention;

FIG. 4 is a block diagram of a fire layer structure in one embodiment of the invention;

FIG. 5 is a block diagram of the feature pyramid network structure in an embodiment of the present invention.

Detailed Description

Example one

The core modules of the low-cost real-time bone key point identification method and device provided by the invention are shown in FIG. 1 and comprise an image acquisition module, a core computing unit, a lightweight neural network algorithm module, a neural network acceleration engine and a bone key point output module. The image acquisition module may be any monocular camera, and the core computing unit is a mobile-terminal CPU. The core of the design is the lightweight neural network algorithm module and the neural network acceleration module, which together ensure real-time performance on low-cost hardware.

Lightweight neural network algorithm module:

The lightweight neural network algorithm module uses an improved SqueezeNet as its backbone network and combines it with a feature pyramid network (FPN) to extract multi-scale features, improving both accuracy and speed. The overall structure of the lightweight network is shown in FIG. 2.

The input image first enters the improved SqueezeNet backbone network, which consists of two convolutional layers, eight Fire layers and three pooling layers, as shown in FIG. 3. Convolutional layer conv1 applies 96 convolution kernels of size 7x7 (stride 2), and convolutional layer conv10 applies 1000 convolution kernels of size 1x1 (stride 1); the pooling layers maxpool1, maxpool2 and maxpool3 all have size 3x3 and stride 2. All Fire layers share the same structure, shown in FIG. 4: the Squeeze part consists of a 1x1 convolution layer, and the Expand part consists of 1x1 and 3x3 convolutions whose outputs are concatenated. If the number of channels of the Squeeze part is denoted $S_{1\times1}$, and the numbers of 1x1 and 3x3 convolutions in the Expand part are denoted $e_{1\times1}$ and $e_{3\times3}$ respectively, the three must satisfy the following formula:

$$S_{1\times1} < e_{1\times1} + e_{3\times3}$$
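To make the channel constraint concrete, the following minimal sketch records the channel counts of one Fire layer, checks the inequality above, and counts convolution weights. The class name `FireConfig` and the sample numbers (96 input channels, 16 Squeeze filters, 64 + 64 Expand filters) are illustrative assumptions; the text does not give per-layer channel counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FireConfig:
    """Channel counts of one Fire layer (sample values are illustrative)."""
    in_channels: int
    s1x1: int  # number of 1x1 filters in the Squeeze part
    e1x1: int  # number of 1x1 filters in the Expand part
    e3x3: int  # number of 3x3 filters in the Expand part

    def satisfies_constraint(self) -> bool:
        # S_1x1 < e_1x1 + e_3x3
        return self.s1x1 < self.e1x1 + self.e3x3

    @property
    def out_channels(self) -> int:
        # The two Expand branches are concatenated along the channel axis.
        return self.e1x1 + self.e3x3

    def weight_count(self) -> int:
        # Convolution weights only; biases are ignored.
        squeeze = self.in_channels * self.s1x1 * 1 * 1
        expand_1x1 = self.s1x1 * self.e1x1 * 1 * 1
        expand_3x3 = self.s1x1 * self.e3x3 * 3 * 3
        return squeeze + expand_1x1 + expand_3x3


# Hypothetical Fire layer fed by the 96 channels that conv1 produces.
fire = FireConfig(in_channels=96, s1x1=16, e1x1=64, e3x3=64)

# Weights of a single plain 3x3 convolution with the same input/output channels.
plain_3x3 = fire.in_channels * fire.out_channels * 3 * 3
```

With these illustrative numbers the Fire layer holds 11,776 convolution weights, against 110,592 for a plain 3x3 convolution with the same input and output channel counts, which shows why keeping the Squeeze part narrower than the Expand part keeps the network lightweight.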

In addition, the improved SqueezeNet backbone network adds network shortcuts (short-cut connections) among the fire3, fire5, fire7 and fire9 layers to improve robustness during training and prevent overfitting. Finally, the backbone network outputs a series of convolutional feature maps to the feature pyramid network for the next stage of computation.

The feature pyramid network structure is shown in FIG. 5. In this method, the outputs of fire2, fire6, fire5 and conv10 in the SqueezeNet backbone network are extracted at four scales to construct the feature pyramid network. After processing by the feature pyramid network, a prediction result is output at each scale. The prediction result contains two parts: a heat map (heatmap) and a vector field (paf). The heatmap and paf predictions at multiple scales are combined, and the model finally obtains the bone key point output. If predictions at n scales are obtained, the final heatmap and paf predictions $F_{heatmap}$ and $F_{paf}$ are respectively expressed by the following formulas:
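The text does not state what form the shortcuts take. A minimal sketch, assuming an identity shortcut in the residual-network style (the layer's input is added element-wise to its output, which requires matching shapes), might look as follows. The helper names `add_shortcut` and `toy_fire` are hypothetical, and `toy_fire` is only a stand-in for a real Fire layer.

```python
from typing import List


def add_shortcut(layer_input: List[float], layer_output: List[float]) -> List[float]:
    """Identity shortcut (ASSUMED form): element-wise sum of input and output."""
    if len(layer_input) != len(layer_output):
        raise ValueError("an identity shortcut needs matching shapes")
    return [x + y for x, y in zip(layer_input, layer_output)]


def toy_fire(x: List[float]) -> List[float]:
    # Stand-in for a Fire layer: any shape-preserving function works here.
    return [0.5 * v for v in x]


x = [1.0, -2.0, 4.0]            # flattened activations entering the layer
y = add_shortcut(x, toy_fire(x))  # activations passed on to the next layer
```

Because the sum is only defined when input and output have the same number of channels, an identity shortcut of this kind can only bypass layers whose channel counts match.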

Neural network acceleration module:

In the identification method, a neural network acceleration module is used to accelerate the network. The module has two parts: quantization and an acceleration engine. Quantization means that the model is trained with data in FP32 format, and the FP32 data are converted to INT8 format during inference, which greatly increases inference speed while largely preserving model accuracy and thus gives better real-time performance. Using the acceleration engine means that model inference is accelerated based on the OpenCL framework, further improving the real-time performance of bone key point identification.
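The FP32-to-INT8 conversion described above can be illustrated with a short sketch. The text does not specify the quantization scheme, so the example assumes symmetric per-tensor quantization with a single scale factor; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor FP32 -> INT8 quantization (ASSUMED scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale maps the largest magnitude onto the INT8 limit 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate FP32 values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 1.0]   # hypothetical FP32 weights
q, scale = quantize_int8(weights)   # INT8 codes plus the shared scale
restored = dequantize(q, scale)     # approximation used to gauge the error
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the sense in which accuracy is largely preserved while the arithmetic moves to 8-bit integers.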

Example two

This embodiment provides a low-cost real-time bone key point identification device comprising an image acquisition module, a core computing unit, a lightweight neural network algorithm module, a neural network acceleration engine and a bone key point output module. The image acquisition module acquires an image and sends the acquired image information to the core computing unit, which processes the acquired image. The lightweight neural network algorithm module uses an improved SqueezeNet as its backbone network and performs multi-scale feature extraction on the image in combination with a feature pyramid network. The neural network acceleration module accelerates the network, and the bone key point output module outputs the bone key points. The image acquisition module may be any monocular camera, and the core computing unit is a mobile-terminal CPU.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications may be made without departing from the scope of the invention. It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention. The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims

1. A low-cost real-time bone key point identification device, characterized by comprising an image acquisition module, a core computing unit, a lightweight neural network algorithm module, a neural network acceleration engine and a bone key point output module, wherein the image acquisition module acquires an image and sends the acquired image information to the core computing unit; the core computing unit processes the acquired image; the lightweight neural network algorithm module uses an improved SqueezeNet as its backbone network and performs multi-scale feature extraction on the image in combination with a feature pyramid network; the neural network acceleration module accelerates the network; the bone key point output module outputs the bone key points; the image acquisition module is any monocular camera; and the core computing unit is a mobile-terminal CPU.

2. The low-cost real-time bone key point identification device of claim 1, wherein the neural network acceleration module further comprises: the input image first enters the improved SqueezeNet backbone network for computation, wherein the SqueezeNet backbone network consists of two convolutional layers, eight Fire layers and three pooling layers; convolutional layer conv1 applies 96 convolution kernels of size 7x7, and convolutional layer conv10 applies 1000 convolution kernels of size 1x1; the pooling layers maxpool1, maxpool2 and maxpool3 all have size 3x3; and all Fire layers share the same structure, in which the Squeeze part consists of a 1x1 convolution layer and the Expand part consists of 1x1 and 3x3 convolutions whose outputs are concatenated.

3. The low-cost real-time bone key point identification device of claim 2, wherein if the number of channels of the Squeeze part is denoted $S_{1\times1}$, and the numbers of 1x1 and 3x3 convolutions in the Expand part are denoted $e_{1\times1}$ and $e_{3\times3}$ respectively, the three satisfy the following formula: $S_{1\times1} < e_{1\times1} + e_{3\times3}$.

4. The device of claim 3, wherein the neural network acceleration module further adds network shortcuts (short-cut connections) among the fire3, fire5, fire7 and fire9 layers to improve robustness during training and prevent overfitting, and the improved SqueezeNet backbone network outputs a series of convolutional feature maps to the feature pyramid network for further computation.

5. The device of claim 3, wherein the outputs of fire2, fire6, fire5 and conv10 in the SqueezeNet backbone network are extracted at four scales to construct the feature pyramid network, and after processing by the feature pyramid network, a prediction result is output at each scale.

6. The low-cost real-time bone key point identification device of claim 4, wherein the feature pyramid network further comprises: the prediction result contains two parts, a heat map (heatmap) and a vector field (paf); the heatmap and paf predictions at multiple scales are combined, and the model finally obtains the bone key point output; if predictions at n scales are obtained, the final heatmap and paf predictions $F_{heatmap}$ and $F_{paf}$ are expressed as follows:

7. The low-cost real-time bone key point identification device of claim 1, wherein the neural network acceleration module further comprises quantization and an acceleration engine, wherein quantization means that the model is trained with data in FP32 format and the FP32 data are converted to INT8 format during inference, and the acceleration engine accelerates model inference based on the OpenCL framework, further improving the real-time performance of bone key point identification.

8. A low-cost real-time bone key point identification method using the device of any one of claims 1-7, characterized in that image data are collected and the acquired image information is processed; multi-scale feature extraction is performed on the image using the improved SqueezeNet as the backbone network in combination with a feature pyramid network; after processing by the feature pyramid network, a prediction result is output at each scale; and finally, by quantizing the data and accelerating inference, the bone key point information of the collected image is output in real time.