CN114519868A - Real-time bone key point identification method and system based on coordinate system regression - Google Patents

Info

Publication number: CN114519868A
Application number: CN202210160965.1A
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Pending
Inventors: 顾友良, 张磊, 赵乾
Current assignee: Guangdong Xinwangpai Intelligent Information Technology Co ltd
Original assignee: Guangdong Xinwangpai Intelligent Information Technology Co ltd
Prior art keywords: convolution, layer, backbone network, matrix, coordinate system

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time bone key point identification system based on coordinate system regression. The system comprises an image acquisition module, a core calculation unit, a lightweight neural network algorithm module and a coordinate system regression output module. The lightweight neural network algorithm module adopts ShuffleNetV2 as its basic backbone network, adds two consecutive upsampling layers after the last layer of the ShuffleNetV2 backbone network, and applies a skip connection between ShuffleV2Block3 and DUC2 in the network to finally obtain a heat map (heatmap). The coordinate system regression output module defines the heat map output on each channel by the lightweight neural network algorithm module as Z, normalizes its values to between 0 and 1 through a normalization function, and defines the normalized heat map as $\hat{Z}$. This yields a discrete probability distribution expressed as an m × n matrix, where m and n correspond to the resolution of the heat map, and the coordinate information contained in Z is calculated through a defined formula.

Description

Real-time bone key point identification method and system based on coordinate system regression
Technical Field
The invention relates to the technical field of image recognition, in particular to a real-time bone key point recognition method and system based on coordinate system regression.
Background
Bone key point identification is one of the basic technologies of computer vision. It detects the joints and facial features of the human body in image or video data captured by a sensor (a camera, an infrared device or similar equipment) and describes the human skeleton through key points. Most existing deep-learning-based bone key point identification algorithms output Gaussian heat maps, which require large output feature maps and make training and inference slow. Such algorithms are difficult to run in real time on a low-cost hardware platform and reach real-time speed only when paired with high-cost hardware (such as a GPU or a high-end camera). In addition, the coordinates decoded from a heat map are integers, whereas coordinate regression outputs floating point numbers and loses no precision, so heat map output has a theoretical lower bound on its error.
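The lower bound on heat map error can be illustrated with a small one-dimensional sketch. Everything in it is an assumption made for illustration: the function names, the 32-cell heat map, the true position 10.4 and the Gaussian width are hypothetical and do not come from the patent. Picking the peak cell can only return an integer index, while taking the expectation over the normalized heat map recovers the fractional position.

```python
import math

def gaussian_heatmap_1d(center, width, sigma=1.5):
    """Sample an unnormalized Gaussian at integer cells 0..width-1."""
    return [math.exp(-((k - center) ** 2) / (2 * sigma ** 2)) for k in range(width)]

def decode_argmax(heatmap):
    """Heat map decoding by peak picking: always an integer cell index."""
    return max(range(len(heatmap)), key=lambda k: heatmap[k])

def decode_expectation(heatmap):
    """Coordinate regression: expected cell index under the normalized heat map."""
    total = sum(heatmap)
    return sum(k * v for k, v in enumerate(heatmap)) / total

true_x = 10.4  # hypothetical sub-cell key point position
heatmap = gaussian_heatmap_1d(true_x, width=32)

print(decode_argmax(heatmap))                 # 10
print(round(decode_expectation(heatmap), 3))  # 10.4
```

In this sketch the peak-picking error is 0.4 cells regardless of how well the network is trained, and it grows in input-image pixels by the downsampling factor between the image and the heat map.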
To address these defects, the invention mainly targets bone key point identification on mobile terminals and embedded devices. It adopts a lightweight deep learning algorithm together with coordinate system regression, which avoids the theoretical lower bound on error of heat map output. The hardware needs only a CPU and a monocular camera to complete low-cost real-time bone key point identification, without a GPU or a high-end camera (such as Kinect). Traditional bone key point algorithms build on geometric priors and template matching, and their accuracy is poor. Because of limited hardware performance, existing deep-learning-based bone key point identification algorithms run slowly on low-cost hardware platforms (such as mobile phones and tablets), and applications that depend on them may stutter or drop frames, which greatly affects the user experience.
The method of the invention enables real-time bone key point identification on a low-cost hardware platform.
Disclosure of Invention
In view of the above technical problems, in particular those of traditional bone key point identification, the invention enables real-time identification of bone key points on a low-cost hardware platform.
The present invention is directed to solving at least the problems of the prior art. To this end, the invention discloses a real-time bone key point identification system based on coordinate system regression, which comprises an image acquisition module, a core calculation unit, a lightweight neural network algorithm module and a coordinate system regression output module. The image acquisition module may be any monocular camera, and the core calculation unit is a mobile CPU. The lightweight neural network algorithm module adopts ShuffleNetV2 as its basic backbone network, adds two consecutive upsampling layers after the last layer of the ShuffleNetV2 backbone network, and applies a skip connection between ShuffleV2Block3 and DUC2 in the network to finally obtain a heat map (heatmap);
the coordinate system regression output module defines the heat map output on each channel by the lightweight neural network algorithm module as Z, normalizes its values to between 0 and 1 through a normalization function, and defines the normalized heat map as $\hat{Z}$. This yields a discrete probability distribution, represented as an m × n matrix, where m and n correspond to the resolution of the heat map.
Furthermore, the lightweight neural network algorithm module adopting ShuffleNetV2 as its basic backbone network further includes: an input image first enters the ShuffleNetV2 backbone network for calculation, wherein the ShuffleNetV2 backbone network consists of two convolution layers, three ShuffleV2Block layers and one max pooling layer; the convolution layer conv1 applies 24 convolution kernels of size 3x3 (stride 2), and the convolution layer conv5 applies 1024 convolution kernels of size 1x1 (stride 1); the pooling layer Maxpool1 has size 3x3 and stride 2; the ShuffleV2Block layers share a uniform structure in which the feature map of the input channels is split into two branches, the left branch performs no operation, the right branch consists of consecutive 1x1 and 3x3 convolutions, the two branches are merged by a concat operation, and a channel shuffle is then performed.
Still further, adding two consecutive upsampling layers after the last layer of the ShuffleNetV2 backbone network further comprises: the series of convolutional feature maps output by the backbone network passes through consecutive DUC upsampling layers, wherein the DUC layers share a uniform structure consisting of a 3x3 convolution followed by PixelShuffle upsampling, which obtains a high-resolution feature map from a low-resolution feature map through convolution and recombination across multiple channels; the feature map of ShuffleBlock3 in the ShuffleNetV2 backbone network, which has the same shape, is joined by a skip connection to the last upsampling layer DUC2 to improve robustness during training and prevent overfitting, and the heat map is finally output.
Further, the normalization function is defined as the following formula 1:
[Formula 1: equation image not reproduced]
First, two m × n matrices X and Y are defined, with i = 1 … m and j = 1 … n, whose entries hold the x-axis coordinate and the y-axis coordinate, respectively, of the corresponding entry of $\hat{Z}$.
Furthermore, the matrix X and the matrix Y are defined as the following formula 2 and formula 3:
[Formula 2: equation image not reproduced]
[Formula 3: equation image not reproduced]
Here $\hat{Z}$ is given a probabilistic interpretation: because every value of $\hat{Z}$ lies between 0 and 1 and the values sum to 1, the conditions of a probability distribution are satisfied. The matrix inner product of $\hat{Z}$ with X therefore gives the expected value over the matrix X, which is the horizontal coordinate value; likewise, the expected value over the matrix Y gives the vertical coordinate value. The intersection of the horizontal and vertical coordinate values corresponds to the x-axis and y-axis position in $\hat{Z}$, which yields the coordinate point information. The function P for obtaining the coordinate point information is defined as the following formula 4, where $\langle \cdot, \cdot \rangle_F$ denotes the matrix inner product:
$$P(\hat{Z}) = \left[ \langle \hat{Z}, X \rangle_F,\ \langle \hat{Z}, Y \rangle_F \right]$$
the invention also discloses a real-time bone key point identification method based on coordinate system regression, which comprises the following steps:
Step 1: an input image first enters the ShuffleNetV2 backbone network for calculation, wherein the ShuffleNetV2 backbone network consists of two convolution layers, three ShuffleV2Block layers and one max pooling layer; the convolution layer conv1 applies 24 convolution kernels of size 3x3 (stride 2), and the convolution layer conv5 applies 1024 convolution kernels of size 1x1 (stride 1); the pooling layer Maxpool1 has size 3x3 and stride 2; the ShuffleV2Block layers share a uniform structure in which the feature map of the input channels is split into two branches, the left branch performs no operation, the right branch consists of consecutive 1x1 and 3x3 convolutions, the two branches are merged by a concat operation, and a channel shuffle is then performed; the series of convolutional feature maps output by the backbone network passes through consecutive DUC upsampling layers, wherein the DUC layers share a uniform structure consisting of a 3x3 convolution followed by PixelShuffle upsampling, which obtains a high-resolution feature map from a low-resolution feature map through convolution and recombination across multiple channels; the feature map of ShuffleBlock3 in the ShuffleNetV2 backbone network, which has the same shape, is joined by a skip connection to the last upsampling layer DUC2 to improve robustness during training and prevent overfitting, and the heat map is finally output;
Step 2: the heat map output on each channel by the lightweight neural network algorithm module is defined as Z, its values are normalized to between 0 and 1 through a normalization function, and the normalized heat map is defined as $\hat{Z}$. This yields a discrete probability distribution expressed as an m × n matrix, where m and n correspond to the resolution of the heat map, and the coordinate information contained in Z is calculated through a defined formula.
The invention further discloses a device comprising: at least one processor and memory; the memory stores computer-executable instructions; the at least one processor executes computer-executable instructions stored by the memory, causing the at least one processor to perform the identification method as described above.
The invention further discloses a computer readable storage medium, wherein computer execution instructions are stored in the computer readable storage medium, and when a processor executes the computer execution instructions, the identification method is realized.
Compared with the prior art, the invention has the following beneficial effects. Traditional bone key point algorithms build on geometric priors and template matching, and their accuracy is poor. Because of limited hardware performance, existing deep-learning-based bone key point identification algorithms run slowly on low-cost hardware platforms (such as mobile phones and tablets), and applications that depend on them may stutter or drop frames, which greatly affects the user experience. Most recent deep-learning-based bone key point identification algorithms output Gaussian heat maps; the coordinates decoded from a heat map are integers, whereas coordinate regression outputs floating point numbers and loses no precision, so heat map output has a theoretical lower bound on its error. To address these defects, the invention mainly targets bone key point identification on mobile terminals and embedded devices. It adopts a lightweight deep learning algorithm together with coordinate system regression, which avoids the theoretical lower bound on error of heat map output. The hardware needs only a CPU and a monocular camera to complete low-cost real-time bone key point identification, without a GPU or a high-end camera (such as Kinect).
Drawings
The invention will be further understood from the following description in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the drawings, like reference numerals designate corresponding parts throughout the different views.
FIG. 1 is a core block diagram of the real-time bone key point identification method based on coordinate system regression according to the present invention;
FIG. 2 is an overall diagram of the lightweight network according to an embodiment of the invention;
FIG. 3 is a network structure diagram of a backbone network according to an embodiment of the present invention;
FIG. 4 is a network structure diagram of a backbone network according to an embodiment of the present invention.
Detailed Description
Example one
The core modules of the real-time bone key point identification method based on coordinate system regression are shown in FIG. 1 and comprise an image acquisition module, a core calculation unit, a lightweight neural network algorithm module and a coordinate system regression output module. The image acquisition module may be any monocular camera, and the core calculation unit is a mobile CPU. The core of the design lies in the lightweight neural network algorithm module and the coordinate system regression output module; together, these two modules ensure the real-time performance of the system on low-cost hardware.
I. Lightweight neural network algorithm module:
The lightweight neural network algorithm module adopts ShuffleNetV2 as its basic backbone network, adds two consecutive upsampling layers after the last layer of the ShuffleNetV2 backbone network, applies a skip connection between ShuffleV2Block3 and DUC2 in the network, and finally obtains a heat map (heatmap). The overall structure of the lightweight network is shown in FIG. 2.
The input image first enters the ShuffleNetV2 backbone network for calculation. The backbone consists of two convolution layers, three ShuffleV2Block layers and one max pooling layer. The convolution layer conv1 applies 24 convolution kernels of size 3x3 (stride 2), and the convolution layer conv5 applies 1024 convolution kernels of size 1x1 (stride 1); the pooling layer Maxpool1 has size 3x3 and stride 2. The ShuffleV2Block layers share a uniform structure, shown in FIG. 3 and FIG. 4. In FIG. 3, the feature map of the input channels is split into two branches: the left branch performs no operation, the right branch consists of consecutive 1x1 and 3x3 convolutions, the two branches are merged by a concat operation, and a channel shuffle follows. The structure in FIG. 4 is roughly the same as in FIG. 3, except that the left branch consists of consecutive 3x3 and 1x1 convolutions.
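The split, concat and channel shuffle data flow of the FIG. 3 block can be sketched on channel labels alone, without any convolution arithmetic. This is a minimal illustration under stated assumptions: the function names are hypothetical, the right-branch convolutions are replaced by a stand-in callable, and the shuffle uses two groups, which is the usual choice when two branches are concatenated but is not stated in the text.

```python
def split_channels(channels):
    """Channel split: first half feeds the left (identity) branch, second half the right branch."""
    half = len(channels) // 2
    return channels[:half], channels[half:]

def channel_shuffle(channels, groups=2):
    """Interleave channels across groups: view as (groups, n/groups), transpose, flatten."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + k] for k in range(per_group) for g in range(groups)]

def shuffle_unit(channels, right_branch):
    """One block: split, transform the right branch only, concat, then shuffle."""
    left, right = split_channels(channels)
    right = [right_branch(c) for c in right]   # stand-in for the 1x1 and 3x3 convolutions
    return channel_shuffle(left + right, groups=2)

# Channels are labels here; a trailing apostrophe marks "passed through the right branch".
out = shuffle_unit(list("abcdef"), right_branch=lambda c: c + "'")
print(out)  # ['a', "d'", 'b', "e'", 'c', "f'"]
```

The shuffle matters because the left branch is an identity: without interleaving, the same half of the channels would bypass the convolutions in every block.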
The series of convolutional feature maps output by the backbone network then passes through consecutive DUC upsampling layers. The DUC layers share a uniform structure consisting of a 3x3 convolution followed by PixelShuffle upsampling, which obtains a high-resolution feature map from a low-resolution feature map through convolution and recombination across multiple channels. The feature map of ShuffleBlock3 in the ShuffleNetV2 backbone network, which has the same shape, is joined by a skip connection to the last upsampling layer DUC2 to improve robustness during training and prevent overfitting, and the heat map is finally output.
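The "recombination across multiple channels" performed by PixelShuffle can be sketched as a pure rearrangement of values. The function below is a hypothetical illustration on nested lists, following the common sub-pixel layout in which r × r consecutive input channels fill each r × r output tile; the upscale factor r = 2 is an assumption, and the 3x3 convolution that precedes the rearrangement in a DUC layer is omitted.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r) without changing any value."""
    in_c, h, w = len(x), len(x[0]), len(x[0][0])
    if in_c % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_c = in_c // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(out_c)]
    for c in range(out_c):
        for i in range(r):          # row offset inside each r x r output tile
            for j in range(r):      # column offset inside each r x r output tile
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out

# Four channels of size 1x2 become one channel of size 2x4.
low_res = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
high_res = pixel_shuffle(low_res, r=2)
print(high_res)  # [[[0, 10, 1, 11], [20, 30, 21, 31]]]
```

Because the upsampling itself has no parameters, all learning happens in the preceding convolution at low resolution, which is one reason this style of upsampling suits CPU-only inference.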
II. Coordinate system regression output module:
The heat map output on each channel by the lightweight neural network algorithm module is defined as Z. Its values are normalized to between 0 and 1 by a normalization function, and the normalized heat map is defined as $\hat{Z}$. This yields a discrete probability distribution, represented as an m × n matrix, where m and n correspond to the resolution of the heat map.
The normalization function is defined as the following formula 1:
[Formula 1: equation image not reproduced]
First, two m × n matrices X and Y are defined, with i = 1 … m and j = 1 … n, whose entries hold the x-axis coordinate and the y-axis coordinate, respectively, of the corresponding entry of $\hat{Z}$.
The matrix X and the matrix Y are defined as the following formula 2 and formula 3:
[Formula 2: equation image not reproduced]
[Formula 3: equation image not reproduced]
$\hat{Z}$ is given a probabilistic interpretation: because every value of $\hat{Z}$ lies between 0 and 1 and the values sum to 1, the conditions of a probability distribution are satisfied. The matrix inner product of $\hat{Z}$ with X therefore gives the expected value over the matrix X, which is the horizontal coordinate value; likewise, the expected value over the matrix Y gives the vertical coordinate value. The intersection of the horizontal and vertical coordinate values corresponds to the x-axis and y-axis position in $\hat{Z}$, which yields the coordinate point information. The function P for obtaining the coordinate point information is defined as the following formula 4, where $\langle \cdot, \cdot \rangle_F$ denotes the matrix inner product:
$$P(\hat{Z}) = \left[ \langle \hat{Z}, X \rangle_F,\ \langle \hat{Z}, Y \rangle_F \right]$$
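Writing $\hat{Z}$ for the normalized heat map, the matrix inner product can be expanded to show why it is an expected value. The coordinate matrices in the first line are an assumption, not a reproduction of formulas 2 and 3: they follow a common convention that maps cell centers into the interval (-1, 1), and they are consistent with the value -0.166 ≈ -1/6 quoted for the m = n = 6 example when the peak lies in the third row and third column.

```latex
% Assumed coordinate matrices (hypothetical form of formulas 2 and 3)
X_{i,j} = \frac{2j - (n + 1)}{n}, \qquad
Y_{i,j} = \frac{2i - (m + 1)}{m}, \qquad
i = 1, \dots, m, \quad j = 1, \dots, n

% Matrix (Frobenius) inner product written out as an expectation
\langle \hat{Z}, X \rangle_F = \sum_{i=1}^{m} \sum_{j=1}^{n} \hat{Z}_{i,j} \, X_{i,j} = \mathbb{E}[x], \qquad
\langle \hat{Z}, Y \rangle_F = \sum_{i=1}^{m} \sum_{j=1}^{n} \hat{Z}_{i,j} \, Y_{i,j} = \mathbb{E}[y]

% Check against the quoted example: n = 6, all probability mass at j = 3
\frac{2 \cdot 3 - (6 + 1)}{6} = -\frac{1}{6} \approx -0.167
```

Since each sum weights a coordinate by the probability assigned to its cell, the result varies continuously with the heat map values, which is what allows a floating point coordinate to be read out.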
As an example, let m = n = 6:
[Example matrices for m = n = 6: equation images not reproduced]
The matrix inner product of $\hat{Z}$ with X gives an expected value of -0.166 over the matrix X; likewise, the matrix inner product of $\hat{Z}$ with Y gives an expected value of -0.166 over the matrix Y. The intersection of the two values corresponds to the x-axis and y-axis position in $\hat{Z}$. If the normalized Gaussian heat map has only one peak, this transformation directly yields the coordinate point information.
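The m = n = 6 example can be approximated with the following sketch. Several pieces are assumptions rather than the patent's definitions: the normalization is taken to be a softmax over all cells, the coordinate matrices use the convention X[i][j] = (2j - (n + 1)) / n and Y[i][j] = (2i - (m + 1)) / m, the raw heat map with a single strong peak in row 3, column 3 is hypothetical, and all function names are illustrative.

```python
import math

def softmax2d(z):
    """Assumed normalization: softmax over all cells, so values lie in (0, 1) and sum to 1."""
    peak = max(max(row) for row in z)
    exps = [[math.exp(v - peak) for v in row] for row in z]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

def coordinate_matrices(m, n):
    """Assumed X and Y: cell-center coordinates scaled into (-1, 1), indices starting at 1."""
    X = [[(2 * j - (n + 1)) / n for j in range(1, n + 1)] for _ in range(m)]
    Y = [[(2 * i - (m + 1)) / m for _ in range(n)] for i in range(1, m + 1)]
    return X, Y

def frobenius(a, b):
    """Matrix inner product: sum of element-wise products."""
    return sum(av * bv for ra, rb in zip(a, b) for av, bv in zip(ra, rb))

def regress_coordinates(z_hat):
    """Formula 4 as described in the text: [<Z_hat, X>_F, <Z_hat, Y>_F]."""
    m, n = len(z_hat), len(z_hat[0])
    X, Y = coordinate_matrices(m, n)
    return frobenius(z_hat, X), frobenius(z_hat, Y)

# Hypothetical raw heat map Z: one strong peak at row 3, column 3 (1-indexed).
Z = [[0.0] * 6 for _ in range(6)]
Z[2][2] = 20.0
z_hat = softmax2d(Z)
x, y = regress_coordinates(z_hat)
print(round(x, 3), round(y, 3))  # -0.167 -0.167

# Map the normalized x value back to a 1-indexed column under the assumed convention.
column = (x * 6 + (6 + 1)) / 2
```

Under these assumptions both coordinates come out at approximately -1/6, which the text reports truncated as -0.166, and `column` evaluates to approximately 3.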
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications may be made without departing from the scope of the invention. It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention. The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure in any way whatsoever. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A real-time bone key point identification system based on coordinate system regression, comprising an image acquisition module, a core calculation unit, a lightweight neural network algorithm module and a coordinate system regression output module, wherein the image acquisition module is any monocular camera and the core calculation unit is a mobile CPU (central processing unit), characterized in that the lightweight neural network algorithm module adopts ShuffleNetV2 as its basic backbone network, adds two consecutive upsampling layers after the last layer of the ShuffleNetV2 backbone network, applies a skip connection between ShuffleV2Block3 and DUC2 in the network, and finally obtains a heat map (heatmap);
the coordinate system regression output module defines the heat map output on each channel by the lightweight neural network algorithm module as Z, normalizes its values to between 0 and 1 through a normalization function, and defines the normalized heat map as $\hat{Z}$, thereby obtaining a discrete probability distribution expressed as an m × n matrix, wherein m and n correspond to the resolution of the heat map, and the coordinate information of the bone key point in Z is calculated through a defined formula.
2. The coordinate system regression-based real-time bone key point identification system of claim 1, wherein the lightweight neural network algorithm module adopting ShuffleNetV2 as its basic backbone network further comprises: an input image first enters the ShuffleNetV2 backbone network for calculation, wherein the ShuffleNetV2 backbone network consists of two convolution layers, three ShuffleV2Block layers and one max pooling layer; the convolution layer conv1 applies 24 convolution kernels of size 3x3 (stride 2), and the convolution layer conv5 applies 1024 convolution kernels of size 1x1 (stride 1); the pooling layer Maxpool1 has size 3x3 and stride 2; the ShuffleV2Block layers share a uniform structure in which the feature map of the input channels is split into two branches, the left branch performs no operation, the right branch consists of consecutive 1x1 and 3x3 convolutions, the two branches are merged by a concat operation, and a channel shuffle is then performed.
3. The coordinate system regression-based real-time bone key point identification system of claim 2, wherein adding two consecutive upsampling layers after the last layer of the ShuffleNetV2 backbone network further comprises: the series of convolutional feature maps output by the backbone network passes through consecutive DUC upsampling layers, wherein the DUC layers share a uniform structure consisting of a 3x3 convolution followed by PixelShuffle upsampling, which obtains a high-resolution feature map from a low-resolution feature map through convolution and recombination across multiple channels; the feature map of ShuffleBlock3 in the ShuffleNetV2 backbone network, which has the same shape, is joined by a skip connection to the last upsampling layer DUC2 to improve robustness during training and prevent overfitting, and the heat map is finally output.
4. The coordinate system regression-based real-time bone key point identification system of claim 3, wherein the normalization function is defined as formula 1:
[Formula 1: equation image not reproduced]
first, two m × n matrices X and Y are defined, with i = 1 … m and j = 1 … n, whose entries hold the x-axis coordinate and the y-axis coordinate, respectively, of the corresponding entry of $\hat{Z}$.
5. The coordinate system regression-based real-time bone key point identification system of claim 4, wherein the matrix X and the matrix Y are defined as the following formula 2 and formula 3:
[Formula 2: equation image not reproduced]
[Formula 3: equation image not reproduced]
wherein $\hat{Z}$ is given a probabilistic interpretation: because every value of $\hat{Z}$ lies between 0 and 1 and the values sum to 1, the conditions of a probability distribution are satisfied; the matrix inner product of $\hat{Z}$ with X therefore gives the expected value over the matrix X, which is the horizontal coordinate value, and likewise the expected value over the matrix Y gives the vertical coordinate value; the intersection of the horizontal and vertical coordinate values corresponds to the x-axis and y-axis position in $\hat{Z}$, which yields the coordinate point information; the function P for obtaining the coordinate point information is defined as the following formula 4, where $\langle \cdot, \cdot \rangle_F$ denotes the matrix inner product:
$$P(\hat{Z}) = \left[ \langle \hat{Z}, X \rangle_F,\ \langle \hat{Z}, Y \rangle_F \right]$$
6. A real-time bone key point identification method based on coordinate system regression, characterized by comprising the following steps:
Step 1: an input image first enters the ShuffleNetV2 backbone network for calculation, wherein the ShuffleNetV2 backbone network consists of two convolution layers, three ShuffleV2Block layers and one max pooling layer; the convolution layer conv1 applies 24 convolution kernels of size 3x3 (stride 2), and the convolution layer conv5 applies 1024 convolution kernels of size 1x1 (stride 1); the pooling layer Maxpool1 has size 3x3 and stride 2; the ShuffleV2Block layers share a uniform structure in which the feature map of the input channels is split into two branches, the left branch performs no operation, the right branch consists of consecutive 1x1 and 3x3 convolutions, the two branches are merged by a concat operation, and a channel shuffle is then performed; the series of convolutional feature maps output by the backbone network passes through consecutive DUC upsampling layers, wherein the DUC layers share a uniform structure consisting of a 3x3 convolution followed by PixelShuffle upsampling, which obtains a high-resolution feature map from a low-resolution feature map through convolution and recombination across multiple channels; the feature map of ShuffleBlock3 in the ShuffleNetV2 backbone network, which has the same shape, is joined by a skip connection to the last upsampling layer DUC2 to improve robustness during training and prevent overfitting, and the heat map is finally output;
Step 2: the heat map output on each channel by the lightweight neural network algorithm module is defined as Z, its values are normalized to between 0 and 1 through a normalization function, and the normalized heat map is defined as $\hat{Z}$, thereby obtaining a discrete probability distribution expressed as an m × n matrix, wherein m and n correspond to the resolution of the heat map, and the coordinate information of the bone key point in Z is calculated through a defined formula.
7. An apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of claim 6.
8. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of claim 6.
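The two rearrangement operations named in step 1 of claim 6, channel shuffle and PixelShuffle upsampling, can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the function names `channel_shuffle` and `pixel_shuffle`, the nested-list tensor layout, the group count of 2 and the upscale factor `r` are all assumptions made for illustration, and the PixelShuffle channel ordering follows the convention commonly used in deep learning frameworks.

```python
def channel_shuffle(channels, groups=2):
    """Interleave channels across groups (assumed: 2 groups, one per branch).

    `channels` is a list of per-channel feature maps, ordered group by group
    as they would be after the concat of the two branches.
    """
    count = len(channels)
    if count % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = count // groups
    # View as (groups, per_group), transpose, then flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def pixel_shuffle(feature, r):
    """Rearrange C*r*r channels of size H x W into C channels of size H*r x W*r.

    `feature` is a nested list indexed as feature[channel][row][col].
    """
    if len(feature) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(feature[0]), len(feature[0][0])
    out = []
    for c in range(len(feature) // (r * r)):
        plane = [[0] * (width * r) for _ in range(height * r)]
        for i in range(r):
            for j in range(r):
                src = feature[c * r * r + i * r + j]
                for y in range(height):
                    for x in range(width):
                        # Each source channel fills one offset of every r x r cell.
                        plane[y * r + i][x * r + j] = src[y][x]
        out.append(plane)
    return out


if __name__ == "__main__":
    print(channel_shuffle([0, 1, 2, 3]))                    # [0, 2, 1, 3]
    print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))   # [[[1, 2], [3, 4]]]
```

In this sketch the upsampling involves no interpolation: the preceding 3x3 convolution produces the extra channels, and `pixel_shuffle` only moves them into spatial positions, which is why a DUC layer can raise resolution cheaply.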
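Step 2 of claim 6 normalizes each heat map Z into an m × n discrete probability distribution and then computes the key point coordinates from it. The claim does not reproduce the normalization function or the coordinate formula, so the following sketch assumes a spatial softmax for the normalization and the expected value of the pixel indices for the coordinates; both choices, and the names `softmax2d` and `expected_coordinates`, are illustrative assumptions rather than the formulas defined in the patent.

```python
import math


def softmax2d(z):
    """Normalize an m x n heat map so that its values lie in [0, 1] and sum to 1.

    Assumption: a spatial softmax stands in for the claim's normalization function.
    """
    peak = max(max(row) for row in z)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in z]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def expected_coordinates(p):
    """Return (x, y) as the probability-weighted mean column and row index.

    Assumption: the claim's "defined formula" is taken to be an expectation
    over the pixel coordinate grid.
    """
    x = sum(col * v for row in p for col, v in enumerate(row))
    y = sum(r * sum(row) for r, row in enumerate(p))
    return x, y


if __name__ == "__main__":
    # A 3 x 4 heat map (m = 3, n = 4) with a sharp peak at row 1, column 2.
    heat_map = [[0.0] * 4 for _ in range(3)]
    heat_map[1][2] = 50.0
    probabilities = softmax2d(heat_map)
    print(expected_coordinates(probabilities))
```

Because the coordinates are a weighted average rather than the index of the largest value, they are continuous and differentiable with respect to the heat map, which lets the coordinates themselves be supervised during training.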
CN202210160965.1A 2022-02-22 2022-02-22 Real-time bone key point identification method and system based on coordinate system regression Pending CN114519868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210160965.1A CN114519868A (en) 2022-02-22 2022-02-22 Real-time bone key point identification method and system based on coordinate system regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210160965.1A CN114519868A (en) 2022-02-22 2022-02-22 Real-time bone key point identification method and system based on coordinate system regression

Publications (1)

Publication Number Publication Date
CN114519868A true CN114519868A (en) 2022-05-20

Family

ID=81598452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210160965.1A Pending CN114519868A (en) 2022-02-22 2022-02-22 Real-time bone key point identification method and system based on coordinate system regression

Country Status (1)

Country Link
CN (1) CN114519868A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171149A (en) * 2022-06-09 2022-10-11 广州紫为云科技有限公司 Monocular RGB image regression-based real-time human body 2D/3D bone key point identification method
CN115719518A (en) * 2023-01-10 2023-02-28 浙江壹体科技有限公司 Behavior recognition method, system, equipment and medium based on embedded platform
CN115953839A (en) * 2022-12-26 2023-04-11 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on loop architecture and coordinate system regression

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171149A (en) * 2022-06-09 2022-10-11 广州紫为云科技有限公司 Monocular RGB image regression-based real-time human body 2D/3D bone key point identification method
CN115171149B (en) * 2022-06-09 2023-12-05 广州紫为云科技有限公司 Real-time human body 2D/3D skeleton key point identification method based on monocular RGB image regression
CN115953839A (en) * 2022-12-26 2023-04-11 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on loop architecture and coordinate system regression
CN115953839B (en) * 2022-12-26 2024-04-12 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on loop architecture and key point regression
CN115719518A (en) * 2023-01-10 2023-02-28 浙江壹体科技有限公司 Behavior recognition method, system, equipment and medium based on embedded platform

Similar Documents

Publication Publication Date Title
CN114519868A (en) Real-time bone key point identification method and system based on coordinate system regression
CN110188768B (en) Real-time image semantic segmentation method and system
CN113902926A (en) General image target detection method and device based on self-attention mechanism
CN112016543A (en) Text recognition network, neural network training method and related equipment
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN111582041B (en) Brain electricity identification method based on CWT and MLMSFFCNN
CN115171149B (en) Real-time human body 2D/3D skeleton key point identification method based on monocular RGB image regression
CN112990228B (en) Image feature matching method, related device, equipment and storage medium
CN110569738A (en) natural scene text detection method, equipment and medium based on dense connection network
CN112949506A (en) Low-cost real-time bone key point identification method and device
US20230334893A1 (en) Method for optimizing human body posture recognition model, device and computer-readable storage medium
CN113869282A (en) Face recognition method, hyper-resolution model training method and related equipment
US20200005078A1 (en) Content aware forensic detection of image manipulations
CN111611925A (en) Building detection and identification method and device
CN113298032A (en) Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning
CN111598087A (en) Irregular character recognition method and device, computer equipment and storage medium
CN111340213B (en) Neural network training method, electronic device, and storage medium
CN114091648A (en) Image classification method and device based on convolutional neural network and convolutional neural network
CN116503399A (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN116110102A (en) Face key point detection method and system based on auxiliary thermodynamic diagram
Trevino-Sanchez et al. Hybrid pooling with wavelets for convolutional neural networks
CN110717405A (en) Face feature point positioning method, device, medium and electronic equipment
CN113239693B (en) Training method, device, equipment and storage medium of intention recognition model
CN115080699A (en) Cross-modal retrieval method based on modal specific adaptive scaling and attention network
CN114997365A (en) Knowledge distillation method and device for image data, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination