CN111291593A - Method for detecting human body posture - Google Patents

Method for detecting human body posture

Info

Publication number
CN111291593A
Authority
CN
China
Prior art keywords
human body
neural network
layer
image
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811492525.6A
Other languages
Chinese (zh)
Other versions
CN111291593B (en)
Inventor
黄超
徐滢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Pinguo Technology Co Ltd
Original Assignee
Chengdu Pinguo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Pinguo Technology Co Ltd filed Critical Chengdu Pinguo Technology Co Ltd
Priority to CN201811492525.6A priority Critical patent/CN111291593B/en
Publication of CN111291593A publication Critical patent/CN111291593A/en
Application granted granted Critical
Publication of CN111291593B publication Critical patent/CN111291593B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting human body posture, comprising the following steps: inputting a preprocessed human body image to be detected into a pre-trained neural network model and obtaining a predetermined number of heatmaps, wherein each heatmap corresponds to one human body joint point; the neural network model comprises, connected in sequence, the first 14 layers of a MobileNetV2 network, a dimension transformation layer, a first upsampling layer, a first convolutional neural network layer, a BN regularization layer, a ReLU activation function layer, a second upsampling layer, and a second convolutional neural network layer; all convolution operations in the neural network model are separable convolutions; obtaining the predetermined number of human body joint point coordinates from the predetermined number of heatmaps; and scaling each human body joint point coordinate back to the image coordinate system of the human body image to be detected to obtain the human body posture joint points of that image. The technical solution provided by the invention can detect human body posture in real time on terminals with small memory and limited CPU and GPU computing power.

Description

Method for detecting human body posture
Technical Field
The invention relates to the technical field of deep learning, in particular to a method for detecting human body postures.
Background
Human body posture detection is now applied in many fields: in the security field it can be used to recognize human behavior, and in the game and entertainment field it can make games more engaging. Ultimately, detecting human posture comes down to detecting the human posture joint points.
At present there are two main approaches to detecting human posture joint points. The first is direct joint regression, in which a network model directly outputs the human posture joint points. The second is heatmap regression, in which a network model produces a set of heatmaps that are then processed to obtain the final joint point coordinates, with one heatmap corresponding to one joint point. Because human pose, clothing, and image background vary greatly, directly regressing joint coordinates usually works poorly, and such network models are hard to train, so it is difficult to converge to a usable model. Heatmap regression performs better than direct regression, but its network structure is complex and the network model is huge, so training remains difficult and the model cannot run on terminals with small memory and limited CPU or GPU computing power, which greatly limits the application and popularization of human posture detection.
Disclosure of Invention
The invention aims to provide a method for detecting human body posture that can detect the posture in real time on terminals with small memory and limited CPU or GPU computing power.
To achieve this aim, the invention adopts the following technical solution:
a method for detecting human body posture, comprising: inputting a preprocessed human body image to be detected into a pre-trained neural network model and obtaining a predetermined number of heatmaps, wherein each heatmap corresponds to one human body joint point; the neural network model comprises, connected in sequence, the first 14 layers of a MobileNetV2 network, a dimension transformation layer, a first upsampling layer, a first convolutional neural network layer, a BN regularization layer, a ReLU activation function layer, a second upsampling layer, and a second convolutional neural network layer; all convolution operations in the neural network model are separable convolutions; acquiring the predetermined number of human body joint point coordinates from the predetermined number of heatmaps; and scaling each human body joint point coordinate back to the image coordinate system of the human body image to be detected, to acquire the human body posture joint points of the human body image to be detected.
Preferably, training the neural network model comprises: annotating a pre-acquired original training image with a human body bounding box and joint points; cropping the original training image according to the human body bounding box to obtain a cropped image; scaling the cropped image by a preset ratio and padding it to a preset size to obtain a training input image; converting the joint point coordinates annotated in the original training image into coordinates in the training input image and generating a ground truth with a two-dimensional Gaussian distribution function; and training the neural network model with the training input image and the ground truth.
Preferably, the size of the training input image is 240 × 192 and the size of the ground truth is 60 × 48.
Preferably, the loss function of the neural network model adopts a mean square loss function:
loss(x, y) = (x - y)²
wherein x is the predicted value of the neural network model, and y is the ground truth value.
Further, the method also includes: in the process of training the neural network model, optimizing the neural network model with the Adam optimizer.
Preferably, the first upsampling layer and the second upsampling layer both use 2× upsampling, and the first convolutional neural network layer and the second convolutional neural network layer are both 3 × 3 convolutional neural networks.
Preferably, the pre-trained neural network model is run on a mobile terminal; and the human body image to be detected is acquired by the mobile terminal.
The method for detecting human body posture provided by the embodiment of the invention abandons existing complex pose-detection network models and defines a simple, efficient neural network model in which all convolutions are separable convolutions. The simplified network structure and the use of separable convolutions greatly reduce the computation required by the neural network model, greatly shrink the model itself, and make training easier. Compared with the prior art, the technical solution provided by the invention runs smoothly on mobile terminals with small memory and limited CPU and GPU computing power and achieves real-time detection of human body posture.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a neural network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the first 14-layer network structure of MobileNetV2 in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a bottleneck network of MobileNetV2 according to an embodiment of the present invention;
FIG. 5 is a visual representation of a hotspot graph in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
The invention defines a simple and efficient deep neural network model that must run efficiently on a mobile terminal. Therefore, in the embodiment of the invention, to enable efficient forward inference, the input image height and width are set to 240 × 192 and the output heatmap height and width are defined as 60 × 48.
Much experimental work on deep neural networks shows that deeper networks can extract more specific high-dimensional features and perform better, but deeper networks are also harder to train, because vanishing gradients can prevent training from converging, and a very deep network is not suited to running on a mobile terminal. The invention therefore defines a simple and efficient CNN (Convolutional Neural Network) residual network for forward inference. A CNN extracts different features at different layers, but the higher the layer, the more heavily the features are downsampled; a residual structure fuses low-dimensional and high-dimensional features, so the multi-layer CNN design can repeatedly capture the information contained in the input image at different scales and obtain a better feature extraction result.
Training the custom neural network model includes: annotating a pre-acquired original training image with a human body bounding box and joint points; cropping the original training image according to the bounding box to obtain a cropped image; scaling the cropped image by a preset ratio and padding it to a preset size to obtain a training input image; converting the joint point coordinates annotated in the original training image into coordinates in the training input image and generating a ground truth with a two-dimensional Gaussian distribution function; and training the neural network model with the training input image and the ground truth. The trained neural network model can then detect the human posture joint points in an input image. During this process, common augmentation operations such as mirroring, rotation, scaling, and color perturbation of the image (for example, increasing or reducing contrast and saturation) can be applied to the training data, together with normalization and regularization. A sketch of the ground-truth heatmap generation is given below.
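As an illustration, the following is a minimal sketch of rendering one annotated joint as a two-dimensional Gaussian on the 60 × 48 ground-truth heatmap, assuming NumPy; the function name and the Gaussian sigma value are assumptions, since the patent does not specify them.

    import numpy as np

    def joint_heatmap(joint_xy, heatmap_size=(60, 48), input_size=(240, 192), sigma=2.0):
        """Render one joint as a 2D Gaussian on a ground-truth heatmap.

        joint_xy: (x, y) coordinate of the joint in the 240x192 training input image.
        The 1/4 scale between the 240x192 input and the 60x48 heatmap follows the
        sizes stated in the text; sigma is an assumed value.
        """
        h, w = heatmap_size
        scale_y = h / input_size[0]          # 60 / 240 = 0.25
        scale_x = w / input_size[1]          # 48 / 192 = 0.25
        cx, cy = joint_xy[0] * scale_x, joint_xy[1] * scale_y
        ys, xs = np.mgrid[0:h, 0:w]
        return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))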
The loss function of the neural network model in the embodiment of the invention adopts a mean square loss function:
loss(x, y) = (x - y)²
wherein x is the predicted value of the neural network model and y is the ground truth value. That is, the squared difference between x and y is compared pixel by pixel to measure how far the prediction deviates from the ground truth; the smaller this value, the better. A small numerical check is sketched below.
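As a quick check (a sketch assuming PyTorch, which the surrounding references to PixelShuffle, Adam, and ONNX suggest but the patent does not name), nn.MSELoss with its default mean reduction is exactly the pixel-wise average of (x - y)²:

    import torch
    import torch.nn as nn

    # Random stand-ins for predicted and ground-truth heatmaps (17 joints, 60x48).
    x = torch.rand(1, 17, 60, 48)
    y = torch.rand(1, 17, 60, 48)
    # nn.MSELoss() averages the per-pixel squared differences.
    assert torch.allclose(nn.MSELoss()(x, y), ((x - y) ** 2).mean())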
In the process of training the neural network model, the Adam optimizer is used; Adam is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure and iteratively updates the network weights based on the training data. All convolution operations in the neural network model are separable convolutions, which reduces the amount of computation and shrinks the model, as sketched below.
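A minimal sketch of a depthwise separable 3 × 3 convolution, written in PyTorch as an assumption (the patent does not name a framework); the class name and layer hyperparameters are illustrative only:

    import torch.nn as nn

    class SeparableConv2d(nn.Module):
        """Depthwise separable 3x3 convolution: a per-channel (depthwise) 3x3
        convolution followed by a 1x1 pointwise convolution across channels."""
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                       padding=1, groups=in_ch, bias=False)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))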
If the model is to be used on a mobile terminal, it first needs to be converted into the ONNX (Open Neural Network Exchange) format, and the ONNX model is then converted into the network model format of the mobile inference framework, such as Apple's CoreML, Caffe2, or a model format supported by another third-party feed-forward inference framework. In other words, ONNX is an intermediate model format: as long as the forward inference framework provides tools that support ONNX conversion, the model can be converted into the format it requires. On the mobile phone, camera data are obtained through the camera API (Application Programming Interface) provided by the mobile terminal, and the camera video frames are scaled to the specified size. It is assumed by default that only one person appears in the camera data, so human body bounding-box detection can be omitted, which saves a large amount of time for posture detection. The camera frame is scaled directly to the input size required by the network, i.e., a height and width of 240 × 192; the image content may be slightly stretched or compressed, but this does not noticeably affect a robust neural network. The scaled camera frame is fed into the trained neural network model that has been converted to the mobile terminal to detect the human posture and obtain the predicted heatmaps. The predicted heatmaps are then processed by traversing each heatmap to find its maximum value, which gives the coordinate of the corresponding human posture joint point, as sketched below.
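The post-processing step, taking the argmax of each heatmap and scaling back to the original image, could look like the following sketch, assuming NumPy; the function name and the assumption that the whole frame was used as network input are illustrative:

    import numpy as np

    def heatmaps_to_joints(heatmaps, orig_size, heatmap_size=(60, 48)):
        """Convert predicted heatmaps (N, 60, 48) to joint coordinates in the
        original image: for each heatmap, take the location of its maximum
        value and rescale it to the original image coordinate system."""
        orig_h, orig_w = orig_size
        joints = []
        for hm in heatmaps:
            y, x = np.unravel_index(np.argmax(hm), hm.shape)
            joints.append((x * orig_w / heatmap_size[1],
                           y * orig_h / heatmap_size[0]))
        return joints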
The specific structure of the neural network model defined by the present invention is described below:
as shown in fig. 2, the neural network model comprises, connected in sequence, the first 14 layers of a MobileNetV2 network, a dimension transformation layer, a first upsampling layer, a first convolutional neural network layer, a BN regularization layer, a ReLU activation function layer, a second upsampling layer, and a second convolutional neural network layer. Fig. 3 shows the structure of the first 14 layers of MobileNetV2, where t is the channel expansion factor, c is the number of output channels, and n is the number of repetitions of the bottleneck structure. There are 5 groups of bottleneck structures, and the feature map produced after each group becomes smaller, reflecting the idea that lower network layers extract abstract features while higher layers extract more specific features; s is the stride used by the filters in the CNN. The assumed layer configuration is listed below.
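For reference, the standard MobileNetV2 configuration that matches a "first 14 layers" with the (96, 15, 12) output reported later for a 240 × 192 input (total stride 16) is sketched here as an assumption; the patent itself only refers to FIG. 3:

    # Initial layer: standard 3x3 convolution, 32 output channels, stride 2.
    # Then 5 groups of bottlenecks (t = expansion factor, c = output channels,
    # n = repetitions, s = stride of the first bottleneck in each group):
    MOBILENETV2_FIRST_14_BOTTLENECKS = [
        #  t,  c, n, s
        (1, 16, 1, 1),
        (6, 24, 2, 2),
        (6, 32, 3, 2),
        (6, 64, 4, 2),
        (6, 96, 3, 1),
    ]
    # 1 conv + (1 + 2 + 3 + 4 + 3) bottlenecks = 14 layers; total stride 2*2*2*2 = 16,
    # so a 3x240x192 input yields a 96x15x12 feature map.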
Fig. 4 shows the structure of a MobileNetV2 bottleneck. A bottleneck is a bottleneck network structure: internally it first expands the dimension, then applies a CNN convolution, and finally reduces the dimension again, repeatedly extracting feature data. Whether a shortcut connection is used depends on the value of s and on whether the input and output channels are equal. Note that when n > 1, only the first bottleneck of each group uses the listed s value and the remaining repetitions use s = 1; when s is 1 and the input dimension equals the output dimension, the block has a shortcut connection, i.e., the residual network idea. A sketch of such a block is given below.
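A minimal PyTorch sketch of such an inverted-residual bottleneck; implementation details such as ReLU6 and BatchNorm placement are assumptions that follow the public MobileNetV2 design rather than anything stated in the patent:

    import torch.nn as nn

    class InvertedResidual(nn.Module):
        """MobileNetV2-style bottleneck: 1x1 expansion, 3x3 depthwise conv,
        1x1 projection, with a shortcut when stride == 1 and in_ch == out_ch."""
        def __init__(self, in_ch, out_ch, stride, expand):
            super().__init__()
            hidden = in_ch * expand
            self.use_shortcut = stride == 1 and in_ch == out_ch
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, hidden, 1, bias=False),            # expand
                nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, hidden, 3, stride, 1,
                          groups=hidden, bias=False),                # depthwise
                nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, out_ch, 1, bias=False),            # project
                nn.BatchNorm2d(out_ch),
            )

        def forward(self, x):
            out = self.block(x)
            return x + out if self.use_shortcut else out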
After the input data passes through the first 14 layers of the MobileNetV2 network and the dimension transformation layer, it enters the pose joint point feature extraction and upsampling network. The input features of this network are the output of the previous layer, and the network first upsamples the feature height and width by a factor of 2. For example, if the input at this point has shape (r²C, H, W), upsampling by a factor of r produces (C, rH, rW), where r is the upsampling factor; with r = 2, the first upsampling layer (PixelShuffle) divides the number of channels by r² and enlarges H and W by a factor of r.
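A quick shape check of this PixelShuffle behaviour, as a PyTorch sketch matching the (512, 15, 12) to (128, 30, 24) example given later in the text:

    import torch
    import torch.nn as nn

    # PixelShuffle with r = 2: (r^2 * C, H, W) -> (C, r*H, r*W).
    x = torch.randn(1, 512, 15, 12)
    up = nn.PixelShuffle(upscale_factor=2)
    print(up(x).shape)   # torch.Size([1, 128, 30, 24])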
After the first upsampling layer, the data passes through a 3 × 3 Conv (the first convolutional neural network layer) to extract features again, followed by Batch Normalization (BN) regularization and a ReLU activation function that gives the data more expressive power, and then a second upsampling layer. This step reduces the number of channels and makes the high-level feature representation of the output more pronounced. The second convolutional neural network layer, another 3 × 3 Conv, produces the final predicted heatmap output, with its output channel count set to the number of joint points to be predicted; this completes the construction of the whole human posture joint point network. All convolution operations in the network are separable convolutions: each channel is first convolved separately with its own filter, and the resulting channel feature maps are then combined with a standard 1 × 1 cross-channel convolution. These two steps reduce the parameter count of a conventional convolution to roughly one ninth, greatly shrinking the model, and the reduced parameter count also greatly reduces the amount of computation. The upsampling head is sketched below.
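Putting the pieces together, the upsampling head described above might look like the following PyTorch sketch; channel sizes follow the worked example later in the text (512 -> 128 -> 256 -> 64 -> N), and standard convolutions are used here for brevity even though the patent specifies separable convolutions:

    import torch.nn as nn

    def pose_head(in_ch=512, num_joints=17):
        """PixelShuffle(2) -> 3x3 conv -> BN -> ReLU -> PixelShuffle(2) -> 3x3 conv,
        producing one heatmap per joint."""
        return nn.Sequential(
            nn.PixelShuffle(2),                            # (in_ch, 15, 12) -> (in_ch//4, 30, 24)
            nn.Conv2d(in_ch // 4, 256, 3, padding=1),      # first 3x3 conv layer
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.PixelShuffle(2),                            # (256, 30, 24) -> (64, 60, 48)
            nn.Conv2d(64, num_joints, 3, padding=1),       # final heatmap prediction
        )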
To better illustrate the whole network flow, the data flow of the whole network is illustrated here as an example:
The input is image data of shape (3, 240, 192) that has been cropped, scaled, padded, normalized, and augmented; 3 indicates 3 channels, 240 is the image height, and 192 is the image width. After the first 14 layers of the MobileNetV2 network, the feature output is (96, 15, 12). A dimension transformation layer then expands (96, 15, 12) to (512, 15, 12): a 1 × 1 convolution raises the dimension to 512, followed by Batch Normalization and ReLU6. The dimension is expanded to increase the expressive power of the feature data and to match the input height and width of the subsequent pose joint feature extraction network. Here 512 is the number of channel dimensions, 15 the feature height, and 12 the feature width.
The output (512, 15, 12) is fed into the pose joint feature extraction network. After the first PixelShuffle upsampling layer, the output is (128, 30, 24): the channel dimension is reduced while the height and width are enlarged. PixelShuffle is used for upsampling because, when enlarging an image from low to high resolution, the interpolation parameters are implicitly contained in the preceding convolutional layer and can be learned automatically; PixelShuffle itself only rearranges pixels, so it is very efficient.
The output (128, 30, 24) is fed into the subsequent network. After the first 3 × 3 convolutional layer, whose output dimension is set to 256, the BN regularization layer, the ReLU activation layer, and a second PixelShuffle upsampling, the network output is (64, 60, 48). A final 3 × 3 convolution with stride 1 and padding 1 then produces the feature heatmap output (N, 60, 48), where N is the number of joint points, 60 is the heatmap output height defined earlier, and 48 is the heatmap output width defined earlier. If N is defined as 17, 17 joint points are output: the second PixelShuffle layer outputs 64 channels, the final 3 × 3 convolutional layer takes 64 channels as input, and its output channel dimension is N, i.e., 17 heatmaps of size 60 × 48.
The training process of the neural network model is as follows. Using the pre-annotated human body bounding box and joint point data, the corresponding single-person region is cropped out, scaled and padded to the defined input size, and data augmentation, normalization, and regularization are applied. The annotated human joint point coordinates are converted into the coordinate system of the final 240 × 192 input image, and a ground truth is generated with a two-dimensional Gaussian distribution function. Each joint point generates one heatmap, so 17 joint points generate 17 heatmaps. The difference between the prediction and the ground truth is evaluated with the MSELoss mean-square loss function, and gradients are computed and the weights of the whole network updated with the Adam optimization algorithm. The learning rate is set to 0.001 and training runs for 100 epochs; training can be done in batches, and with a batch size of 100 the corresponding input data has shape (100, 3, 240, 192). With this method an accuracy of more than 80% can be obtained on the COCO dataset, while the model is only about 6 MB, which is sufficient for real-time human posture detection on a mobile terminal. A visualization of a heatmap is shown in fig. 5, where the white point marks the corresponding joint point. A training-loop sketch is given below.
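A minimal training-loop sketch with the settings above (MSELoss, Adam, learning rate 0.001, 100 epochs, batch size 100), assuming PyTorch; model and train_loader are assumed to exist, with the loader yielding (images, target_heatmaps) of shapes (100, 3, 240, 192) and (100, 17, 60, 48):

    import torch
    import torch.nn as nn

    def train(model, train_loader, epochs=100, lr=1e-3, device="cpu"):
        model.to(device).train()
        criterion = nn.MSELoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for epoch in range(epochs):
            for images, target_heatmaps in train_loader:
                images = images.to(device)
                target_heatmaps = target_heatmaps.to(device)
                pred = model(images)                     # (B, 17, 60, 48)
                loss = criterion(pred, target_heatmaps)  # pixel-wise squared error
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()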
After the trained neural network model is obtained, the source model can be converted into a target model that runs on a mobile terminal via the Open Neural Network Exchange (ONNX) intermediate model; the conversion path is source model -> ONNX -> target model, for example a CoreML model on iOS, a Caffe2 model, or a model for another third-party neural network runtime. Note that a custom layer implementation must be added if the feed-forward inference framework does not support some operator. In this embodiment, video frame data acquired from the camera are scaled directly to 240 × 192 and fed into the network model for feed-forward inference. After the 17 heatmaps of size 60 × 48 are obtained, they are processed to obtain the corresponding human joint point coordinates, and each joint point coordinate is then converted into the image coordinate system of the unscaled human body image to be detected, yielding the human posture joint points of that image. An export sketch is given below.
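A sketch of the export step, assuming PyTorch and its torch.onnx exporter (the patent does not name the training framework); model is the trained network from the earlier sketches and the file name is arbitrary:

    import torch

    # Export the trained pose network to ONNX as the intermediate format
    # (source model -> ONNX -> target framework such as CoreML or Caffe2).
    model.eval()
    dummy_input = torch.randn(1, 3, 240, 192)   # network input size from the text
    torch.onnx.export(model, dummy_input, "pose_net.onnx",
                      input_names=["image"], output_names=["heatmaps"])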
The method for detecting human body posture provided by the embodiment of the invention abandons existing complex pose-detection network models and defines a simple, efficient neural network model in which all convolutions are separable convolutions. The simplified network structure and the use of separable convolutions greatly reduce the computation required by the neural network model, greatly shrink the model itself, make training easier, and save time and cost. Compared with the prior art, the technical solution provided by the invention runs smoothly on mobile terminals with small memory and limited CPU and GPU computing power, achieves real-time detection of human body posture, and can further be applied to motion-sensing games on the mobile terminal, body shaping and slimming, decorating the body by mapping onto its joint points, or other entertaining applications.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (7)

1. A method for detecting human body posture, comprising:
inputting a preprocessed human body image to be detected into a pre-trained neural network model, and acquiring a predetermined number of heatmaps, wherein each heatmap corresponds to one human body joint point; the neural network model comprises, connected in sequence, the first 14 layers of a MobileNetV2 network, a dimension transformation layer, a first upsampling layer, a first convolutional neural network layer, a BN regularization layer, a ReLU activation function layer, a second upsampling layer, and a second convolutional neural network layer; all convolution operations in the neural network model are separable convolutions;
acquiring the predetermined number of human body joint point coordinates from the predetermined number of heatmaps;
and scaling the coordinates of each human body joint point to the image coordinate system of the human body image to be detected, to acquire the human body posture joint points of the human body image to be detected.
2. The method for detecting human body posture of claim 1, wherein training the neural network model comprises:
annotating a pre-acquired original training image with a human body bounding box and joint points;
cropping the original training image according to the human body bounding box to obtain a cropped image;
scaling the cropped image by a preset ratio and padding it to a preset size to obtain a training input image;
converting the coordinates of the joint points annotated in the original training image into coordinates in the training input image, and generating a ground truth value with a two-dimensional Gaussian distribution function;
and training the neural network model with the training input image and the ground truth value.
3. The method for detecting human body posture of claim 2, wherein the size of the training input image is 240 × 192 and the size of the ground truth value is 60 × 48.
4. The method for detecting human body posture as claimed in claim 2, wherein the loss function of the neural network model adopts a mean square loss function:
loss(x, y) = (x - y)²
wherein x is the predicted value of the neural network model, and y is the ground truth value.
5. The method for detecting human body posture of claim 2, further comprising: in the process of training the neural network model, optimizing the neural network model with the Adam optimizer.
6. The method for detecting human body posture of claim 1, wherein the first upsampling layer and the second upsampling layer both adopt 2× upsampling; and the first convolutional neural network layer and the second convolutional neural network layer are both 3 × 3 convolutional neural networks.
7. The method for detecting human body posture as claimed in claim 1, wherein the pre-trained neural network model is run on a mobile terminal; and the human body image to be detected is acquired by the mobile terminal.
CN201811492525.6A 2018-12-06 2018-12-06 Method for detecting human body posture Active CN111291593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811492525.6A CN111291593B (en) 2018-12-06 2018-12-06 Method for detecting human body posture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811492525.6A CN111291593B (en) 2018-12-06 2018-12-06 Method for detecting human body posture

Publications (2)

Publication Number Publication Date
CN111291593A true CN111291593A (en) 2020-06-16
CN111291593B CN111291593B (en) 2023-04-18

Family

ID=71023035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811492525.6A Active CN111291593B (en) 2018-12-06 2018-12-06 Method for detecting human body posture

Country Status (1)

Country Link
CN (1) CN111291593B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
WO2018058419A1 (en) * 2016-09-29 2018-04-05 中国科学院自动化研究所 Two-dimensional image based human body joint point positioning model construction method, and positioning method
US20180247113A1 (en) * 2016-10-10 2018-08-30 Gyrfalcon Technology Inc. Image Classification Systems Based On CNN Based IC and Light-Weight Classifier
US20180137406A1 (en) * 2016-11-15 2018-05-17 Google Inc. Efficient Convolutional Neural Networks and Techniques to Reduce Associated Computational Costs
CN106650827A (en) * 2016-12-30 2017-05-10 南京大学 Human body posture estimation method and system based on structure guidance deep learning
US20180186452A1 (en) * 2017-01-04 2018-07-05 Beijing Deephi Technology Co., Ltd. Unmanned Aerial Vehicle Interactive Apparatus and Method Based on Deep Learning Posture Estimation
CN107704817A (en) * 2017-09-28 2018-02-16 成都品果科技有限公司 A kind of detection algorithm of animal face key point
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 A kind of estimation method of human posture and mobile terminal
CN108647639A (en) * 2018-05-10 2018-10-12 电子科技大学 Real-time body's skeletal joint point detecting method
CN113947784A (en) * 2021-10-28 2022-01-18 四川长虹电器股份有限公司 Lightweight real-time human body posture estimation method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
OSOKIN D: "Real-time 2D multi-person pose estimation on CPU: Lightweight OpenPose", arXiv preprint arXiv:1811 *
谢金衡 et al.: "Real-time multi-face keypoint localization algorithm based on deep residual and feature pyramid networks", 《计算机应用》 (Journal of Computer Applications) *
陈鹏飞 et al.: "A residual depthwise separable convolution algorithm for handwritten Chinese character recognition", 《软件导刊》 (Software Guide) *
韩贵金: "Human pose estimation based on improved CNN and weighted SVDD algorithm", 《计算机工程与应用》 (Computer Engineering and Applications) *

Also Published As

Publication number Publication date
CN111291593B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
CN110188598B (en) Real-time hand posture estimation method based on MobileNet-v2
CN110929569B (en) Face recognition method, device, equipment and storage medium
KR101882704B1 (en) Electronic apparatus and control method thereof
CN112991171B (en) Image processing method, device, electronic equipment and storage medium
CN109544450B (en) Method and device for constructing confrontation generation network and method and device for reconstructing image
CN113159232A (en) Three-dimensional target classification and segmentation method
JP2024004444A (en) Three-dimensional face reconstruction model training, three-dimensional face image generation method, and device
CN112489164A (en) Image coloring method based on improved depth separable convolutional neural network
CN109345604B (en) Picture processing method, computer device and storage medium
CN112686225A (en) Training method of YOLO neural network, pedestrian detection method and related equipment
Steinfeld GAN loci
CN111797834A (en) Text recognition method and device, computer equipment and storage medium
CN114170231A (en) Image semantic segmentation method and device based on convolutional neural network and electronic equipment
CN109658508B (en) Multi-scale detail fusion terrain synthesis method
CN113077477B (en) Image vectorization method and device and terminal equipment
CN108961268B (en) Saliency map calculation method and related device
CN108876704B (en) Method and device for deforming human face image and computer storage medium
CN117593187A (en) Remote sensing image super-resolution reconstruction method based on meta-learning and transducer
CN113313162A (en) Method and system for detecting multi-scale feature fusion target
JP5067882B2 (en) Image processing apparatus, image processing method, and program
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN111291593B (en) Method for detecting human body posture
CN115410182A (en) Human body posture estimation method and device, storage medium and computer equipment
CN112669426B (en) Three-dimensional geographic information model rendering method and system based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant