WO2021179822A1 - Human body feature point detection method and apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2021179822A1
WO2021179822A1 PCT/CN2021/073863 CN2021073863W WO2021179822A1 WO 2021179822 A1 WO2021179822 A1 WO 2021179822A1 CN 2021073863 W CN2021073863 W CN 2021073863W WO 2021179822 A1 WO2021179822 A1 WO 2021179822A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
human body
detected
feature
feature point
Application number
PCT/CN2021/073863
Other languages
French (fr)
Chinese (zh)
Inventor
吴佳涛
Original Assignee
Oppo广东移动通信有限公司
上海瑾盛通信科技有限公司
Application filed by Oppo广东移动通信有限公司, 上海瑾盛通信科技有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2021179822A1 publication Critical patent/WO2021179822A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • This application relates to the technical field of electronic equipment, and more specifically, to a method, device, electronic equipment, and storage medium for detecting human body feature points.
  • Artificial intelligence technology has gradually been applied to the field of human body feature point detection.
  • In existing detection approaches, however, the detection time grows linearly with the number of human bodies in the image.
  • this application proposes a detection method, device, electronic equipment and storage medium for human body feature points to solve the above-mentioned problems.
  • an embodiment of the present application provides a method for detecting feature points of a human body.
  • The method includes: acquiring an image to be detected; performing down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected; performing multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected; and performing a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • an embodiment of the present application provides a device for detecting feature points of a human body.
  • The device includes: a to-be-detected image acquisition module for acquiring the image to be detected;
  • a first image feature acquisition module for performing down-sampling processing on the image to be detected to obtain the first image feature of the image to be detected;
  • a second image feature acquisition module for performing multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected;
  • and a human body feature point detection module for performing a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • An embodiment of the present application provides an electronic device, including a memory and a processor. The memory is coupled to the processor and stores instructions that, when executed by the processor, cause the processor to execute the above method.
  • an embodiment of the present application provides a computer readable storage medium, and the computer readable storage medium stores program code, and the program code can be invoked by a processor to execute the above method.
  • FIG. 1 shows a schematic flowchart of a method for detecting human body feature points according to an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application
  • FIG. 3 shows a schematic flowchart of step S260 of the method for detecting human body feature points shown in FIG. 2 of the present application;
  • FIG. 4 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application
  • FIG. 5 shows the overall framework diagram of the detection model provided by the embodiment of the present application.
  • FIG. 6 shows a block diagram of a module of a device for detecting human body feature points provided by an embodiment of the present application
  • FIG. 7 shows a block diagram of an electronic device used in an embodiment of the present application to execute the method for detecting human body feature points according to the embodiment of the present application;
  • FIG. 8 shows a storage unit for storing or carrying program codes that implement the method for detecting human body feature points according to the embodiment of the present application.
  • A convolutional neural network is a neural network that includes convolution computations and has a deep structure. It is one of the representative algorithms of deep learning.
  • Over the course of their development, convolutional neural networks have generally come to include the following types of stacked layers: input layer, convolutional layer, pooling layer, normalization layer (also called Batch Norm layer), activation function layer, fully connected layer, output layer, etc.
  • The input to the input layer is generally an RGB three-channel color image;
  • the convolutional layer extracts features from the input data; its computation is a convolution operation that includes weight coefficients and a bias;
  • the pooling layer selects and filters the extracted feature information; commonly used pooling methods include maximum pooling and average pooling;
  • the normalization layer normalizes the input data so that the distribution of each feature is similar, which makes the network easier to train;
  • the activation function layer adds nonlinear factors to the model to give it a stronger fitting ability;
  • the fully connected layer is generally located in the last part of the convolutional neural network and nonlinearly combines the input features to obtain the output;
  • the output layer outputs the type of result required by the model: for classification problems, the output layer uses functions such as softmax (the normalized exponential function, often used as an output layer in the field of deep learning to obtain a specified type of output) to output classification labels; for pixel-level classification problems, the output layer directly outputs the classification result of each pixel; and for the human body feature point detection problem, the output layer outputs the human body feature point heat map (different algorithm models may also output other heat maps to assist feature point detection and allocation).
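The pooling and softmax operations named above can be illustrated with a minimal pure-Python sketch. The 4x4 feature map, the logits, and the helper names (`pool2x2`, `mean`, `softmax`) are illustrative assumptions and do not come from the application; real networks would use a deep learning framework rather than nested lists.

```python
import math

def pool2x2(grid, reduce_fn):
    """Apply non-overlapping 2x2 pooling to a 2D list with even height and width."""
    return [
        [reduce_fn([grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

def mean(values):
    return sum(values) / len(values)

def softmax(logits):
    """Normalized exponential; subtracting the max keeps exp() numerically stable."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4x4 single-channel feature map (values chosen for illustration only).
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 8, 1, 1],
    [2, 0, 3, 7],
]
max_pooled = pool2x2(feature_map, max)    # [[7, 6], [9, 7]]
avg_pooled = pool2x2(feature_map, mean)   # [[4.0, 3.0], [4.75, 3.0]]
probs = softmax([2.0, 1.0, 0.1])          # class probabilities that sum to 1
```

Both pooling variants halve the spatial size; maximum pooling keeps the strongest response in each window, while average pooling smooths it.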
  • Human body feature point detection mainly detects feature points of the human body, such as the eyes, nose, elbows, and shoulders, connects them in sequence according to the order of the feature points, and describes human body information through those feature points. By extension, it can also describe the posture, gait, behavior, and other information of the human body.
  • Human feature point detection is one of the basic algorithms of computer vision, and it has played a basic role in the research of other related fields of computer vision, such as behavior recognition, intelligent composition and other related fields.
  • Existing human feature point detection algorithms based on deep learning can be divided into two directions, namely, a top-down detection method and a bottom-up detection method.
  • the top-down human feature point detection algorithm divides the human feature point detection task into two parts: human body detection and single-person human feature point detection, that is, each person in the image is detected individually through the target detection algorithm. Then, on the basis of the detection frame, the human body feature point detection is performed for a single person.
  • the top-down method tends to have higher detection accuracy, but the detection speed of this method has a linear growth relationship with the number of people in the image, and additional target detection algorithms are needed as support.
  • The bottom-up method also includes two parts: multi-person feature point detection in the image, and post-processing. That is, all feature points in the image are detected first, and then related strategies are applied in the post-processing module to assign all the feature points to different individuals. Representative algorithms include OpenPose, PersonLab, etc.
  • the detection accuracy of the bottom-up method is lower than that of the top-down method, but the detection speed is faster, and the detection time has nothing to do with the number of people in the image.
  • the post-processing module is often composed of some logic strategies, such as greedy algorithms.
  • In addition to detecting the distribution heat map of feature points (also called the heatmap), the OpenPose algorithm also proposes a heat map representing the connection information of feature points, the pafmap: a position with high confidence in this heat map indicates a high probability that a feature point connection exists at that location.
  • the heatmap and pafmap are used as the output of the algorithm model, and the greedy algorithm is used as the post-processing strategy to realize the assignment of multi-person feature points to independent character instances.
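A greedy post-processing strategy of the kind mentioned above can be sketched as follows. This is a simplified illustration, not the OpenPose implementation: it assumes that each candidate connection between a feature point of type A (for example, a shoulder) and one of type B (for example, an elbow) has already been given a confidence score derived from the pafmap, and the function name `greedy_match` and the sample scores are hypothetical.

```python
def greedy_match(candidates):
    """Pick connections in descending score order, using each point at most once.

    candidates: list of (score, index_a, index_b) tuples.
    Returns the accepted connections as (index_a, index_b, score) tuples.
    """
    used_a, used_b, chosen = set(), set(), []
    for score, a, b in sorted(candidates, reverse=True):
        if a not in used_a and b not in used_b:
            chosen.append((a, b, score))
            used_a.add(a)
            used_b.add(b)
    return chosen

# Two detected points of type A and two of type B (hypothetical scores).
candidates = [(0.9, 0, 0), (0.8, 0, 1), (0.7, 1, 1), (0.2, 1, 0)]
pairs = greedy_match(candidates)  # [(0, 0, 0.9), (1, 1, 0.7)]
```

Each accepted pair joins two feature points belonging to the same person instance; repeating this for every limb type builds up the per-person skeletons.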
  • The method has evolved through two versions. In the first released version, the model structure is divided into a basic network and a heat map detection network.
  • The heat map detection network contains multiple stages, and each stage is divided into an upper and a lower branch.
  • the algorithm model also has the disadvantages of high model complexity and large amount of calculation.
  • the stacking of multiple stages does not significantly improve the accuracy of the model, but it brings a lot of redundant calculations.
  • the second version has a single branch structure.
  • Although the 3*3 residual connections in the middle can increase receptive field information, they bring only a very small increase in accuracy while wasting a large amount of computation.
  • Through long-term research, the inventor has proposed the method, device, electronic equipment, and storage medium for detecting human body feature points provided by the embodiments of this application.
  • Multi-scale feature extraction is performed on the image to be detected to obtain image features at different scales and with different receptive fields, and the position information of the human body feature points and the connection information of the human body feature points are obtained based on these image features, thereby improving the accuracy and efficiency of human body feature point detection.
  • the specific detection method of human body feature points will be described in detail in the subsequent embodiments.
  • The electronic device applied in this embodiment can be a mobile terminal, a smart phone, a tablet computer, a wearable electronic device, etc., which is not limited here.
  • the process shown in FIG. 1 will be described in detail below.
  • the method for detecting human feature points may specifically include the following steps:
  • Step S110 Obtain an image to be detected.
  • Step S120 Perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
  • the image to be detected may be down-sampled to obtain the first image feature of the image to be detected.
  • The image to be detected may be sequentially subjected to 2 times down-sampling processing until the obtained first image feature of the image to be detected meets the processing requirements.
  • For example, the image to be detected may be sequentially subjected to 2 times down-sampling processing a total of 4 times, that is, 16 times down-sampling processing is performed on the image to be detected, so that the first image feature of the image to be detected includes sufficiently abstract features, without excessive feature extraction, to meet the processing requirements.
  • Specifically, the image to be detected can first be down-sampled by 2 times; the image features obtained are then down-sampled to 4 times, the image features obtained at 4 times are down-sampled to 8 times, and the image features obtained at 8 times are down-sampled to 16 times to obtain the first image feature of the image to be detected.
  • The image to be detected can also be down-sampled by a larger factor.
  • For example, the image to be detected can also be down-sampled by 32 times or 64 times, which is not limited here.
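The arithmetic of sequential 2 times down-sampling can be sketched as below. The 256x256 input size and the helper name `downsample_sizes` are assumptions made for illustration; the application does not specify an input resolution.

```python
def downsample_sizes(height, width, num_steps, factor=2):
    """Return (cumulative_factor, height, width) after each serial down-sampling step."""
    sizes = []
    total = 1
    for _ in range(num_steps):
        height, width = height // factor, width // factor
        total *= factor
        sizes.append((total, height, width))
    return sizes

# Four serial 2x steps give a cumulative 16x down-sampling (hypothetical 256x256 input).
sizes = downsample_sizes(256, 256, 4)
# [(2, 128, 128), (4, 64, 64), (8, 32, 32), (16, 16, 16)]
```

Running the same helper with five or six steps gives the 32 times and 64 times cases.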
  • Step S130 Perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • The down-sampling processing of the image to be detected sequentially performs 2 times down-sampling: after the image to be detected is down-sampled by 2 times, the resulting image features are further down-sampled to 4 times, and so on; that is, the down-sampling above is processed in a serial manner.
  • In a serial structure, the input of a given convolutional layer can only be the output of the previous convolutional layer, so the feature information that the layer can learn is limited to the single receptive field represented by that output. In other words, the scale and receptive field of the first image feature obtained through down-sampling processing are relatively uniform.
  • Step S140 Perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • Convolution operations may be performed on the multiple second image features to obtain the human body feature point position information (heatmap) and the human body feature point connection information (pafmap) of the image to be detected.
  • The multiple second image features can be fed into two branches for convolution: one branch performs a convolution operation on the multiple second image features and outputs the human body feature point position information, and the other branch performs a convolution operation on the multiple second image features and outputs the human body feature point connection information.
  • In the method for detecting human body feature points provided by this embodiment, an image to be detected is obtained; down-sampling processing is performed on the image to be detected to obtain the first image feature of the image to be detected; multi-scale feature extraction is performed on the first image feature to obtain multiple second image features of the image to be detected at different scales and with different receptive fields; and a convolution operation is performed on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • By performing multi-scale feature extraction on the image to be detected, image features at different scales and with different receptive fields are obtained, and the human body feature point position information and connection information are obtained based on these features, which increases the accuracy and efficiency of human body feature point detection.
  • FIG. 2 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application.
  • the process shown in FIG. 2 will be described in detail below.
  • the method for detecting human feature points may specifically include the following steps:
  • Step S210 Obtain an image to be detected.
  • For the specific description of step S210, please refer to step S110, which will not be repeated here.
  • the image to be detected may be down-sampled by N1 times to obtain the features of the image to be processed.
  • The image size after N1 times down-sampling of the image to be detected is often small. If multi-scale feature extraction is performed directly on the image features to be processed after N1 times down-sampling, then convolution with a large kernel easily causes excessive extraction of image features and introduces too much unnecessary redundant information. For example, to obtain more abstract features of the image to be detected, the image is generally down-sampled by 16 times, and the resulting image size is correspondingly small. If multi-scale feature extraction is performed directly on the image features to be processed after this down-sampling, a 7*7 convolution will easily cause excessive extraction of image features and introduce unnecessary redundant information.
  • Therefore, the image features to be processed can also be subjected to N2 times up-sampling processing, and the newly acquired image features are determined as the first image feature of the image to be detected, so as to avoid excessive extraction of image features and the introduction of unnecessary redundant information.
  • performing N1 times downsampling processing on the image to be detected may be 16 times downsampling processing for the image to be detected, and performing N2 times upsampling processing on the image features to be processed may be 2 times upsampling processing.
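The effect of N1 times down-sampling followed by N2 times up-sampling on the feature size can be sketched as follows, using the example values N1 = 16 and N2 = 2 from the text. The 256-pixel input side and the helper name `first_feature_size` are illustrative assumptions.

```python
def first_feature_size(image_size, n1=16, n2=2):
    """Return (size of the features to be processed, size of the first image feature)."""
    to_be_processed = image_size // n1   # after N1 times down-sampling
    first_feature = to_be_processed * n2 # after N2 times up-sampling
    return to_be_processed, first_feature

# Hypothetical 256-pixel side: 16x down gives 16, then 2x up gives 32.
sizes_256 = first_feature_size(256)  # (16, 32)
```

With these values the first image feature has the spatial size of an 8 times down-sampled image, which gives a 7*7 kernel more room than the 16-pixel map would.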
  • Step S240 Perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • For the specific description of step S240, please refer to step S130, which will not be repeated here.
  • Step S250 Perform down-sampling processing on the image to be detected, and obtain a third image feature of the image to be detected.
  • In this method, the third image feature obtained by the down-sampling process participates in the convolution operation.
  • This can not only further increase the feature scale information and receptive field, but also add accurate shallow pixel position information, improving the accuracy of the acquired human body feature point position information and human body feature point connection information.
  • The multiple second image features obtained by performing multi-scale feature extraction on the first image feature of the image to be detected are abstract features of the image to be detected, whereas the third image feature obtained by down-sampling the image to be detected is a shallow feature of the image to be detected. That is, the multiple second image features and the third image feature have different scales and different receptive fields.
  • Therefore, involving the third image feature in the convolution operation that obtains the position information and the connection information of the human body feature points increases the scale and receptive field of the data. Furthermore, since the third image feature is a shallow image feature whose pixel position information is more accurate, it can improve the accuracy of the acquired human body feature point position information and human body feature point connection information.
  • the image to be detected can also be down-sampled to obtain the third image feature of the image to be detected, and the third image feature is involved in the convolution operation.
  • the feature extraction of the image to be detected may be performed through a convolutional layer, which is not limited herein.
  • If channel connection is to be performed between the first image feature and the third image feature, it is necessary to ensure that the image size corresponding to the first image feature and the image size corresponding to the third image feature are consistent.
  • For example, if the first image feature is obtained by down-sampling the image to be detected by 16 times, the third image feature also needs to be obtained by 16 times down-sampling; if the first image feature is obtained by down-sampling the image to be detected by 8 times, the third image feature also needs to be obtained by 8 times down-sampling.
  • the image to be detected may be down-sampled by N3 times to obtain the third image feature of the image to be detected.
  • The N3 times down-sampling of the image to be detected can be 2^(M1-M2) times down-sampling, so that the image size corresponding to the third image feature obtained by the N3 times down-sampling is consistent with the image size corresponding to the first image feature, which provides a basis for the subsequent channel connection of the multiple second image features and the third image feature.
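The size-matching condition can be checked numerically. The sketch below assumes that N1 = 2^M1 and N2 = 2^M2, which is our reading of the exponent in the text and is not stated in this excerpt; under that assumption N3 = 2^(M1-M2) = N1 / N2. The values M1 = 4, M2 = 1 match the 16 times and 2 times example, and the 256-pixel input side is hypothetical.

```python
def n3_factor(m1, m2):
    """Down-sampling factor for the third image feature, assuming N1 = 2**m1 and N2 = 2**m2."""
    return 2 ** (m1 - m2)

M1, M2 = 4, 1                      # N1 = 16, N2 = 2 (example values from the text)
N1, N2, N3 = 2 ** M1, 2 ** M2, n3_factor(M1, M2)

image_size = 256                   # hypothetical input side length
first_feature_size = image_size // N1 * N2   # 16x down, then 2x up
third_feature_size = image_size // N3        # direct N3 times down-sampling
sizes_match = first_feature_size == third_feature_size
```

Here N3 is 8 and both features end up 32 pixels on a side, so they can be connected along the channel dimension.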
  • Step S260 Perform a convolution operation on the plurality of second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • A convolution operation may be performed on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information of the image to be detected.
  • The multiple second image features and the third image feature may be fed into two branches for convolution: one branch performs a convolution operation on the multiple second image features and the third image feature to output the human body feature point position information, and the other branch performs a convolution operation on them to output the human body feature point connection information.
  • FIG. 3 shows a schematic flowchart of step S260 of the method for detecting human body feature points shown in FIG. 2 of the present application.
  • the following will elaborate on the process shown in FIG. 3, and the method may specifically include the following steps:
  • Step S261 Channel connecting the plurality of second image features and the third image feature to obtain a fourth image feature.
  • The multiple second image features and the third image feature can be channel-connected to obtain the fourth image feature, and the fourth image feature then participates in the convolution operation to obtain the human body feature point position information and the human body feature point connection information of the image to be detected.
  • Step S262 Perform a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • a convolution operation may be performed on the fourth image feature to obtain the position information of the human body feature point and the connection information of the human body feature point of the image to be detected.
  • The fourth image feature can be fed into two branches for convolution: one branch performs a convolution operation on the fourth image feature to output the human body feature point position information, and the other branch performs a convolution operation on the fourth image feature to output the human body feature point connection information.
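Channel connection (concatenation along the channel dimension) can be sketched in pure Python with feature maps stored as nested lists shaped [channels][height][width]. The channel counts (8 per second image feature, 16 for the third) and the 4x4 spatial size are arbitrary assumptions for illustration, as are the helper names.

```python
def channel_concat(features):
    """Concatenate feature maps shaped [channels][height][width] along the channel axis."""
    height, width = len(features[0][0]), len(features[0][0][0])
    merged = []
    for feat in features:
        if (len(feat[0]), len(feat[0][0])) != (height, width):
            raise ValueError("spatial sizes must match before channel connection")
        merged.extend(feat)  # channels are appended; spatial layout is untouched
    return merged

def zeros(channels, height, width):
    """Build a zero-filled feature map as nested lists."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]

# Four hypothetical second image features plus one third image feature.
second_features = [zeros(8, 4, 4) for _ in range(4)]
third_feature = zeros(16, 4, 4)
fourth_feature = channel_concat(second_features + [third_feature])
num_channels = len(fourth_feature)  # 4 * 8 + 16 = 48
```

The size check is why the third image feature must share the spatial size of the other features before it can be connected.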
  • In the method for detecting human body feature points provided by this embodiment, an image to be detected is obtained; N1 times down-sampling processing is performed on the image to be detected to obtain the image features to be processed; N2 times up-sampling processing is performed on the image features to be processed to obtain the first image feature of the image to be detected; multi-scale feature extraction is performed on the first image feature to obtain multiple second image features of the image to be detected at different scales and with different receptive fields; feature extraction is performed on the image to be detected to obtain the third image feature of the image to be detected; and a convolution operation is performed on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • In addition, this embodiment performs N1 times down-sampling on the image to be detected followed by N2 times up-sampling to obtain the first image feature, which yields more abstract features while avoiding excessive extraction of image features and the introduction of unnecessary redundant information.
  • This embodiment also performs the convolution operation on both the multiple second image features and the third image feature extracted from the image to be detected, so as to increase the receptive field of the image.
  • FIG. 4 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application.
  • the following will elaborate on the process shown in FIG. 4, and the method for detecting human feature points may specifically include the following steps:
  • Step S310 Obtain an image to be detected.
  • For the specific description of step S310, please refer to step S110, which will not be repeated here.
  • Step S320 Perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
  • a trained detection model may be used to process the acquired image to be detected, so as to output the human body feature point position information and the human body feature point connection information of the to be detected image.
  • FIG. 5 shows the overall framework diagram of the detection model provided by the embodiment of the present application.
  • The detection model may include three main parts: a basic network module F, a multi-scale module M, and a heat map detection module S.
  • The image to be detected can be input to the basic network module in the detection model, the image to be detected is down-sampled through the basic network module to obtain the first image feature of the image to be detected, and the first image feature is used as the input of the multi-scale module in the detection model.
  • The basic network module may be a convolutional neural network such as VGG, ResNet, or MobileNet. If a deeper network model such as VGG or ResNet is used, the computational complexity of the model increases, but higher detection accuracy can be obtained. If a lightweight network model such as MobileNet is used, some detection accuracy is lost, but a faster detection speed can be obtained and fully real-time detection can be achieved.
  • Step S330 Input the first image feature into the multi-scale module of the detection model, and perform multi-scale feature extraction on the first image feature through the multi-scale module to obtain multiple second image features of the image to be detected .
  • After the first image feature output by the basic network module is obtained, it can be input to the multi-scale module of the detection model, which performs multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • The multi-scale module includes multiple parallel convolutional layers, each with a different convolution kernel, and each convolutional layer is used to extract, from the first image feature, second image features of a different scale and receptive field.
  • For example, the multi-scale module can include 4 parallel convolutional layers, in order: 1*1 convolution, 3*3 convolution, 5*5 convolution, and 7*7 convolution. The convolution kernel sizes of the layers increase in sequence, and each layer is responsible for extracting image information of a different scale and receptive field.
  • the four parallel convolutional layers together form the multi-scale module.
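For the outputs of the four parallel convolutions to be channel-connected later, their spatial sizes need to agree. The sketch below uses the standard convolution output-size formula and assumes stride 1 with "same" padding of `kernel // 2`; the padding scheme and the 32-pixel feature size are assumptions, since the excerpt does not specify them.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

KERNELS = [1, 3, 5, 7]   # the four parallel branches named in the text
FEATURE_SIZE = 32        # hypothetical side length of the first image feature

# With padding = kernel // 2 and stride 1, every branch preserves the input size.
same_padded = {k: conv_output_size(FEATURE_SIZE, k, padding=k // 2) for k in KERNELS}

# Without padding, larger kernels shrink the map and the branches would no longer align.
unpadded = {k: conv_output_size(FEATURE_SIZE, k) for k in KERNELS}
```

On the first image feature, each branch sees a window equal to its kernel size, so the four outputs carry receptive fields of 1, 3, 5, and 7 feature-map pixels at the same resolution.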
  • Step S340 Input the plurality of second image features into the heat map detection module of the detection model, and perform a convolution operation on the plurality of second image features through the heat map detection module to obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
  • The multiple second image features can be input to the heat map detection module of the detection model, so that the heat map detection module performs a convolution operation on the multiple second image features to obtain the position information of the human body feature points and the connection information of the human body feature points.
  • The third image feature output by the basic network module can also be obtained; the multiple second image features and the third image feature are then channel-connected to obtain the fourth image feature, which is input into the heat map detection module of the detection model, and the heat map detection module performs a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information.
  • the heat map detection module includes only one convolution stage.
  • the one convolution stage includes a first processing branch and a second processing branch.
  • the first processing branch is used to detect and output human body feature point position information.
  • the second processing branch is used to detect and output the connection information of the human body feature points.
  • the first processing branch includes two convolutional layers.
  • the second processing branch includes two convolutional layers.
  • Conventionally, the heat map detection module is formed by serially connecting multiple stages to improve accuracy, but experiments have shown that neither heatmap detection nor pafmap detection requires many stages for correction.
  • the concatenation of the stages not only brings a very limited increase in accuracy, but also brings a huge amount of parameters and calculations.
  • a multi-scale module is added, so that the image features input to the heat map detection module already contain rich feature and scale information. This makes it possible for the heat map detection module to reduce the number of stages: a single stage is enough to achieve high accuracy, and it also greatly reduces the amount of calculation and the number of parameters of the model, so that the model can perform detection in real time on a mobile terminal.
  • the heat map detection module contains only one stage. In order to further reduce the number of parameters and the amount of calculation, only two convolutional layers are used in each branch of the stage: a 3*3 convolution performs further feature extraction on the input channel-connected image feature, and a 1*1 convolution detects the human body feature point position information or the human body feature point connection information and outputs a feature map with the corresponding number of channels.
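To make the parameter saving concrete, the sketch below counts the weights of one stage with two branches, each a 3*3 convolution followed by a 1*1 convolution, and compares it with several such stages in series. The channel counts (128 input and hidden channels, 19 position channels, 38 connection channels), the six-stage baseline, and the assumption that later stages also receive the previous stage's outputs as extra input channels are hypothetical values chosen for the example, not figures from this application.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Number of learnable parameters of a k x k convolution layer."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def branch_params(in_ch, hidden_ch, out_ch):
    """One branch: a 3x3 convolution for feature extraction, then a 1x1 convolution for prediction."""
    return conv_params(in_ch, hidden_ch, 3) + conv_params(hidden_ch, out_ch, 1)


def stage_params(in_ch, hidden_ch, heat_ch, paf_ch):
    """One stage: a position (heatmap) branch plus a connection (pafmap) branch."""
    return branch_params(in_ch, hidden_ch, heat_ch) + branch_params(in_ch, hidden_ch, paf_ch)


# Hypothetical sizes, chosen only for illustration.
IN_CH, HIDDEN_CH, HEAT_CH, PAF_CH = 128, 128, 19, 38

single_stage = stage_params(IN_CH, HIDDEN_CH, HEAT_CH, PAF_CH)

# Assumed baseline: six serial stages, where each later stage also receives
# the previous stage's two outputs as extra input channels.
later_in = IN_CH + HEAT_CH + PAF_CH
six_stages = single_stage + 5 * stage_params(later_in, HIDDEN_CH, HEAT_CH, PAF_CH)

print(single_stage, six_stages, round(six_stages / single_stage, 1))
```

Under these assumptions almost all of the weights sit in the 3*3 layers, so removing stages cuts the parameter count roughly in proportion to the number of stages removed.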
  • the embodiment of the present application may further include training and correction of the detection model. The detection model may be trained in advance on an acquired training data set; each subsequent detection can then be performed with the trained model, without retraining the detection model every time detection is performed.
  • an objective function can be set, which is used to measure the difference between the detection result of the detection model and the real label.
  • This function is called a loss function.
  • the feature point location heat map loss is used to measure the loss between the detected feature point location heat map and the real feature point location heat map:
  • Feature point connection heat map loss is used to measure the loss between the detected feature point connection heat map and the real feature point connection heat map:
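The text above does not spell out the two loss formulas. As a minimal sketch, assuming each loss is a mean squared error over all heat map positions and that the two are summed with equal weight (a common choice for heat map regression, but an assumption here), the computation could look like this. The function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two heat maps given as nested lists [channel][row][col]."""
    total, count = 0.0, 0
    for p_ch, t_ch in zip(pred, target):
        for p_row, t_row in zip(p_ch, t_ch):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count


def detection_loss(pred_heat, true_heat, pred_paf, true_paf):
    """Position heat map loss plus connection heat map loss (equal weights assumed)."""
    return mse(pred_heat, true_heat) + mse(pred_paf, true_paf)
```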
  • a method for detecting feature points of a human body is provided: an image to be detected is obtained and down-sampled to obtain the first image feature of the image to be detected; the first image feature is input to the multi-scale module of the detection model, which performs multi-scale feature extraction on it to obtain multiple second image features of the image to be detected; and the multiple second image features are input into the heat map detection module of the detection model, which performs a convolution operation on them to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • this embodiment also detects the human body feature points of the image to be detected through the detection model, so as to improve the accuracy of detecting the human body feature points.
  • FIG. 6 shows a block diagram of a human body feature point detection apparatus 200 provided by an embodiment of the present application.
  • the human body feature point detection apparatus 200 includes: a to-be-detected image acquisition module 210, a first image feature acquisition module 220, a second image feature acquisition module 230, and a human body feature point detection module 240, where:
  • the to-be-detected image acquisition module 210 is used to acquire the to-be-detected image.
  • the first image feature acquisition module 220 is configured to perform down-sampling processing on the image to be detected to obtain the first image feature of the image to be detected.
  • the first image feature acquisition module 220 includes: a to-be-processed image feature acquisition sub-module and a first image feature acquisition sub-module, wherein:
  • the second image feature acquisition module 230 is configured to perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • the second image feature acquisition module 230 includes: a second image feature acquisition sub-module, wherein:
  • the human body feature point detection module 240 is configured to perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • the human body feature point detection module 240 includes: a third feature image acquisition sub-module and a first human body feature point detection sub-module, wherein:
  • the third feature image acquisition sub-module is configured to perform feature extraction on the image to be detected, and obtain the third image feature of the image to be detected.
  • the third characteristic image acquisition sub-module includes: a third characteristic image acquisition unit, wherein:
  • the first human body feature point detection sub-module is used to perform convolution operations on the plurality of second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • the fourth image feature obtaining unit is configured to channel-connect the plurality of second image features and the third image feature to obtain a fourth image feature.
  • the human body feature point detection unit is configured to perform a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • the human body feature point detection module 240 includes: a second human body feature point detection sub-module, wherein:
  • the second human body feature point detection sub-module is used to input the multiple second image features into the heat map detection module of the detection model, and perform convolution operations on the multiple second image features through the heat map detection module, to obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
  • the training data set acquisition module is used to acquire a training data set, the training data set includes a plurality of images, and the human body feature point position information and the human body feature point connection information corresponding to each of the multiple images.
  • the model training module is configured to, based on the training data set, use each image as input data and the human body feature point position information and human body feature point connection information corresponding to each image as output data, and train through a machine learning algorithm to obtain the trained detection model.
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • the functional modules in the various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • FIG. 7 shows a structural block diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may be an electronic device capable of running application programs, such as a smart phone, a tablet computer, or an e-book.
  • the electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, where the one or more application programs may be stored in the memory 120 and configured to be executed by one or more processors 110, and the one or more programs are configured to execute the method described in the foregoing method embodiment.
  • the processor 110 may include one or more processing cores.
  • the processor 110 uses various interfaces and lines to connect the various parts of the electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120.
  • the processor 110 may be implemented in at least one of the hardware forms of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 110 may be integrated with one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used for rendering and drawing the content to be displayed; the modem is used for processing wireless communication. It can be understood that the above-mentioned modem may not be integrated into the processor 110, but may be implemented by a communication chip alone.
  • the memory 120 may include random access memory (RAM) or read-only memory (ROM).
  • the memory 120 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the various method embodiments, and the like.
  • the data storage area can also store data created during use of the electronic device 100 (such as a phone book, audio and video data, and chat record data) and the like.
  • FIG. 8 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable storage medium 300 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
  • the computer-readable storage medium 300 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 300 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 300 has storage space for the program code 310 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products.
  • the program code 310 may be compressed in a suitable form, for example.
  • the human body feature point detection method and apparatus, electronic device, and storage medium acquire the image to be detected, perform down-sampling processing on the image to be detected to obtain the first image feature of the image to be detected, perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected, and perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected. Multi-scale feature extraction on the image to be detected yields image features at different scales, and the position and connection information are obtained from those features, thereby greatly improving the accuracy and efficiency of human body feature point detection.

Abstract

The present application relates to the technical field of electronic devices, and disclosed are a human body feature point detection method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining an image to be detected; performing down-sampling processing on said image to obtain first image features of said image; performing multi-scale feature extraction on the first image features to obtain a plurality of second image features of said image; and performing convolution operation on the plurality of second image features to obtain human body feature point position information and human body feature point connection information of said image. According to the present application, multi-scale feature extraction is performed on said image to obtain image features under different scales, and the human body feature point position information and the human body feature point connection information are obtained on the basis of the image features under different scales, thereby greatly improving the accuracy and efficiency of human body feature point detection.

Description

Human body feature point detection method and apparatus, electronic device, and storage medium
Cross-references to related applications
This application claims priority to Chinese application No. CN202010171918.8, filed on March 12, 2020, the entire contents of which are incorporated herein by reference for all purposes.
Technical field
This application relates to the technical field of electronic devices, and more specifically, to a human body feature point detection method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, it has gradually been applied to the detection of human body feature points. At present, when artificial intelligence technology is used to detect human body feature points in an image, an object detection algorithm must first detect the human bodies in the image, and feature point detection is then performed on each detected human body, so the detection time grows linearly with the number of human bodies in the image.
Summary of the invention
In view of the above problems, this application proposes a human body feature point detection method and apparatus, an electronic device, and a storage medium to solve them.
In a first aspect, an embodiment of the present application provides a method for detecting human body feature points. The method includes: acquiring an image to be detected; performing down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected; performing multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected; and performing a convolution operation on the multiple second image features to obtain human body feature point position information and human body feature point connection information in the image to be detected.
In a second aspect, an embodiment of the present application provides an apparatus for detecting human body feature points. The apparatus includes: a to-be-detected image acquisition module, used to acquire an image to be detected; a first image feature acquisition module, used to perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected; a second image feature acquisition module, used to perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected; and a human body feature point detection module, used to perform a convolution operation on the multiple second image features to obtain human body feature point position information and human body feature point connection information in the image to be detected.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor. The memory is coupled to the processor and stores instructions, and the processor executes the above method when the instructions are executed by the processor.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores program code, and the program code can be invoked by a processor to execute the above method.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative work.
FIG. 1 shows a schematic flowchart of a method for detecting human body feature points according to an embodiment of the present application;
FIG. 2 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application;
FIG. 3 shows a schematic flowchart of step S260 of the method for detecting human body feature points shown in FIG. 2 of the present application;
FIG. 4 shows a schematic flowchart of a method for detecting human body feature points according to yet another embodiment of the present application;
FIG. 5 shows the overall framework diagram of the detection model provided by an embodiment of the present application;
FIG. 6 shows a block diagram of an apparatus for detecting human body feature points provided by an embodiment of the present application;
FIG. 7 shows a block diagram of an electronic device used in an embodiment of the present application to execute the method for detecting human body feature points according to the embodiment of the present application;
FIG. 8 shows a storage unit used in an embodiment of the present application to store or carry program code that implements the method for detecting human body feature points according to the embodiment of the present application.
Detailed description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the accompanying drawings.
A convolutional neural network is a type of neural network that includes convolution calculations and has a deep structure; it is one of the representative algorithms of deep learning. As developed to date, convolutional neural networks generally contain the following types of stacked layers: input layer, convolutional layer, pooling layer, normalization layer (also called Batch Norm layer), activation function layer, fully connected layer, and output layer. In computer vision, the input layer is generally an RGB three-channel color image. The convolutional layer extracts features from the input data; it is computed as a convolution operation with weight coefficients and a bias. The pooling layer selects and filters feature information; commonly used pooling methods include max pooling and average pooling. The normalization layer normalizes the input data so that the distributions of the features are similar and the network is easier to train. The activation function layer adds nonlinearity to the model, giving it a stronger fitting ability. The fully connected layer is generally located in the last part of the convolutional neural network and combines the input features nonlinearly to produce the output. The output layer outputs the type of result the model requires: for image classification, it uses functions such as softmax (the normalized exponential function, often used as an output layer in deep learning to obtain an output of a specified type) to output classification labels; for image semantic segmentation, it directly outputs the classification result of each pixel; for human body feature point detection, it outputs a human body feature point heat map (different algorithm models may also output other heat maps to assist feature point detection and assignment).
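The softmax function mentioned above for classification outputs can be sketched in a few lines. This is a generic, numerically stabilized version given for illustration; the function name and the example scores are not from this application.

```python
import math


def softmax(scores):
    """Normalized exponential: maps raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
label = probs.index(max(probs))  # index of the most likely class
```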
Human body feature point detection, also known as pose estimation, detects feature points of the human body, such as the eyes, nose, elbows, and shoulders, connects them in sequence according to the feature point order, and describes the human body through these feature points. By extension, it can also describe information such as posture, gait, and behavior. Human body feature point detection is one of the basic algorithms of computer vision and plays a foundational role in research in related fields such as behavior recognition and intelligent composition. Existing deep-learning-based human body feature point detection algorithms fall into two directions: top-down detection methods and bottom-up detection methods.
The top-down human body feature point detection algorithm divides the task into two parts: human body detection and single-person feature point detection. An object detection algorithm first detects each person in the image individually, and feature point detection is then performed for each person on the basis of the detection box. Top-down methods tend to have higher detection accuracy, but their detection time grows linearly with the number of people in the image, and they require an additional object detection algorithm as support.
The bottom-up method also consists of two parts: multi-person feature point detection in the image and post-processing. All feature points in the image are detected first, and the post-processing module then applies a strategy to assign the feature points to individual people. Representative algorithms include Openpose and PersonLab. The detection accuracy of bottom-up methods is lower than that of top-down methods, but detection is faster and the detection time is independent of the number of people in the image. The post-processing module is usually built from logic strategies such as a greedy algorithm.
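As an illustration of such a greedy post-processing strategy, the sketch below connects candidate feature points of two types (for example, shoulders and elbows) by repeatedly accepting the highest-scoring remaining pair whose endpoints are both unused. The score table stands in for connection confidences produced by a model; the function name, threshold, and data are hypothetical, and this is not the exact procedure of any particular published algorithm.

```python
def greedy_match(scores, threshold=0.0):
    """Greedily pair candidates.

    scores[i][j] is the connection confidence between candidate i of the
    first point type and candidate j of the second point type.
    Returns a list of (i, j, score) with each i and each j used at most once.
    """
    candidates = [
        (s, i, j)
        for i, row in enumerate(scores)
        for j, s in enumerate(row)
        if s > threshold
    ]
    candidates.sort(reverse=True)  # highest confidence first
    used_i, used_j, matches = set(), set(), []
    for s, i, j in candidates:
        if i not in used_i and j not in used_j:
            matches.append((i, j, s))
            used_i.add(i)
            used_j.add(j)
    return matches


# Two shoulders and two elbows from two people (made-up confidences).
scores = [
    [0.9, 0.2],
    [0.3, 0.8],
]
pairs = greedy_match(scores)
```

The cost of this step depends on the number of candidate points rather than on running a detector once per person, which is why the heat map network dominates the total run time in bottom-up methods.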
In addition to detecting the distribution heat map of feature points (the heatmap), the Openpose algorithm proposes a heat map that represents feature point connection information: the pafmap. A position with high confidence in this heat map indicates a high probability that a feature point connection exists at that position. With the heatmap and pafmap as the outputs of the model, and a greedy algorithm as the post-processing strategy, the feature points of multiple people are assigned to independent person instances. The method has gone through two versions. In the first released version, the model is divided into a basic network and a heat map detection network. The heat map detection network contains multiple stages, and each stage is divided into two branches with identical network structures that learn different image information: one learns the feature point distribution heat map (heatmap), and the other learns the feature point connection distribution heat map (pafmap). Each subsequent stage takes as input the combination of the basic network's feature information and the heatmap and pafmap detected by the previous stage. In the second released version, the heat map detection network is still divided into multiple stages, but the dual-branch structure is changed to a single branch: the first N stages only learn the pafmap, and the following M stages only learn the heatmap. At the same time, each 7*7 convolution in the model is replaced by three 3*3 convolutions with residual connections, which reduces the amount of calculation and enriches the receptive fields the model can learn.
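The trade-off behind replacing a 7*7 convolution with three stacked 3*3 convolutions can be checked with simple arithmetic: with stride 1, three 3*3 layers cover the same 7*7 receptive field with fewer weights. The sketch below assumes equal input and output channel counts and ignores biases and the residual connections; the channel count of 128 is a hypothetical value.

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def weights(kernel_sizes, channels):
    """Total weights (no biases) when every layer maps `channels` to `channels`."""
    return sum(k * k * channels * channels for k in kernel_sizes)


C = 128  # hypothetical channel count
single_7x7 = weights([7], C)
three_3x3 = weights([3, 3, 3], C)
```

Per channel pair, the stacked version uses 27 weights against 49 for the single 7*7 layer.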
However, the inventor found in research that although top-down human body feature point detection algorithms need no complicated post-processing, they cannot detect all human body feature points in the image at once. They must first detect the human bodies in the image with an object detection algorithm and then perform feature point detection on each detected person, so the detection time grows linearly with the number of human bodies in the image. All top-down algorithms therefore share the drawback of slow detection and cannot achieve real-time detection; when deployed on a mobile terminal, the model has too much computation and too many parameters, and deployment is difficult. The detection speed of the Openpose algorithm is independent of the number of people in the image, and it needs no additional object detection algorithm for preprocessing. However, its model is also highly complex and computationally expensive. Stacking multiple stages does not significantly improve accuracy and instead introduces a large amount of redundant computation. The 3*3 residual connections in the single-branch structure of the second version do add receptive field information, but the accuracy gain is very small while a large amount of computation is wasted. With these design choices, deploying the model on a mobile terminal leads to excessive computation, a large number of parameters, and difficult deployment.
In view of the above problems, the inventor, after long-term research, proposes the human body feature point detection method, apparatus, electronic device, and storage medium provided by the embodiments of this application. Multi-scale feature extraction is performed on the image to be detected to obtain image features at different scales, and the human body feature point position information and human body feature point connection information are obtained based on the image features at different scales, thereby greatly improving the accuracy and efficiency of human body feature point detection. The specific detection method is described in detail in the subsequent embodiments.
Please refer to FIG. 1, which shows a schematic flowchart of a method for detecting human body feature points provided by an embodiment of this application. The method performs multi-scale feature extraction on the image to be detected to obtain image features at different scales, and obtains the human body feature point position information and human body feature point connection information based on those features, thereby greatly improving the accuracy and efficiency of human body feature point detection. In a specific embodiment, the method is applied to the human body feature point detection apparatus 200 shown in FIG. 6 and to the electronic device 100 (FIG. 7) equipped with the detection apparatus 200. The specific flow of this embodiment is described below using an electronic device as an example. It can be understood that the electronic device used in this embodiment may be a mobile terminal, a smart phone, a tablet computer, a wearable electronic device, and so on, which is not limited here. The flow shown in FIG. 1 is described in detail below. The method for detecting human body feature points may specifically include the following steps:
Step S110: Obtain an image to be detected.
In this embodiment, an image to be detected may be acquired, where the acquired image includes at least one human body. In some implementations, the image to be detected may be a preview image captured by the camera of the electronic device, a photo taken by the camera of the electronic device and stored in an album, an image downloaded from the network and stored in an album, and so on, which is not limited here. In addition, in some implementations, the acquired image to be detected may be a static image or a dynamic image, which is not limited here.
Step S120: Perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
In this embodiment, after the image to be detected is acquired, it may be down-sampled to obtain the first image feature of the image to be detected. The image may be down-sampled by a factor of 2 repeatedly until the resulting first image feature meets the processing requirements. In some implementations, the 2x down-sampling is applied four times in sequence, that is, the image is down-sampled 16x overall, so that the first image feature contains sufficiently abstract features without over-extraction, thereby meeting the processing requirements. Specifically, after the image to be detected is acquired, it is down-sampled 2x; the resulting image feature is down-sampled again to reach 4x; that image feature is down-sampled again to reach 8x; and that image feature is down-sampled once more to reach 16x, yielding the first image feature of the image to be detected.
Of course, in some implementations, the image to be detected may be down-sampled by a larger factor, for example 32x or 64x, which is not limited here.
In this embodiment, a plurality of first image features of the image to be detected are obtained.
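The successive 2x down-sampling described for step S120 can be sketched as follows. This is a minimal illustration only: the application does not say which operator performs the down-sampling, so 2x2 average pooling is an assumption here, and the function names and the 64x64 toy image are hypothetical.

```python
def downsample_2x(feature):
    """Halve height and width with 2x2 average pooling (an assumed operator)."""
    h, w = len(feature), len(feature[0])
    return [
        [
            (feature[y][x] + feature[y][x + 1]
             + feature[y + 1][x] + feature[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def downsample(feature, times):
    """Apply 2x down-sampling `times` times; the overall factor is 2 ** times."""
    for _ in range(times):
        feature = downsample_2x(feature)
    return feature


# Toy 64x64 single-channel image whose pixel value is x + y.
image = [[float(x + y) for x in range(64)] for y in range(64)]
first_feature = downsample(image, 4)  # four rounds of 2x = 16x overall
feature_size = (len(first_feature), len(first_feature[0]))  # 64 / 16 = 4 per side
```

Each round halves the spatial size, so four rounds map 64x64 to 4x4; choosing 5 or 6 rounds would give the 32x or 64x variants mentioned above.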
Step S130: Perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
The down-sampling of the image to be detected consists of successive 2x down-sampling operations: the image is down-sampled 2x, the resulting image feature is down-sampled again to reach 4x, and so on. In other words, the down-sampling is performed serially, and the input of a given convolutional layer can only be the output of the previous convolutional layer. The feature information that this layer can learn is therefore limited to the single receptive field represented by the output of the previous layer, so the first image feature obtained by down-sampling has a relatively limited range of scales and receptive fields.
Therefore, in this embodiment, in order to enrich the scales and receptive fields of the obtained image features, multi-scale feature extraction may be performed on the first image feature to obtain multiple second image features of the image to be detected at different scales and with different receptive fields. In some implementations, the first image feature may be processed by multiple parallel convolutional layers with different convolution kernels. Specifically, the first image feature is input to each of these convolutional layers, each layer processes it separately, and each produces a second image feature. Because the parallel convolutional layers use convolution kernels of different sizes, the same input (the first image feature) yields multiple second image features of different scales and receptive fields at the same time. These are passed together to the next layer as input, so that more scales and receptive fields of the image to be detected are captured.
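The idea of feeding one input to several parallel convolutions with different kernel sizes can be illustrated with a small pure-Python sketch. The kernel sizes (1, 3, 5, 7), the averaging kernels standing in for learned weights, and all function names are assumptions made for illustration; a real model would learn its weights and operate on multi-channel tensors.

```python
def conv2d_same(feature, kernel):
    """Single-channel 2D cross-correlation with zero padding and stride 1."""
    k = len(kernel)
    pad = k // 2
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += feature[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def box_kernel(k):
    """Placeholder k x k averaging kernel standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale(feature, kernel_sizes=(1, 3, 5, 7)):
    """Feed the same input to parallel branches with different kernel sizes."""
    return [conv2d_same(feature, box_kernel(k)) for k in kernel_sizes]


first_feature = [[0.0] * 9 for _ in range(9)]
first_feature[4][4] = 1.0  # a single impulse exposes each branch's receptive field
second_features = multi_scale(first_feature)
# For this centered impulse, the count of non-zero outputs equals the kernel area.
footprints = [sum(v != 0.0 for row in f for v in row) for f in second_features]
```

All branches keep the 9x9 spatial size, while the area influenced by a single input pixel grows from 1 to 49 positions, which is the sense in which the branches see different receptive fields.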
Step S140: Perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
In this embodiment, after the multiple second image features of the image to be detected are obtained, a convolution operation may be performed on them to obtain the human body feature point position information (heatmap) and the human body feature point connection information (pafmap) of the image to be detected. In some implementations, the multiple second image features are fed to two branches for convolution: one branch convolves the multiple second image features and outputs the human body feature point position information, and the other branch convolves the multiple second image features and outputs the human body feature point connection information.
In some implementations, after the human body feature point position information and the human body feature point connection information in the image to be detected are obtained, human body feature point information may be derived from them. In this embodiment, once the position information and connection information are available, the human body feature points at known positions can be connected according to the connection information, thereby drawing and generating the human body feature point information.
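The application does not spell out how the connection information is used to join feature points. One common approach, used by Openpose-style methods and shown here purely as an assumption, scores a candidate pair of points by how well the connection field aligns with the straight segment between them. The function name, the two-component field layout (`paf_x`, `paf_y`), and the toy data are all hypothetical.

```python
import math


def connection_score(paf_x, paf_y, p_a, p_b, samples=10):
    """Average alignment between the connection field and the segment p_a -> p_b."""
    (xa, ya), (xb, yb) = p_a, p_b
    length = math.hypot(xb - xa, yb - ya)
    if length == 0:
        return 0.0
    ux, uy = (xb - xa) / length, (yb - ya) / length  # unit direction of the candidate link
    total = 0.0
    for s in range(samples):
        t = s / (samples - 1)
        x = int(round(xa + t * (xb - xa)))
        y = int(round(ya + t * (yb - ya)))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / samples


# Toy 8x8 field: vectors point in +x along row 2, between x = 1 and x = 6.
paf_x = [[0.0] * 8 for _ in range(8)]
paf_y = [[0.0] * 8 for _ in range(8)]
for x in range(1, 7):
    paf_x[2][x] = 1.0

point_a = (1, 2)                 # (x, y) of one detected feature point
candidates = [(6, 2), (6, 6)]    # two possible partners for point_a
scores = [connection_score(paf_x, paf_y, point_a, c) for c in candidates]
best = candidates[scores.index(max(scores))]
```

The candidate lying along the field receives the higher score and would be the one connected to `point_a`; a full system would additionally resolve conflicts when several points compete for the same partner.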
In the method for detecting human body feature points provided by this embodiment of the application, an image to be detected is acquired and down-sampled to obtain its first image feature; multi-scale feature extraction is performed on the first image feature to obtain multiple second image features of the image at different scales and with different receptive fields; and a convolution operation is performed on the multiple second image features to obtain the human body feature point position information and human body feature point connection information in the image. By extracting multi-scale features and deriving the position and connection information from image features at different scales and receptive fields, the method greatly improves the accuracy and efficiency of human body feature point detection.
Please refer to FIG. 2, which shows a schematic flowchart of a method for detecting human body feature points provided by another embodiment of this application. The flow shown in FIG. 2 is described in detail below. The method for detecting human body feature points may specifically include the following steps:
Step S210: Obtain an image to be detected.
For a detailed description of step S210, refer to step S110; it is not repeated here.
Step S220: Perform N1-fold down-sampling processing on the image to be detected to obtain a to-be-processed image feature, where N1 = 2^M1 and N1 is a positive integer.
In this embodiment, after the image to be detected is acquired, it may be down-sampled by a factor of N1 to obtain the to-be-processed image feature. In some implementations, the N1-fold down-sampling may be 16x down-sampling, that is, 2x down-sampling applied to the image four times in sequence; in this case, N1 = 16 and M1 = 4.
Step S230: Perform N2-fold up-sampling processing on the to-be-processed image feature to obtain the first image feature of the image to be detected, where N2 = 2^M2, N2 < N1, and N2 is a positive integer.
When the image to be detected is down-sampled by a factor of N1 in order to capture more abstract features, the resulting image size is often rather small. If multi-scale feature extraction is applied directly to the to-be-processed image feature obtained after N1-fold down-sampling, convolutions with large kernels tend to over-extract image features and introduce too much unnecessary redundant information. For example, to capture more abstract features, the image to be detected is typically down-sampled 16x, and the resulting image size is correspondingly small. If multi-scale feature extraction is applied directly to the to-be-processed image feature after 16x down-sampling, a 7*7 convolution is likely to over-extract image features and introduce unnecessary redundant information.
Therefore, in this embodiment, after the image to be detected is down-sampled by a factor of N1 to obtain the to-be-processed image feature, the to-be-processed image feature may additionally be up-sampled by a factor of N2, and the newly obtained image feature is taken as the first image feature of the image to be detected, so as to avoid over-extraction of image features and the introduction of unnecessary redundant information. In some implementations, the N1-fold down-sampling may be 16x down-sampling and the N2-fold up-sampling may be 2x up-sampling; in this case, N1 = 16, M1 = 4, N2 = 2, and M2 = 1. That is, after the to-be-processed image feature is up-sampled 2x, the first image feature is restored to the size of an 8x down-sampled image feature, so that more abstract features are captured while over-extraction and unnecessary redundant information are avoided.
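A minimal sketch of the down-then-up step, assuming nearest-neighbor interpolation for the up-sampling (the application does not name an interpolation method) and using hypothetical function names:

```python
def upsample_nearest(feature, factor):
    """Enlarge a 2D feature by repeating each value `factor` times along both axes."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def effective_downsampling(m1, m2):
    """Overall factor after 2**m1 down-sampling followed by 2**m2 up-sampling."""
    n1, n2 = 2 ** m1, 2 ** m2
    if n2 >= n1:
        raise ValueError("the text requires N2 < N1")
    return n1 // n2


# Stand-in for a to-be-processed feature, e.g. a 32x32 input after 16x down-sampling.
to_be_processed = [[1.0, 2.0], [3.0, 4.0]]
first_feature = upsample_nearest(to_be_processed, 2)  # N2 = 2
factor = effective_downsampling(4, 1)                 # N1 = 16, N2 = 2
```

With M1 = 4 and M2 = 1 the effective factor is 16 / 2 = 8, matching the statement that the first image feature has the size of an 8x down-sampled feature.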
Step S240: Perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
For a detailed description of step S240, refer to step S130; it is not repeated here.
Step S250: Perform down-sampling processing on the image to be detected to obtain a third image feature of the image to be detected.
To further enrich the feature scale information and receptive fields that can be captured, in addition to convolving the multiple second image features to obtain the human body feature point position information and connection information, a down-sampled version of the image to be detected may also take part in the convolution operation. This not only adds further feature scale information and receptive fields, but also contributes precise shallow-layer pixel position information, improving the accuracy of the obtained position and connection information. Specifically, the multiple second image features obtained by multi-scale feature extraction on the first image feature are abstract features of the image to be detected, whereas the third image feature obtained by down-sampling the image to be detected is a shallow feature of that image; that is, the second image features and the third image feature differ in scale and receptive field. Including the third image feature in the convolution operation therefore increases the range of scales and receptive fields of the data. Furthermore, because the third image feature is a shallow image feature whose pixel position information is more precise, it improves the accuracy of the obtained human body feature point position information and connection information.
Therefore, in this embodiment, the image to be detected may also be down-sampled to obtain its third image feature, and the third image feature takes part in the convolution operation. In some implementations, feature extraction on the image to be detected may be performed by a convolutional layer, which is not limited here.
When two image features are concatenated along the channel dimension, their image sizes must be the same. Therefore, in this embodiment, concatenating the first image feature and the third image feature along the channel dimension requires that the image size of the first image feature equal the image size of the third image feature. For example, if the first image feature is obtained by 16x down-sampling of the image to be detected, the third image feature must also be obtained by 16x down-sampling; if the first image feature is obtained by 8x down-sampling, the third image feature must also be obtained by 8x down-sampling.
Therefore, in this embodiment, after the image to be detected is acquired, it may be down-sampled by a factor of N3 to obtain the third image feature of the image to be detected. The N3-fold down-sampling may be 2^(M1-M2)-fold down-sampling, so that the image size of the resulting third image feature matches the image size of the first image feature, which provides the basis for subsequently concatenating the multiple second image features and the third image feature along the channel dimension.
In some implementations, the first image feature may be obtained by down-sampling the image to be detected by a factor of N1 and then up-sampling by a factor of N2, where N1 = 2^M1 and N2 = 2^M2, and the third image feature may be obtained by down-sampling the image to be detected by a factor of N3 and then performing feature extraction, where N3 = 2^(M1-M2). This ensures that the image size of the first image feature matches the image size of the third image feature. For example, when N1 = 16 and N2 = 2, the image size of the first image feature is that of the image to be detected after 8x down-sampling. Here M1 = 4 and M2 = 1, and since N3 = 2^(M1-M2), N3 = 8; that is, the image size of the third image feature is also that of the image to be detected after 8x down-sampling.
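The size-matching arithmetic above can be checked with a few lines of Python. The 256x256 input size and the helper names are hypothetical; only the relations N1 = 2^M1, N2 = 2^M2, and N3 = 2^(M1-M2) come from the text.

```python
def third_feature_factor(m1, m2):
    """N3 = 2 ** (M1 - M2), the down-sampling factor for the third image feature."""
    if m2 >= m1:
        raise ValueError("expected M2 < M1 so that N2 < N1")
    return 2 ** (m1 - m2)


def spatial_size(height, width, factor):
    """Spatial size after down-sampling by `factor` (sizes assumed divisible)."""
    return height // factor, width // factor


m1, m2 = 4, 1
n1, n2, n3 = 2 ** m1, 2 ** m2, third_feature_factor(m1, m2)

h, w = 256, 256                                   # hypothetical input size
after_down = spatial_size(h, w, n1)               # size after N1-fold down-sampling
first_size = (after_down[0] * n2, after_down[1] * n2)  # size after N2-fold up-sampling
third_size = spatial_size(h, w, n3)               # size after N3-fold down-sampling
```

Both paths end at the same spatial size, which is the precondition for concatenating the features along the channel dimension.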
Step S260: Perform a convolution operation on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
In some implementations, after the multiple second image features and the third image feature of the image to be detected are obtained, a convolution operation may be performed on them to obtain the human body feature point position information and connection information of the image to be detected. In some implementations, the multiple second image features and the third image feature are fed to two branches for convolution: one branch convolves them and outputs the human body feature point position information, and the other branch convolves them and outputs the human body feature point connection information.
Please refer to FIG. 3, which shows a schematic flowchart of step S260 of the method for detecting human body feature points shown in FIG. 2 of this application. The flow shown in FIG. 3 is described in detail below. The method may specifically include the following steps:
Step S261: Concatenate the multiple second image features and the third image feature along the channel dimension to obtain a fourth image feature.
In this embodiment, the multiple second image features and the third image feature may be concatenated along the channel dimension to obtain a fourth image feature, and the fourth image feature takes part in the convolution operation to obtain the human body feature point position information and connection information in the image to be detected. In some implementations, after the multiple second image features and the third image feature are obtained, they may be concatenated along the channel dimension with a concat operator. For example, if there are two second image features with 19 and 38 channels respectively and the third image feature has 38 channels, the fourth image feature output after channel concat has 19 + 38 + 38 = 95 channels.
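The channel concatenation in this example can be sketched in plain Python with features stored as nested lists in [channel][row][column] order. The helper names and the 4x4 spatial size are assumptions for illustration; the channel counts 19, 38, and 38 are taken from the text.

```python
def make_feature(channels, height, width, fill=0.0):
    """Create a [C][H][W] nested-list feature filled with a constant."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(*features):
    """Concatenate [C][H][W] features along C; spatial sizes must match."""
    h, w = len(features[0][0]), len(features[0][0][0])
    merged = []
    for f in features:
        if len(f[0]) != h or len(f[0][0]) != w:
            raise ValueError("spatial sizes differ; cannot concatenate")
        merged.extend(f)
    return merged


second_a = make_feature(19, 4, 4)
second_b = make_feature(38, 4, 4)
third = make_feature(38, 4, 4)
fourth = concat_channels(second_a, second_b, third)
fourth_channels = len(fourth)  # 19 + 38 + 38
```

The explicit size check mirrors the requirement that features be the same image size before they are joined.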
Step S262: Perform a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
In this embodiment, after the fourth image feature of the image to be detected is obtained, a convolution operation may be performed on it to obtain the human body feature point position information and connection information of the image to be detected. In some implementations, the fourth image feature is fed to two branches for convolution: one branch convolves the fourth image feature and outputs the human body feature point position information, and the other branch convolves the fourth image feature and outputs the human body feature point connection information.
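The two-branch arrangement can be sketched as two independent convolutions applied to the same fourth image feature. The application does not give the kernel size or channel counts of the branches, so the 1x1 kernels, the tiny shapes, and the weights below are hypothetical placeholders.

```python
def conv1x1(feature, weights):
    """1x1 convolution: each output channel is a per-pixel weighted sum of input channels."""
    c_in, h, w = len(feature), len(feature[0]), len(feature[0][0])
    return [
        [[sum(wt[c] * feature[c][y][x] for c in range(c_in)) for x in range(w)]
         for y in range(h)]
        for wt in weights
    ]


def detect(fourth_feature, heatmap_weights, paf_weights):
    """Run both branches on the same shared input feature."""
    heatmap = conv1x1(fourth_feature, heatmap_weights)  # feature point position branch
    pafmap = conv1x1(fourth_feature, paf_weights)       # feature point connection branch
    return heatmap, pafmap


fourth = [[[1.0, 2.0]], [[3.0, 4.0]]]          # 2 channels, 1 row, 2 columns
heat_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 output channels (placeholder weights)
paf_w = [[0.5, 0.5]]                           # 1 output channel (placeholder weights)
heatmap, pafmap = detect(fourth, heat_w, paf_w)
```

Both outputs keep the spatial size of the input and differ only in their number of channels, which in a trained model would correspond to the number of feature point types and connection types.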
In the method for detecting human body feature points provided by this further embodiment of the application, an image to be detected is acquired and down-sampled by a factor of N1 to obtain a to-be-processed image feature; the to-be-processed image feature is up-sampled by a factor of N2 to obtain the first image feature of the image to be detected; multi-scale feature extraction is performed on the first image feature to obtain multiple second image features at different scales and with different receptive fields; feature extraction is performed on the image to be detected to obtain its third image feature; and a convolution operation is performed on the multiple second image features and the third image feature to obtain the human body feature point position information and connection information in the image to be detected. Compared with the method shown in FIG. 1, this embodiment additionally up-samples by a factor of N2 after the N1-fold down-sampling to obtain the first image feature, which captures more abstract information while avoiding over-extraction of image features and the introduction of unnecessary redundant information. In addition, this embodiment performs the convolution operation on both the multiple second image features and the third image feature extracted from the image to be detected, which increases the receptive field of the image.
Please refer to FIG. 4, which shows a schematic flowchart of a method for detecting human body feature points provided by yet another embodiment of this application. The flow shown in FIG. 4 is described in detail below. The method for detecting human body feature points may specifically include the following steps:
Step S310: Obtain an image to be detected.
For a detailed description of step S310, refer to step S110; it is not repeated here.
Step S320: Perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
In this embodiment, a trained detection model may be used to process the acquired image to be detected and output its human body feature point position information and connection information. FIG. 5 shows the overall framework of the detection model provided by the embodiments of this application. The detection model may include three main parts: a basic network module F, a multi-scale module M, and a heat map detection module S.
After the image to be detected is acquired, it may be input to the basic network module of the detection model, which down-samples the image to obtain its first image feature; the first image feature then serves as the input of the multi-scale module of the detection model. In some implementations, the basic network module may be a convolutional neural network such as Vgg, ResNet, or Mobilenet. A deeper network such as Vgg or ResNet increases the model's computational cost but yields higher detection accuracy, whereas a lightweight network such as Mobilenet sacrifices some detection accuracy but achieves faster detection and can run fully in real time.
Step S330: Input the first image feature into the multi-scale module of the detection model, and perform multi-scale feature extraction on the first image feature through the multi-scale module to obtain multiple second image features of the image to be detected.
In this embodiment, after the first image feature output by the basic network module is obtained, it may be input into the multi-scale module of the detection model, which performs multi-scale feature extraction on it to obtain multiple second image features of the image to be detected. In some implementations, the multi-scale module includes multiple parallel convolutional layers, each with a different convolution kernel, and each convolutional layer extracts from the first image feature a second image feature with a different scale and a different receptive field. As one approach, the multi-scale module may include four parallel convolutional layers: a 1*1 convolution, a 3*3 convolution, a 5*5 convolution, and a 7*7 convolution. The kernel sizes increase in that order, and each layer is responsible for extracting image information at a different scale and receptive field. Together, the four parallel convolutional layers form the multi-scale module.
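The four-branch arrangement described above can be illustrated with a short sketch. It is not the implementation of this application: the stride of 1, the "same" padding, and the channel counts are assumptions chosen for illustration, and the helper names are hypothetical. The sketch shows that, under these assumptions, every branch keeps the spatial size of the first image feature while the parameter count grows with the kernel size.

```python
# Hypothetical sketch of the four parallel branches of the multi-scale module.
# Stride 1, "same" padding and the channel counts are assumptions; the text
# only fixes the kernel sizes 1*1, 3*3, 5*5 and 7*7.

KERNEL_SIZES = (1, 3, 5, 7)


def same_padding(kernel_size: int) -> int:
    """Padding that preserves spatial size for an odd kernel at stride 1."""
    return (kernel_size - 1) // 2


def conv_output_size(size: int, kernel_size: int, padding: int, stride: int = 1) -> int:
    """Standard convolution output-size formula along one spatial axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights plus biases of a plain 2-D convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def multi_scale_summary(height: int, width: int, in_channels: int, out_channels: int):
    """Describe each parallel branch applied to the same first image feature."""
    branches = []
    for k in KERNEL_SIZES:
        p = same_padding(k)
        branches.append({
            "kernel": k,
            "padding": p,
            "out_h": conv_output_size(height, k, p),
            "out_w": conv_output_size(width, k, p),
            "params": conv_params(in_channels, out_channels, k),
        })
    return branches


if __name__ == "__main__":
    for branch in multi_scale_summary(32, 32, in_channels=128, out_channels=32):
        print(branch)
```

Keeping the spatial size identical across branches is what allows the resulting second image features to be combined channel-wise later on.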
Step S340: Input the multiple second image features into the heat map detection module of the detection model, and perform a convolution operation on the multiple second image features through the heat map detection module to obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
In this embodiment, after the multiple second image features output by the multi-scale module are obtained, they may be input into the heat map detection module of the detection model, which performs a convolution operation on them to obtain the human body feature point position information and the human body feature point connection information. In some implementations, a third image feature output by the basic network module may also be obtained. In that case, the multiple second image features and the third image feature are channel-connected to obtain a fourth image feature, which is input into the heat map detection module of the detection model; the heat map detection module then performs a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information.
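The channel connection of the second and third image features can be sketched at the level of tensor shapes. This is a minimal illustration rather than the method of this application: the (channels, height, width) layout, the helper name `concat_channels`, and the channel counts in the usage lines are all assumptions.

```python
from typing import Sequence, Tuple

# Assumed layout of one image feature: (channels, height, width).
Shape = Tuple[int, int, int]


def concat_channels(shapes: Sequence[Shape]) -> Shape:
    """Return the shape produced by concatenating features along the channel axis.

    All inputs must share the same height and width; only channels add up.
    """
    if not shapes:
        raise ValueError("at least one feature shape is required")
    _, height, width = shapes[0]
    for _, h, w in shapes:
        if (h, w) != (height, width):
            raise ValueError("features must have the same spatial size")
    total_channels = sum(c for c, _, _ in shapes)
    return (total_channels, height, width)


if __name__ == "__main__":
    second_features = [(32, 32, 32)] * 4   # hypothetical outputs of four branches
    third_feature = (128, 32, 32)          # hypothetical basic-network feature
    print(concat_channels(second_features + [third_feature]))
```

The size check reflects the requirement that the second image features and the third image feature have matching image sizes before they are connected.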
In some implementations, the heat map detection module includes only one convolution stage. This stage includes a first processing branch and a second processing branch: the first processing branch detects and outputs the human body feature point position information, and the second processing branch detects and outputs the human body feature point connection information. In addition, the first processing branch includes two convolutional layers, and the second processing branch includes two convolutional layers.
In the Openpose model, the heat map detection module consists of multiple stages connected in series to improve accuracy. Experiments show, however, that neither heatmap detection nor pafmap detection needs many stages for correction: connecting multiple stages in series brings only a very limited accuracy gain while adding a large number of parameters and a large amount of computation. In this embodiment, the multi-scale module is added, so the image features input into the heat map detection module already contain rich feature and scale information. This makes it possible to reduce the number of stages in the heat map detection module: a single stage is enough to achieve high accuracy, and it greatly reduces the amount of computation and the number of parameters of the model, so that the model can perform real-time detection on a mobile terminal. In addition, in this embodiment, to further reduce the number of parameters and the amount of computation, each branch of the single stage uses only two convolutional layers: a 3*3 convolution that performs further feature extraction on the input channel-connected image feature, and a 1*1 convolution that detects the human body feature point position information or the human body feature point connection information and outputs a feature map with the corresponding number of channels.
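To give a feel for the size of such a single-stage head, the following sketch counts the parameters of the two branches. The channel numbers (256 input channels, 128 intermediate channels, 19 position channels, 38 connection channels) are hypothetical values chosen for illustration; they are not specified in this application, and the function names are likewise assumptions.

```python
# Hypothetical parameter count for a single-stage, two-branch heat map head.
# Each branch is a 3*3 convolution followed by a 1*1 convolution, as in the text;
# every channel count below is an assumption made for illustration only.


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights plus biases of a plain 2-D convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def branch_params(in_channels: int, mid_channels: int, out_channels: int) -> int:
    """3*3 feature-extraction convolution followed by a 1*1 detection convolution."""
    return (conv_params(in_channels, mid_channels, 3)
            + conv_params(mid_channels, out_channels, 1))


def stage_params(in_channels: int, mid_channels: int,
                 position_channels: int, connection_channels: int) -> int:
    """Total parameters of the first (position) and second (connection) branches."""
    return (branch_params(in_channels, mid_channels, position_channels)
            + branch_params(in_channels, mid_channels, connection_channels))


if __name__ == "__main__":
    print(branch_params(256, 128, 19))     # position branch
    print(branch_params(256, 128, 38))     # connection branch
    print(stage_params(256, 128, 19, 38))  # whole single-stage head
```

Under these assumed sizes almost all parameters sit in the two 3*3 convolutions, which is why every additional serial stage of comparable structure would add a cost of the same order.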
For the trained detection model in the foregoing embodiments, the embodiments of the present application may further include training and correction of the detection model. The detection model may be trained in advance on an acquired training data set; each subsequent detection can then be performed with the trained detection model, without training the model again for each detection.
In some implementations, training the detection model includes: acquiring a training data set that includes multiple images together with the human body feature point position information and the human body feature point connection information corresponding to each image; and, based on the training data set, training with a machine learning algorithm, using each image as input data and the corresponding human body feature point position information and human body feature point connection information as output data, to obtain the trained detection model. The machine learning algorithm may include the algorithms corresponding to the basic network module F, the multi-scale module M, and the heat map detection module S described above.
During training of the detection model, an objective function may be set to measure the difference between the detection result of the detection model and the ground-truth label. This function is called the loss function. The goal of training the detection model is to minimize this function. Setting different loss functions for the detection model means setting different learning objectives for its training.
In this embodiment, the loss function contains two parts: $L_{total} = L_{heatmap} + L_{pafmap}$, where $L_{heatmap}$ denotes the feature point position heat map loss and $L_{pafmap}$ denotes the feature point connection heat map loss.
The feature point position heat map loss measures the loss between the detected feature point position heat map and the ground-truth feature point position heat map:
Figure PCTCN2021073863-appb-000001
Here $(i,j)$ denotes a pixel position in the feature map, $P_{heat}(i,j)$ denotes the value at position $(i,j)$ in the detected feature point feature map, $G_{heat}(i,j)$ denotes the value at position $(i,j)$ in the ground-truth feature point feature map, and width and height denote the width and height of the feature point feature map, respectively.
The feature point connection heat map loss measures the loss between the detected feature point connection heat map and the ground-truth feature point connection heat map:
Figure PCTCN2021073863-appb-000002
Here $(i,j)$ denotes a pixel position in the feature map, $P_{paf}(i,j)$ denotes the value at position $(i,j)$ in the detected feature point connection feature map, $G_{paf}(i,j)$ denotes the value at position $(i,j)$ in the ground-truth feature point connection feature map, and width and height denote the width and height of the feature point connection feature map, respectively.
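The two-part loss can be sketched as follows. The exact per-pixel expressions are given only in the formula figures above, which are not reproduced in this text, so the sketch assumes a squared error summed over all positions $(i,j)$; that choice, the absence of any normalization, and the function names are assumptions rather than the formulas of this application.

```python
from typing import List

Map2D = List[List[float]]  # one feature map, indexed as map[row][column]


def map_loss(pred: Map2D, target: Map2D) -> float:
    """Assumed loss: sum over all positions (i, j) of (P(i, j) - G(i, j)) ** 2."""
    height = len(target)
    width = len(target[0]) if height else 0
    if len(pred) != height or any(len(row) != width for row in pred):
        raise ValueError("prediction and ground truth must have the same size")
    return sum((pred[i][j] - target[i][j]) ** 2
               for i in range(height) for j in range(width))


def total_loss(pred_heat: Map2D, gt_heat: Map2D,
               pred_paf: Map2D, gt_paf: Map2D) -> float:
    """L_total = L_heatmap + L_pafmap, as stated in the text."""
    return map_loss(pred_heat, gt_heat) + map_loss(pred_paf, gt_paf)


if __name__ == "__main__":
    pred_heat = [[0.5, 0.0], [0.0, 1.0]]
    gt_heat = [[1.0, 0.0], [0.0, 1.0]]
    pred_paf = [[0.0, 0.0], [0.0, 0.0]]
    gt_paf = [[0.0, 1.0], [0.0, 0.0]]
    print(total_loss(pred_heat, gt_heat, pred_paf, gt_paf))
```

Only the additive structure of the total loss is taken from the text; any other per-pixel distance could be substituted in `map_loss` without changing the rest of the sketch.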
In the method for detecting human body feature points provided by yet another embodiment of the present application, an image to be detected is acquired and down-sampled to obtain a first image feature of the image to be detected; the first image feature is input into the multi-scale module of the detection model, which performs multi-scale feature extraction on it to obtain multiple second image features of the image to be detected; and the multiple second image features are input into the heat map detection module of the detection model, which performs a convolution operation on them to obtain the human body feature point position information and the human body feature point connection information in the image to be detected. Compared with the method for detecting human body feature points shown in FIG. 1, this embodiment additionally uses the detection model to detect the human body feature points of the image to be detected, which improves the accuracy of human body feature point detection.
Referring to FIG. 6, FIG. 6 shows a block diagram of a human body feature point detection apparatus 200 provided by an embodiment of the present application. The block diagram shown in FIG. 6 is described below. The human body feature point detection apparatus 200 includes: a to-be-detected image acquisition module 210, a first image feature acquisition module 220, a second image feature acquisition module 230, and a human body feature point detection module 240, wherein:
The to-be-detected image acquisition module 210 is configured to acquire an image to be detected.
The first image feature acquisition module 220 is configured to perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
Further, the first image feature acquisition module 220 includes a to-be-processed image feature obtaining sub-module and a first image feature acquisition sub-module, wherein:
The to-be-processed image feature obtaining sub-module is configured to perform N1-fold down-sampling processing on the image to be detected to obtain a to-be-processed image feature, where $N1 = 2^{M1}$ and N1 is a positive integer.
The first image feature acquisition sub-module is configured to perform N2-fold up-sampling processing on the to-be-processed image feature to obtain the first image feature of the image to be detected, where $N2 = 2^{M2}$, N2 < N1, and N2 is a positive integer.
The second image feature acquisition module 230 is configured to perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
Further, the second image feature acquisition module 230 includes a second image feature acquisition sub-module, wherein:
The second image feature acquisition sub-module is configured to input the first image feature into the multi-scale module of the detection model, and perform multi-scale feature extraction on the first image feature through the multi-scale module to obtain the multiple second image features of the image to be detected.
The human body feature point detection module 240 is configured to perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
Further, the human body feature point detection module 240 includes a third feature image acquisition sub-module and a first human body feature point detection sub-module, wherein:
The third feature image acquisition sub-module is configured to perform feature extraction on the image to be detected to obtain a third image feature of the image to be detected.
Further, the third feature image acquisition sub-module includes a third feature image acquisition unit, wherein:
The third feature image acquisition unit is configured to perform N3-fold down-sampling processing on the image to be detected to obtain the third image feature of the image to be detected, where $N3 = 2^{M1-M2}$ and N3 is a positive integer.
The first human body feature point detection sub-module is configured to perform a convolution operation on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
Further, the first human body feature point detection sub-module includes a fourth image feature obtaining unit and a human body feature point detection unit, wherein:
The fourth image feature obtaining unit is configured to channel-connect the multiple second image features and the third image feature to obtain a fourth image feature.
The human body feature point detection unit is configured to perform a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
Further, the human body feature point detection module 240 includes a second human body feature point detection sub-module, wherein:
The second human body feature point detection sub-module is configured to input the multiple second image features into the heat map detection module of the detection model, and perform a convolution operation on the multiple second image features through the heat map detection module to obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
Further, the human body feature point detection apparatus 200 also includes a training data set acquisition module and a model training module, wherein:
The training data set acquisition module is configured to acquire a training data set that includes multiple images together with the human body feature point position information and the human body feature point connection information corresponding to each of the multiple images.
The model training module is configured to, based on the training data set, train with a machine learning algorithm, using each image as input data and the human body feature point position information and human body feature point connection information corresponding to each image as output data, to obtain the trained detection model.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
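The relationship between the sampling factors used by the sub-modules above can be checked with a small worked example. Because $N1 = 2^{M1}$, $N2 = 2^{M2}$ and $N3 = 2^{M1-M2}$, it follows that $N1 / N2 = N3$, so N1-fold down-sampling followed by N2-fold up-sampling yields the same spatial size as N3-fold down-sampling. The values M1 = 5, M2 = 2 and the 256-pixel input side in the sketch are hypothetical, as are the function names.

```python
from typing import Tuple


def sampling_factors(m1: int, m2: int) -> Tuple[int, int, int]:
    """Return (N1, N2, N3) with N1 = 2**M1, N2 = 2**M2, N3 = 2**(M1 - M2)."""
    if m2 < 0 or m1 <= m2:
        raise ValueError("require M1 > M2 >= 0 so that N2 < N1")
    return 2 ** m1, 2 ** m2, 2 ** (m1 - m2)


def feature_sizes(side: int, m1: int, m2: int) -> Tuple[int, int]:
    """Side length of the first and third image features for a square input."""
    n1, n2, n3 = sampling_factors(m1, m2)
    if side % n1 != 0:
        raise ValueError("input side must be divisible by N1 in this sketch")
    first = side // n1 * n2   # N1-fold down-sampling, then N2-fold up-sampling
    third = side // n3        # N3-fold down-sampling
    return first, third


if __name__ == "__main__":
    print(sampling_factors(5, 2))    # hypothetical exponents
    print(feature_sizes(256, 5, 2))  # hypothetical input side of 256 pixels
```

Matching sizes are what make the channel connection of the second image features, which inherit the size of the first image feature, with the third image feature possible.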
In the several embodiments provided in this application, the coupling between modules may be electrical, mechanical, or of another form.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to FIG. 7, FIG. 7 shows a structural block diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 may be a smartphone, a tablet computer, an e-book reader, or another electronic device capable of running application programs. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, where the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more programs are configured to execute the method described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the parts of the electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling the data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one of the following hardware forms: digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 110 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and application programs; the GPU is responsible for rendering and drawing the content to be displayed; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include random access memory (Random Access Memory, RAM) and may also include read-only memory (Read-Only Memory). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described in this application, and the like. The data storage area may store data created by the electronic device 100 during use (such as a phone book, audio and video data, and chat records).
Referring to FIG. 8, FIG. 8 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable storage medium 300 stores program code that can be invoked by a processor to execute the method described in the foregoing method embodiments.
The computer-readable storage medium 300 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 300 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 300 has storage space for program code 310 that executes any of the method steps in the methods described above. The program code may be read from or written into one or more computer program products. The program code 310 may, for example, be compressed in a suitable form.
In summary, in the method, apparatus, electronic device, and storage medium for detecting human body feature points provided by the embodiments of the present application, an image to be detected is acquired and down-sampled to obtain a first image feature of the image to be detected; multi-scale feature extraction is performed on the first image feature to obtain multiple second image features of the image to be detected; and a convolution operation is performed on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected. Performing multi-scale feature extraction on the image to be detected yields image features at different scales, and the human body feature point position information and the human body feature point connection information are obtained from those features, which greatly improves the accuracy and efficiency of human body feature point detection.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A method for detecting human body feature points, characterized in that the method comprises:
    acquiring an image to be detected;
    performing down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected;
    performing multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected;
    performing a convolution operation on the multiple second image features to obtain human body feature point position information and human body feature point connection information in the image to be detected.
  2. The method according to claim 1, characterized in that the performing a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected comprises:
    performing down-sampling processing on the image to be detected to obtain a third image feature of the image to be detected;
    performing a convolution operation on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  3. The method according to claim 2, characterized in that the performing a convolution operation on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected comprises:
    channel-connecting the multiple second image features and the third image feature to obtain a fourth image feature;
    performing a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  4. The method according to claim 3, characterized in that the channel-connecting the multiple second image features and the third image feature to obtain a fourth image feature comprises:
    channel-connecting the multiple second image features and the third image feature through a concat operator to obtain the fourth image feature.
  5. The method according to claim 3 or 4, characterized in that the image size corresponding to the second image features is the same as the image size corresponding to the third image feature.
  6. The method according to claim 2, characterized in that the performing down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected comprises:
    performing N1-fold down-sampling processing on the image to be detected to obtain a to-be-processed image feature, where $N1 = 2^{M1}$ and N1 is a positive integer;
    performing N2-fold up-sampling processing on the to-be-processed image feature to obtain the first image feature of the image to be detected, where $N2 = 2^{M2}$, N2 < N1, and N2 is a positive integer.
  7. The method according to claim 6, characterized in that the performing down-sampling processing on the image to be detected to obtain a third image feature of the image to be detected comprises:
    performing N3-fold down-sampling processing on the image to be detected to obtain the third image feature of the image to be detected, where $N3 = 2^{M1-M2}$ and N3 is a positive integer.
  8. The method according to any one of claims 1-7, characterized in that the performing multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected comprises:
    inputting the first image feature into a multi-scale module of a detection model, and performing multi-scale feature extraction on the first image feature through the multi-scale module to obtain the multiple second image features of the image to be detected.
  9. The method according to claim 8, characterized in that the multi-scale module comprises multiple parallel convolutional layers, each of the multiple convolutional layers has a different convolution kernel, and each convolutional layer is used to extract a second image feature of a different scale from the first image feature.
  10. The method according to any one of claims 1-7, wherein performing a convolution operation on the plurality of second image features to obtain human body feature point position information and human body feature point connection information in the image to be detected comprises:
    inputting the plurality of second image features into a heat map detection module of a detection model, and performing a convolution operation on the plurality of second image features through the heat map detection module to obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
  11. The method according to claim 10, wherein the heat map detection module comprises one convolution stage, the one convolution stage comprises a first processing branch and a second processing branch, the first processing branch is used to detect and output the human body feature point position information, and the second processing branch is used to detect and output the human body feature point connection information.
  12. The method according to claim 11, wherein the first processing branch comprises two convolutional layers, and the second processing branch comprises two convolutional layers.
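The single-stage, two-branch layout of claims 11 and 12 can be sketched as follows. This is a simplified illustration rather than the claimed implementation: it assumes 1x1 convolutions (a per-pixel weighted sum over channels), a ReLU between the two layers of each branch, and hand-supplied weights; the claims specify none of these details, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]   # one channel, H x W
Weights = List[List[float]]  # out_channels x in_channels


def conv1x1(channels: Sequence[Grid], weights: Weights) -> List[Grid]:
    """1x1 convolution: each output channel mixes the input channels per pixel."""
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [[sum(wt * ch[y][x] for wt, ch in zip(row, channels)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]


def relu(channels: Sequence[Grid]) -> List[Grid]:
    """Element-wise max(0, v) on every channel."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in channels]


def branch(channels: Sequence[Grid], layer1: Weights, layer2: Weights) -> List[Grid]:
    """A processing branch made of two convolutional layers."""
    return conv1x1(relu(conv1x1(channels, layer1)), layer2)


def heatmap_stage(second_features: Sequence[Grid],
                  position_weights: Tuple[Weights, Weights],
                  connection_weights: Tuple[Weights, Weights]
                  ) -> Tuple[List[Grid], List[Grid]]:
    """One convolution stage whose two branches share the same input."""
    position_maps = branch(second_features, *position_weights)
    connection_maps = branch(second_features, *connection_weights)
    return position_maps, connection_maps
```

Both branches read the same stacked second image features and differ only in their weights and output channel counts, so position and connection information are produced in a single pass through the stage.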
  13. The method according to any one of claims 1-7, wherein before acquiring the image to be detected, the method further comprises:
    acquiring a training data set, the training data set comprising a plurality of images, and human body feature point position information and human body feature point connection information corresponding to each image of the plurality of images;
    based on the training data set, training with a machine learning algorithm, using each image as input data and the human body feature point position information and human body feature point connection information corresponding to each image as output data, to obtain a trained detection model.
  14. The method according to any one of claims 1-13, wherein after performing the convolution operation on the plurality of second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected, the method further comprises:
    obtaining human body feature point information based on the human body feature point position information and the human body feature point connection information.
  15. The method according to claim 14, wherein obtaining the human body feature point information based on the human body feature point position information and the human body feature point connection information comprises:
    obtaining positions of human body feature points based on the human body feature point position information;
    connecting the human body feature points based on the human body feature point connection information, and drawing to generate the human body feature point information.
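The two steps of claim 15 can be sketched under strong simplifying assumptions. Here the position information is taken to be one heat map per feature point, whose peak gives the point's location, and the connection information is reduced to a list of index pairs naming which points to join. The claims do not fix either representation, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]  # (row, column)


def heatmap_peak(heatmap: Grid) -> Point:
    """Return the (row, column) of the largest value in a heat map."""
    best, best_val = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best_val:
                best, best_val = (r, c), v
    return best


def build_skeleton(position_maps: Sequence[Grid],
                   connections: Sequence[Tuple[int, int]]
                   ) -> Tuple[List[Point], List[Tuple[Point, Point]]]:
    """Locate each feature point, then join the pairs named in `connections`."""
    points = [heatmap_peak(m) for m in position_maps]
    edges = [(points[a], points[b]) for a, b in connections]
    return points, edges
```

The returned edges are pairs of pixel coordinates, which a drawing routine could render as line segments over the image to display the connected feature points.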
  16. The method according to any one of claims 1-15, wherein performing multi-scale feature extraction on the first image feature to obtain a plurality of second image features of the image to be detected comprises:
    inputting the first image feature into a plurality of convolutional layers with different convolution kernels, so that the plurality of convolutional layers with different convolution kernels respectively process the first image feature to obtain the plurality of second image features of the image to be detected.
  17. A human body feature point detection apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire an image to be detected;
    a first image feature acquisition module, configured to perform downsampling on the image to be detected to obtain a first image feature of the image to be detected;
    a second image feature acquisition module, configured to perform multi-scale feature extraction on the first image feature to obtain a plurality of second image features of the image to be detected;
    a human body feature point detection module, configured to perform a convolution operation on the plurality of second image features to obtain human body feature point position information and human body feature point connection information in the image to be detected.
  18. The apparatus according to claim 17, wherein the human body feature point detection module comprises:
    a third feature image acquisition sub-module, configured to perform downsampling on the image to be detected to obtain a third image feature of the image to be detected;
    a first human body feature point detection sub-module, configured to perform a convolution operation on the plurality of second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  19. An electronic device, comprising a memory and a processor, wherein the memory is coupled to the processor and stores instructions, and when the instructions are executed by the processor, the processor performs the method according to any one of claims 1-16.
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores program code, and the program code can be called by a processor to execute the method according to any one of claims 1-16.
PCT/CN2021/073863 2020-03-12 2021-01-27 Human body feature point detection method and apparatus, electronic device, and storage medium WO2021179822A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010171918.8 2020-03-12
CN202010171918.8A CN111414823B (en) 2020-03-12 2020-03-12 Human body characteristic point detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021179822A1 (en) 2021-09-16

Family

ID=71492884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073863 WO2021179822A1 (en) 2020-03-12 2021-01-27 Human body feature point detection method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111414823B (en)
WO (1) WO2021179822A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414823B (en) * 2020-03-12 2023-09-12 Oppo广东移动通信有限公司 Human body characteristic point detection method and device, electronic equipment and storage medium
CN113177432B (en) * 2021-03-16 2023-08-29 重庆兆光科技股份有限公司 Head posture estimation method, system, equipment and medium based on multi-scale lightweight network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120114175A1 (en) * 2010-11-05 2012-05-10 Samsung Electronics Co., Ltd. Object pose recognition apparatus and object pose recognition method using the same
CN109726659A (en) * 2018-12-21 2019-05-07 北京达佳互联信息技术有限公司 Detection method, device, electronic equipment and the readable medium of skeleton key point
CN110245655A (en) * 2019-05-10 2019-09-17 天津大学 A kind of single phase object detecting method based on lightweight image pyramid network
CN110263756A (en) * 2019-06-28 2019-09-20 东北大学 A kind of human face super-resolution reconstructing system based on joint multi-task learning
CN111414823A (en) * 2020-03-12 2020-07-14 Oppo广东移动通信有限公司 Human body feature point detection method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182384B (en) * 2017-12-07 2020-09-29 浙江大华技术股份有限公司 Face feature point positioning method and device
CN108664885B (en) * 2018-03-19 2021-08-31 杭州电子科技大学 Human body key point detection method based on multi-scale cascade Hourglass network
CN109670397B (en) * 2018-11-07 2020-10-30 北京达佳互联信息技术有限公司 Method and device for detecting key points of human skeleton, electronic equipment and storage medium
CN113569798B (en) * 2018-11-16 2024-05-24 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN110705365A (en) * 2019-09-06 2020-01-17 北京达佳互联信息技术有限公司 Human body key point detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111414823A (en) 2020-07-14
CN111414823B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN109493350B (en) Portrait segmentation method and device
CN110473141B (en) Image processing method, device, storage medium and electronic equipment
WO2021169723A1 (en) Image recognition method and apparatus, electronic device, and storage medium
CN108470320B (en) Image stylization method and system based on CNN
CN110532984B (en) Key point detection method, gesture recognition method, device and system
CN109241880B (en) Image processing method, image processing apparatus, computer-readable storage medium
WO2021073493A1 (en) Image processing method and device, neural network training method, image processing method of combined neural network model, construction method of combined neural network model, neural network processor and storage medium
US11151361B2 (en) Dynamic emotion recognition in unconstrained scenarios
US9633282B2 (en) Cross-trained convolutional neural networks using multimodal images
WO2020199478A1 (en) Method for training image generation model, image generation method, device and apparatus, and storage medium
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN111104962A (en) Semantic segmentation method and device for image, electronic equipment and readable storage medium
WO2020015752A1 (en) Object attribute identification method, apparatus and system, and computing device
US20230085605A1 (en) Face image processing method, apparatus, device, and storage medium
CN112990219B (en) Method and device for image semantic segmentation
CN110415250B (en) Overlapped chromosome segmentation method and device based on deep learning
WO2018082308A1 (en) Image processing method and terminal
WO2021179822A1 (en) Human body feature point detection method and apparatus, electronic device, and storage medium
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112927209B (en) CNN-based significance detection system and method
CN110807362A (en) Image detection method and device and computer readable storage medium
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111292334B (en) Panoramic image segmentation method and device and electronic equipment
CN112381061A (en) Facial expression recognition method and system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21767556; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21767556; Country of ref document: EP; Kind code of ref document: A1)