WO2021179822A1 - Method and apparatus for detecting human body feature points, electronic device, and storage medium - Google Patents


Info

Publication number
WO2021179822A1
WO2021179822A1, PCT/CN2021/073863, CN2021073863W
Authority
WO
WIPO (PCT)
Prior art keywords
image
human body
detected
feature
feature point
Prior art date
Application number
PCT/CN2021/073863
Other languages
English (en)
Chinese (zh)
Inventor
吴佳涛
Original Assignee
Oppo广东移动通信有限公司
上海瑾盛通信科技有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 and 上海瑾盛通信科技有限公司
Publication of WO2021179822A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • This application relates to the technical field of electronic equipment, and more specifically, to a method, device, electronic equipment, and storage medium for detecting human body feature points.
  • With the development of artificial intelligence technology, it has gradually been applied to the field of human body feature point detection.
  • However, in some existing methods the detection speed grows linearly with the number of human bodies in the image.
  • In view of the above problems, this application proposes a method, apparatus, electronic device, and storage medium for detecting human body feature points.
  • an embodiment of the present application provides a method for detecting feature points of a human body.
  • The method includes: acquiring an image to be detected; performing down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected; performing multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected; and performing a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • an embodiment of the present application provides a device for detecting feature points of a human body.
  • The apparatus includes: a to-be-detected image acquisition module for acquiring the image to be detected; a first image feature acquisition module for performing down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected;
  • a second image feature acquisition module configured to perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected;
  • and a human body feature point detection module for performing a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • An embodiment of the present application provides an electronic device including a memory and a processor, the memory being coupled to the processor and storing instructions which, when executed by the processor, cause the processor to perform the above method.
  • An embodiment of the present application provides a computer-readable storage medium storing program code that can be invoked by a processor to execute the above method.
  • FIG. 1 shows a schematic flowchart of a method for detecting human body feature points according to an embodiment of the present application;
  • FIG. 2 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application;
  • FIG. 3 shows a schematic flowchart of step S260 of the method for detecting human body feature points shown in FIG. 2 of the present application;
  • FIG. 4 shows a schematic flowchart of a method for detecting human body feature points according to yet another embodiment of the present application;
  • FIG. 5 shows an overall framework diagram of the detection model provided by an embodiment of the present application;
  • FIG. 6 shows a module block diagram of an apparatus for detecting human body feature points provided by an embodiment of the present application;
  • FIG. 7 shows a block diagram of an electronic device used in an embodiment of the present application to execute the method for detecting human body feature points according to the embodiment of the present application;
  • FIG. 8 shows a storage unit used in an embodiment of the present application for storing or carrying program code that implements the method for detecting human body feature points according to the embodiment of the present application.
  • A convolutional neural network is a neural network that involves convolution computations and has a deep structure; it is one of the representative algorithms of deep learning.
  • Over the course of their development, convolutional neural networks have generally included the following types of stacked layers: input layer, convolutional layer, pooling layer, normalization layer (also called Batch Norm layer), activation function layer, fully connected layer, output layer, etc.
  • The input layer is generally an RGB three-channel color image.
  • The convolutional layer extracts features from the input data; its computation takes the form of a convolution operation, involving weight coefficients and biases.
  • The pooling layer selects and filters the extracted feature information; commonly used pooling methods include max pooling and average pooling.
  • The normalization layer normalizes the input data so that the distribution of each feature is similar and the network is easier to train.
  • The activation function layer adds nonlinear factors to the model, giving the model a stronger fitting ability.
  • The fully connected layer is generally located in the last part of a convolutional neural network; it combines the input features nonlinearly to obtain the output.
  • The output layer produces the type of result the model requires. For classification problems, the output layer uses softmax (the normalized exponential function, often used as an output layer in deep learning to obtain an output of the specified type) or similar functions to output classification labels; for segmentation-style problems, the output layer directly outputs the classification result of each pixel; for the human body feature point detection problem, the output layer outputs human body feature point heat maps (different algorithm models may also output other heat maps to assist feature point detection and assignment).
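  • The layer descriptions above can be grounded with the standard spatial-size arithmetic shared by convolution and pooling layers. A minimal sketch (the helper name and the example figures are illustrative, not from the patent):

```python
def conv_out_size(n, k, p=0, s=1):
    """Spatial output size of a convolution or pooling layer:
    floor((n + 2*p - k) / s) + 1, for input size n, kernel k,
    padding p, and stride s."""
    return (n + 2 * p - k) // s + 1

# A 3*3 convolution with padding 1 and stride 1 preserves spatial size,
# while a 2*2 pooling with stride 2 halves it.
print(conv_out_size(224, k=3, p=1, s=1))  # 224
print(conv_out_size(224, k=2, p=0, s=2))  # 112
```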
  • Human feature point detection mainly detects certain feature points of the human body, such as the eyes, nose, elbows, and shoulders, connects them in the prescribed feature-point order, and describes human body information through these feature points. By extension, it can also describe the posture, gait, behavior, and other information of the human body.
  • Human feature point detection is one of the basic algorithms of computer vision, and it plays a foundational role in research in other related fields of computer vision, such as behavior recognition and intelligent composition.
  • Existing human feature point detection algorithms based on deep learning can be divided into two directions, namely, a top-down detection method and a bottom-up detection method.
  • The top-down human feature point detection algorithm divides the task into two parts: human body detection and single-person feature point detection. That is, each person in the image is first detected individually by a target detection algorithm, and then, on the basis of each detection frame, feature point detection is performed for a single person.
  • The top-down method tends to have higher detection accuracy, but its detection speed grows linearly with the number of people in the image, and an additional target detection algorithm is needed as support.
  • The bottom-up method also includes two parts: multi-person feature point detection in the image and post-processing. That is, all feature points in the image are detected first, and then related strategies are applied in the post-processing module to assign all detected feature points to different individuals. Representative algorithms include Openpose, PersonLab, etc.
  • the detection accuracy of the bottom-up method is lower than that of the top-down method, but the detection speed is faster, and the detection time has nothing to do with the number of people in the image.
  • the post-processing module is often composed of some logic strategies, such as greedy algorithms.
  • In addition to detecting the distribution heat map of feature points (also called the heatmap), the Openpose algorithm also proposes a heat map representing feature point connection information: the pafmap.
  • A position with high confidence in this heat map indicates a high probability that a feature point connection exists at that location.
  • The heatmap and pafmap are used as the outputs of the algorithm model, and a greedy algorithm is used as the post-processing strategy to assign multi-person feature points to independent person instances.
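  • As a rough illustration of how feature-point candidates are read out of a heatmap before post-processing (a toy sketch, not the patent's or Openpose's actual procedure; the function name and threshold are made up):

```python
import numpy as np

def heatmap_peaks(hm, thresh=0.5):
    """Return (row, col) coordinates of local maxima above thresh
    in a 2-D heatmap; each peak is a feature-point candidate."""
    peaks = []
    rows, cols = hm.shape
    for r in range(rows):
        for c in range(cols):
            v = hm[r, c]
            # keep the pixel only if it dominates its 3x3 neighbourhood
            if v >= thresh and v >= hm[max(r - 1, 0):r + 2,
                                       max(c - 1, 0):c + 2].max():
                peaks.append((r, c))
    return peaks

hm = np.zeros((5, 5))
hm[1, 1] = 0.9   # one confident feature point
hm[3, 4] = 0.8   # another, at the image border
print(heatmap_peaks(hm))  # [(1, 1), (3, 4)]
```

A real post-processing module would then use the pafmap scores to decide, greedily, which detected peaks belong to the same person instance.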
  • The Openpose method has undergone two versions of evolution. In the first released version, the model structure is divided into a basic network and a heat map detection network.
  • The heat map detection network contains multiple stages, and each stage is divided into an upper and a lower branch.
  • This algorithm model has the disadvantages of high model complexity and a large amount of calculation.
  • The stacking of multiple stages does not significantly improve the accuracy of the model, but it brings a lot of redundant calculation.
  • The second version has a single-branch structure.
  • Although the 3*3 residual connections in the middle can increase receptive field information, they bring only a very small increase in accuracy while causing a great deal of wasted computation.
  • In view of the above, through long-term research the inventor proposes the method, apparatus, electronic device, and storage medium for detecting human body feature points provided by the embodiments of this application.
  • Multi-scale feature extraction is performed on the image to be detected to obtain image features at different scales and with different receptive fields.
  • The human body feature point position information and the human body feature point connection information are then obtained based on the image features at these different scales, thereby greatly improving the accuracy and efficiency of human body feature point detection.
  • the specific detection method of human body feature points will be described in detail in the subsequent embodiments.
  • The electronic device applied in this embodiment may be a mobile terminal, a smart phone, a tablet computer, a wearable electronic device, etc., which is not limited here.
  • the process shown in FIG. 1 will be described in detail below.
  • the method for detecting human feature points may specifically include the following steps:
  • Step S110 Obtain an image to be detected.
  • Step S120 Perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
  • the image to be detected may be down-sampled to obtain the first image feature of the image to be detected.
  • As one method, the image to be detected may be successively subjected to 2x down-sampling until the obtained first image feature of the image to be detected meets the processing requirements.
  • For example, the image to be detected may be subjected to 2x down-sampling four times in succession, i.e., 16x down-sampling overall, so that the first image feature of the image to be detected includes sufficient abstract features without over-extracting features, thereby meeting the processing requirements.
  • Specifically, the image to be detected may first be down-sampled by 2x; the image features obtained from that step are then down-sampled to 4x, the features obtained from the 4x down-sampling are down-sampled to 8x, and the features obtained from the 8x down-sampling are down-sampled to 16x, yielding the first image feature of the image to be detected.
  • Of course, the image to be detected may also be down-sampled by larger factors, for example 32x or 64x, which is not limited here.
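  • The successive 2x down-sampling described above can be sketched at the level of tensor shapes; here average pooling stands in for the patent's (unspecified) convolutional down-sampling layers, and the input size is an arbitrary toy value:

```python
import numpy as np

def downsample_2x(x):
    """2x spatial down-sampling of an (H, W, C) feature map by
    average pooling (H and W assumed even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

image = np.random.rand(64, 64, 3)   # toy stand-in for the image to be detected
feat = image
for _ in range(4):                  # four successive 2x stages: 2x, 4x, 8x, 16x
    feat = downsample_2x(feat)
print(feat.shape)  # (4, 4, 3): the 16x down-sampled "first image feature"
```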
  • Step S130 Perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • It can be understood that the down-sampling of the image to be detected is performed by successive 2x down-sampling steps; specifically, after the image to be detected is down-sampled by 2x, the resulting image features are down-sampled again to 4x, and so on. That is, the above down-sampling is processed serially.
  • Therefore, the input of a given convolutional layer can only be the output of the previous convolutional layer, which means the feature information that convolutional layer can learn is only the single-receptive-field information represented by the previous layer's output. In other words, the scale and receptive field of the first image feature of the image to be detected obtained through down-sampling are relatively limited.
  • Step S140 Perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • In this embodiment, convolution operations may be performed on the multiple second image features to obtain the human body feature point position information (heatmap) and the human body feature point connection information (pafmap) of the image to be detected.
  • Specifically, the multiple second image features may be divided into two branches for the convolution operation: one branch performs a convolution operation on the multiple second image features to output the human body feature point position information, and the other branch performs a convolution operation on the multiple second image features to output the human body feature point connection information.
  • The method for detecting human body feature points thus acquires an image to be detected, performs down-sampling on it to obtain the first image feature of the image to be detected, performs multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected at different scales and with different receptive fields, and performs convolution operations on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected. By performing multi-scale feature extraction on the image to be detected to obtain image features at different scales and with different receptive fields, and obtaining the human body feature point position information and connection information based on those features, the accuracy and efficiency of human body feature point detection are increased.
  • FIG. 2 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application.
  • the process shown in FIG. 2 will be described in detail below.
  • the method for detecting human feature points may specifically include the following steps:
  • Step S210 Obtain an image to be detected.
  • For the specific description of step S210, please refer to step S110, which will not be repeated here.
  • In this embodiment, the image to be detected may be down-sampled by N1 times to obtain the image features to be processed.
  • It should be noted that the image size after the N1x down-sampling of the image to be detected is often small. If multi-scale feature extraction is performed directly on the to-be-processed image features obtained from the N1x down-sampling, then when convolution with a large kernel is performed it is easy to over-extract image features and introduce unnecessary redundant information. For example, in order to obtain more abstract features of the image to be detected, the image is generally down-sampled by 16x; correspondingly, the image size after 16x down-sampling is small, and if multi-scale feature extraction is performed directly on those features, a 7*7 convolution will easily over-extract image features and introduce unnecessary redundant information.
  • Therefore, the to-be-processed image features may additionally be subjected to N2x up-sampling, and the newly acquired image features determined as the first image feature of the image to be detected, so as to avoid over-extraction of image features and the introduction of unnecessary redundant information.
  • As one method, the N1x down-sampling of the image to be detected may be 16x down-sampling, and the N2x up-sampling of the to-be-processed image features may be 2x up-sampling.
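  • The 16x down-sampling followed by 2x up-sampling can be checked with the same kind of shape arithmetic; nearest-neighbour repetition stands in here for whatever up-sampling layer the model actually uses, and the sizes are toy values:

```python
import numpy as np

def downsample_2x(x):
    """2x spatial down-sampling of an (H, W, C) map by average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample_2x(x):
    """2x nearest-neighbour up-sampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.random.rand(64, 64, 3)
feat = image
for _ in range(4):                   # N1 = 16x down-sampling
    feat = downsample_2x(feat)
first_feature = upsample_2x(feat)    # N2 = 2x up-sampling
print(first_feature.shape)  # (8, 8, 3): an effective 8x stride overall
```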
  • Step S240 Perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • For the specific description of step S240, please refer to step S130, which will not be repeated here.
  • Step S250 Perform down-sampling processing on the image to be detected, and obtain a third image feature of the image to be detected.
  • In this embodiment, the third image feature obtained through down-sampling participates in the convolution operation. This can not only further increase the feature scale information and receptive field, but also add shallow, accurate pixel position information, improving the accuracy with which the human body feature point position information and the human body feature point connection information are acquired.
  • It can be understood that the multiple second image features obtained by multi-scale feature extraction from the first image feature are abstract features of the image to be detected, while the third image feature obtained by down-sampling the image to be detected is a shallow feature of the image to be detected. That is, the multiple second image features and the third image feature have different scales and different receptive fields. Therefore, involving the third image feature in the convolution operation that produces the human body feature point position information and connection information increases the scale and receptive field of the data. Furthermore, since the third image feature is a shallow image feature, whose pixel position information is more accurate, it can improve the accuracy of the acquired human body feature point position information and connection information.
  • Therefore, in this embodiment, the image to be detected may also be down-sampled to obtain the third image feature of the image to be detected, with the third image feature participating in the convolution operation.
  • The feature extraction of the image to be detected may be performed through a convolutional layer, which is not limited herein.
  • It should be noted that when the channel connection involving the third image feature is to be performed, it is necessary to ensure that the image size corresponding to the first image feature and the image size corresponding to the third image feature are consistent.
  • For example, if the first image feature is obtained by 16x down-sampling of the image to be detected, the third image feature also needs to be obtained through 16x down-sampling; if the first image feature is obtained by 8x down-sampling of the image to be detected, the third image feature also needs to be obtained through 8x down-sampling.
  • As one method, the image to be detected may be down-sampled by N3 times to obtain the third image feature of the image to be detected.
  • Among them, if N1 = 2^M1 and N2 = 2^M2, the N3x down-sampling of the image to be detected may be 2^(M1-M2)x down-sampling, so that the image size corresponding to the third image feature obtained by the N3x down-sampling is consistent with the image size corresponding to the first image feature, thereby providing a basis for the subsequent channel connection of the multiple second image features with the third image feature.
  • Step S260 Perform a convolution operation on the plurality of second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • In this embodiment, a convolution operation may be performed on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information of the image to be detected.
  • Specifically, the multiple second image features and the third image feature may be divided into two branches for the convolution operation: one branch performs convolution operations on the multiple second image features and the third image feature to output the human body feature point position information, and the other branch performs convolution operations on them to output the human body feature point connection information.
  • FIG. 3 shows a schematic flowchart of step S260 of the method for detecting human body feature points shown in FIG. 2 of the present application.
  • the following will elaborate on the process shown in FIG. 3, and the method may specifically include the following steps:
  • Step S261 Channel connecting the plurality of second image features and the third image feature to obtain a fourth image feature.
  • In this embodiment, the multiple second image features and the third image feature can be channel-connected to obtain the fourth image feature, and the fourth image feature participates in the convolution operation to obtain the human body feature point position information and the human body feature point connection information of the image to be detected.
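  • Channel connection is ordinary concatenation along the channel axis, which is only defined when the spatial sizes agree, as noted above. A shape-level sketch (the spatial size and channel counts are made up for illustration):

```python
import numpy as np

# four second image features (one per multi-scale branch), all 8x8 spatially
second_feats = [np.zeros((8, 8, 32)) for _ in range(4)]
# the shallow third image feature, down-sampled to the same 8x8 size
third_feat = np.zeros((8, 8, 64))

# the "fourth image feature": everything concatenated along channels
fourth_feat = np.concatenate(second_feats + [third_feat], axis=-1)
print(fourth_feat.shape)  # (8, 8, 192): 4*32 + 64 channels
```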
  • Step S262 Perform a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • a convolution operation may be performed on the fourth image feature to obtain the position information of the human body feature point and the connection information of the human body feature point of the image to be detected.
  • As one method, the fourth image feature can be divided into two branches for the convolution operation: one branch performs a convolution operation on the fourth image feature to output the human body feature point position information, and the other branch performs a convolution operation on the fourth image feature to output the human body feature point connection information.
  • This embodiment provides a method for detecting human body feature points: an image to be detected is acquired; N1x down-sampling is performed on it to obtain the image features to be processed; N2x up-sampling is performed on the image features to be processed to obtain the first image feature of the image to be detected; multi-scale feature extraction is performed on the first image feature to obtain multiple second image features of the image to be detected at different scales and with different receptive fields; feature extraction is performed on the image to be detected to obtain the third image feature of the image to be detected; and a convolution operation is performed on the multiple second image features and the third image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • Compared with the method shown in FIG. 1, this embodiment additionally performs N1x down-sampling on the image to be detected followed by N2x up-sampling to obtain the first image feature, thereby obtaining more abstract features while avoiding over-extraction of image features and the introduction of unnecessary redundant information.
  • In addition, this embodiment performs the convolution operation on the multiple second image features together with a third image feature extracted from the image to be detected, so as to increase the receptive field of the image.
  • FIG. 4 shows a schematic flowchart of a method for detecting human body feature points according to another embodiment of the present application.
  • the following will elaborate on the process shown in FIG. 4, and the method for detecting human feature points may specifically include the following steps:
  • Step S310 Obtain an image to be detected.
  • For the specific description of step S310, please refer to step S110, which will not be repeated here.
  • Step S320 Perform down-sampling processing on the image to be detected to obtain a first image feature of the image to be detected.
  • In this embodiment, a trained detection model may be used to process the acquired image to be detected, so as to output the human body feature point position information and the human body feature point connection information of the image to be detected.
  • FIG. 5 shows the overall framework diagram of the detection model provided by the embodiment of the present application.
  • As shown in FIG. 5, the detection model may include three main parts: a basic network module F, a multi-scale module M, and a heat map detection module S.
  • Specifically, the image to be detected can be input to the basic network module in the detection model and down-sampled through the basic network module to obtain the first image feature of the image to be detected, and the first image feature is then used as the input of the multi-scale module in the detection model.
  • Among them, the basic network module may be a convolutional neural network such as Vgg, ResNet, or Mobilenet. If a deeper network model such as Vgg or ResNet is used, the computational complexity of the model increases, but higher detection accuracy can be obtained; if a lightweight network model such as Mobilenet is used, some detection accuracy is lost, but a faster detection speed can be obtained and fully real-time detection can be achieved.
  • Step S330 Input the first image feature into the multi-scale module of the detection model, and perform multi-scale feature extraction on the first image feature through the multi-scale module to obtain multiple second image features of the image to be detected .
  • In this embodiment, after the first image feature output by the basic network module is obtained, it can be input to the multi-scale module of the detection model, which performs multi-scale feature extraction on the first image feature to obtain the multiple second image features of the image to be detected.
  • Among them, the multi-scale module includes multiple parallel convolutional layers; the convolution kernel of each of the convolutional layers is different, and each convolutional layer is used to extract, from the first image feature, second image features of a different scale and receptive field.
  • As one method, the multi-scale module can include 4 parallel convolutional layers, in order: 1*1 convolution, 3*3 convolution, 5*5 convolution, and 7*7 convolution. The convolution kernel sizes of these layers increase in sequence, and each is responsible for extracting image information of a different scale and receptive field.
  • The four parallel convolutional layers together form the multi-scale module.
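  • For the four parallel branches to be channel-connected later, each stride-1 convolution must preserve spatial size, which pins down its padding. A sketch of that bookkeeping (helper names and the toy spatial size are illustrative):

```python
def same_padding(k):
    """Padding that preserves spatial size for an odd kernel k at stride 1."""
    return (k - 1) // 2

def conv_out_size(n, k, p, s=1):
    """Standard convolution output-size formula."""
    return (n + 2 * p - k) // s + 1

kernels = (1, 3, 5, 7)      # the module's four parallel convolutions
h = 16                      # toy spatial size of the first image feature
sizes = [conv_out_size(h, k, same_padding(k)) for k in kernels]
print(sizes)  # [16, 16, 16, 16]: all branches stay spatially aligned
```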
  • Step S340 Input the multiple second image features into the heat map detection module of the detection model, and perform a convolution operation on the multiple second image features through the heat map detection module to obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
  • In this embodiment, the multiple second image features can be input to the heat map detection module of the detection model, so that the heat map detection module performs the convolution operation on them to obtain the human body feature point position information and the human body feature point connection information.
  • the third image feature output by the basic network module can also be obtained; the multiple second image features and the third image feature can then be channel-connected to obtain a fourth image feature, which is input into the heat map detection module of the detection model so that a convolution operation is performed on the fourth image feature through the heat map detection module to obtain the human body feature point position information and the human body feature point connection information.
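  • The channel connection itself is simply a concatenation along the channel axis. A minimal NumPy sketch, with illustrative (assumed) channel counts:

```python
import numpy as np

# four second image features from the multi-scale branches and the third
# image feature from the basic network module, all with the same H and W
second_feats = [np.full((4, 8, 8), float(i)) for i in range(4)]
third_feat = np.ones((4, 8, 8))

# channel-connect: concatenate along the channel axis to form the fourth
# image feature that is fed into the heat map detection module
fourth_feat = np.concatenate(second_feats + [third_feat], axis=0)
print(fourth_feat.shape)  # (20, 8, 8)
```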
  • the heat map detection module includes only one convolution stage.
  • the one convolution stage includes a first processing branch and a second processing branch.
  • the first processing branch is used to detect and output human body feature point position information.
  • the second processing branch is used to detect and output the connection information of the human body feature points.
  • the first processing branch includes two convolutional layers
  • the second processing branch includes two convolutional layers.
  • conventionally, the heat map detection module is formed by serially connecting multiple stages to improve accuracy, but experiments have shown that neither heat map detection nor PAF map (part affinity field) detection requires many stages of correction.
  • concatenating additional stages brings only a very limited increase in accuracy, while adding a huge number of parameters and a large amount of calculation.
  • because a multi-scale module is added, the image features input to the heat map detection module already contain very rich feature and scale information. This makes it possible to reduce the number of stages in the heat map detection module: a single stage is enough to achieve high accuracy, while the calculation and parameter counts of the model are greatly reduced, so that the model can run detection in real time on a mobile terminal.
  • the heat map detection module contains only one stage. To further reduce the number of parameters and the amount of calculation, each branch of the stage uses only two convolutional layers: a 3*3 convolution responsible for further feature extraction on the input channel-connected image features, and a 1*1 convolution responsible for detecting the human body feature point position information (or the human body feature point connection information) and outputting feature maps with the corresponding number of channels.
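  • A minimal sketch of this one-stage, two-branch head (the weights are random stand-ins for trained ones; the output channel counts, 18 keypoint heat maps and 2 channels per limb connection, are illustrative assumptions rather than values fixed by the patent):

```python
import numpy as np

def conv(x, w):
    """Naive multi-channel convolution: x is (C_in, H, W), w is
    (C_out, C_in, k, k); zero padding keeps the spatial size."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

def one_stage_head(fused, n_kpt, n_limb, rng):
    """Single stage with two branches; each branch is a 3*3 convolution for
    further feature extraction followed by a 1*1 convolution that maps to
    the required number of output channels."""
    c = fused.shape[0]

    def branch(n_out):
        # 3*3 conv + ReLU for feature extraction, then 1*1 conv for detection
        h = np.maximum(conv(fused, 0.1 * rng.standard_normal((c, c, 3, 3))), 0)
        return conv(h, 0.1 * rng.standard_normal((n_out, c, 1, 1)))

    return branch(n_kpt), branch(2 * n_limb)  # heat maps, connection maps

rng = np.random.default_rng(1)
fused = rng.standard_normal((6, 8, 8))  # channel-connected input features
heatmaps, pafmaps = one_stage_head(fused, n_kpt=18, n_limb=19, rng=rng)
print(heatmaps.shape, pafmaps.shape)  # (18, 8, 8) (38, 8, 8)
```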
  • the embodiment of the present application may further include training and correction of the detection model. The detection model may be trained in advance on an acquired training data set; each subsequent detection can then be performed directly with the trained detection model, so there is no need to retrain the detection model every time detection is performed.
  • an objective function can be set, which is used to measure the difference between the detection result of the detection model and the real label.
  • This function is called an objective function, also called a loss function.
  • the feature point position heat map loss is used to measure the loss between the detected feature point position heat map and the real (ground-truth) feature point position heat map.
  • the feature point connection heat map loss is used to measure the loss between the detected feature point connection heat map and the real (ground-truth) feature point connection heat map.
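  • The text above does not reproduce the formulas; a common choice, used here purely as an assumption, is a mean-squared (L2) error on each heat map, with the two terms summed into the total objective:

```python
import numpy as np

def total_heatmap_loss(pred_kpt, gt_kpt, pred_paf, gt_paf):
    """Sum of the feature point position heat map loss and the feature
    point connection heat map loss, each taken as a mean-squared error."""
    loss_kpt = np.mean((pred_kpt - gt_kpt) ** 2)
    loss_paf = np.mean((pred_paf - gt_paf) ** 2)
    return loss_kpt + loss_paf

gt = np.zeros((18, 8, 8))
paf = np.zeros((38, 8, 8))
print(total_heatmap_loss(gt, gt, paf, paf))  # 0.0 for a perfect detection
```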
  • a method for detecting human body feature points is provided: an image to be detected is obtained; the image to be detected is down-sampled to obtain the first image feature of the image to be detected; the first image feature is input into the multi-scale module of the detection model, which performs multi-scale feature extraction on it to obtain multiple second image features of the image to be detected; and the multiple second image features are input into the heat map detection module of the detection model, which performs a convolution operation on them to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • this embodiment also detects the human body feature points of the image to be detected through the detection model, so as to improve the accuracy of detecting the human body feature points.
  • FIG. 6 shows a block diagram of a human body feature point detection apparatus 200 provided by an embodiment of the present application.
  • the human body feature point detection device 200 includes: a to-be-detected image acquisition module 210, a first image feature acquisition module 220, a second image feature acquisition module 230, and a human body feature point detection module 240, wherein:
  • the to-be-detected image acquisition module 210 is used to acquire the to-be-detected image.
  • the first image feature acquisition module 220 is configured to perform down-sampling processing on the image to be detected to obtain the first image feature of the image to be detected.
  • the first image feature acquisition module 220 includes: a to-be-processed image feature acquisition sub-module and a first image feature acquisition sub-module, wherein:
  • the second image feature acquisition module 230 is configured to perform multi-scale feature extraction on the first image feature to obtain multiple second image features of the image to be detected.
  • the second image feature acquisition module 230 includes: a second image feature acquisition sub-module, wherein:
  • the human body feature point detection module 240 is configured to perform a convolution operation on the multiple second image features to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • the human body feature point detection module 240 includes: a third feature image acquisition sub-module and a first human body feature point detection sub-module, wherein:
  • the third feature image acquisition sub-module is configured to perform feature extraction on the image to be detected, and obtain the third image feature of the image to be detected.
  • the third characteristic image acquisition sub-module includes: a third characteristic image acquisition unit, wherein:
  • the first human body feature point detection sub-module is used to perform convolution operations on the plurality of second image features and the third image feature to obtain the human body feature point position information in the image to be detected and the Human body feature point connection information.
  • the fourth image feature obtaining unit is configured to channel-connect the plurality of second image features and the third image feature to obtain a fourth image feature.
  • the human body feature point detection unit is configured to perform a convolution operation on the fourth image feature to obtain the human body feature point position information and the human body feature point connection information in the image to be detected.
  • the human body feature point detection module 240 includes: a second human body feature point detection sub-module, wherein:
  • the second human body feature point detection sub-module is used to input the multiple second image features into the heat map detection module of the detection model, perform convolution operations on the multiple second image features through the heat map detection module, and obtain the human body feature point position information and the human body feature point connection information output by the heat map detection module.
  • the training data set acquisition module is used to acquire a training data set, the training data set includes a plurality of images, and the human body feature point position information and the human body feature point connection information corresponding to each of the multiple images.
  • the model training module is configured to, based on the training data set, use each image as input data and the human body feature point position information and human body feature point connection information corresponding to each image as output data, and train through machine learning algorithms to obtain the trained detection model.
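  • As a toy illustration of training once in advance and then reusing the trained model for every detection (the linear model, loss, and plain gradient descent below are assumptions for demonstration, not the patent's prescribed machine learning algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy training data set: (input feature, ground-truth heat map) pairs,
# where the "true model" simply scales the input by 0.7
xs = [rng.standard_normal((4, 4)) for _ in range(8)]
data = [(x, 0.7 * x) for x in xs]

w = 0.0  # a single learnable weight standing in for the whole model
for _ in range(100):  # training is done once, in advance
    for x, y in data:
        grad = np.mean(2 * (w * x - y) * x)  # d(MSE)/dw for this sample
        w -= 0.1 * grad

# the trained "model" is then reused for every subsequent detection,
# with no retraining per detection
final_loss = np.mean([np.mean((w * x - y) ** 2) for x, y in data])
print(round(w, 3), final_loss < 1e-8)
```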
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • the functional modules in the various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • FIG. 7 shows a structural block diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may be an electronic device capable of running application programs, such as a smart phone, a tablet computer, or an e-book.
  • the electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, where the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more programs are configured to execute the method described in the foregoing method embodiments.
  • the processor 110 may include one or more processing cores.
  • the processor 110 uses various interfaces and lines to connect various parts of the entire electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120, and by calling data stored in the memory 120.
  • the processor 110 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 110 may be integrated with one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used for rendering and drawing the content to be displayed; the modem is used for processing wireless communication. It can be understood that the above-mentioned modem may not be integrated into the processor 110, but may be implemented by a communication chip alone.
  • the memory 120 may include random access memory (RAM) or read-only memory (ROM).
  • the memory 120 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the various method embodiments described above, and the like.
  • the data storage area can also store data created during the use of the electronic device 100 (such as a phone book, audio and video data, and chat record data) and the like.
  • FIG. 8 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable medium 300 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
  • the computer-readable storage medium 300 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 300 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 300 has storage space for the program code 310 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products.
  • for example, the program code 310 may be compressed in a suitable form.
  • the human body feature point detection method, device, electronic device, and storage medium acquire the image to be detected and perform down-sampling processing on it to obtain the first image feature of the image to be detected.
  • multi-scale feature extraction is then performed on the first image feature to obtain multiple second image features of the image to be detected, and a convolution operation is performed on the multiple second image features to obtain the position information of the human body feature points and the connection information of the human body feature points in the image to be detected. Through the multi-scale feature extraction of the image to be detected, image features at different scales are obtained, and the human body feature point position information and human body feature point connection information are obtained based on the image features at these different scales, thereby greatly improving the accuracy and efficiency of human body feature point detection.

Abstract

The present invention, which belongs to the technical field of electronic devices, relates to a human body feature point detection method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring an image to be detected; performing down-sampling processing on said image to obtain first image features of said image; performing multi-scale feature extraction on the first image features to obtain a plurality of second image features of said image; and performing a convolution operation on the plurality of second image features to obtain human body feature point position information and human body feature point connection information of said image. According to the present invention, multi-scale feature extraction is performed on said image to obtain image features at different scales, and the human body feature point position information and the human body feature point connection information are obtained on the basis of the image features at different scales, which greatly improves the accuracy and efficiency of human body feature point detection.
PCT/CN2021/073863 2020-03-12 2021-01-27 Method and apparatus for detecting human body feature points, electronic device and storage medium WO2021179822A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010171918.8 2020-03-12
CN202010171918.8A CN111414823B (zh) Method and apparatus for detecting human body feature points, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2021179822A1 (fr)

Family

ID=71492884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073863 WO2021179822A1 (fr) 2021-01-27 Method and apparatus for detecting human body feature points, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN111414823B (fr)
WO (1) WO2021179822A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414823B (zh) 2020-03-12 2023-09-12 Oppo广东移动通信有限公司 Method and apparatus for detecting human body feature points, electronic device and storage medium
CN113177432B (zh) 2021-03-16 2023-08-29 重庆兆光科技股份有限公司 Head pose estimation method, system, device and medium based on a multi-scale lightweight network

Citations (5)

Publication number Priority date Publication date Assignee Title
US20120114175A1 (en) * 2010-11-05 2012-05-10 Samsung Electronics Co., Ltd. Object pose recognition apparatus and object pose recognition method using the same
CN109726659A (zh) * 2018-12-21 2019-05-07 北京达佳互联信息技术有限公司 人体骨骼关键点的检测方法、装置、电子设备和可读介质
CN110245655A (zh) * 2019-05-10 2019-09-17 天津大学 一种基于轻量级图像金字塔网络的单阶段物体检测方法
CN110263756A (zh) * 2019-06-28 2019-09-20 东北大学 一种基于联合多任务学习的人脸超分辨率重建系统
CN111414823A (zh) * 2020-03-12 2020-07-14 Oppo广东移动通信有限公司 人体特征点的检测方法、装置、电子设备以及存储介质

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN108182384B (zh) 2017-12-07 2020-09-29 浙江大华技术股份有限公司 Face feature point positioning method and device
CN108664885B (zh) 2018-03-19 2021-08-31 杭州电子科技大学 Human body key point detection method based on multi-scale cascaded HourGlass networks
CN109670397B (zh) 2018-11-07 2020-10-30 北京达佳互联信息技术有限公司 Method and apparatus for detecting human skeleton key points, electronic device, and storage medium
CN113569798B (zh) 2018-11-16 2024-05-24 北京市商汤科技开发有限公司 Key point detection method and apparatus, electronic device, and storage medium
CN110705365A (zh) 2019-09-06 2020-01-17 北京达佳互联信息技术有限公司 Method and apparatus for detecting human body key points, electronic device, and storage medium

Also Published As

Publication number Publication date
CN111414823A (zh) 2020-07-14
CN111414823B (zh) 2023-09-12

Similar Documents

Publication Publication Date Title
CN109493350B (zh) Portrait segmentation method and device
CN110473141B (zh) Image processing method and device, storage medium, and electronic device
WO2021169723A1 (fr) Image recognition method and apparatus, electronic device, and storage medium
CN108470320B (zh) CNN-based image stylization method and system
CN110532984B (zh) Key point detection method, gesture recognition method, device, and system
CN109241880B (zh) Image processing method, image processing apparatus, and computer-readable storage medium
WO2021073493A1 (fr) Image processing method and device, neural network training method, image processing method of a combined neural network model, construction method of a combined neural network model, neural network processor, and storage medium
US11151361B2 (en) Dynamic emotion recognition in unconstrained scenarios
US9633282B2 (en) Cross-trained convolutional neural networks using multimodal images
WO2020199478A1 (fr) Image generation model training method, image generation method, device and apparatus, and storage medium
CN112651438A (zh) Multi-category image classification method and device, terminal device, and storage medium
CN111104962A (zh) Image semantic segmentation method and device, electronic device, and readable storage medium
WO2020015752A1 (fr) Object attribute identification method, apparatus and system, and computing device
US20230085605A1 (en) Face image processing method, apparatus, device, and storage medium
CN112990219B (zh) Method and device for image semantic segmentation
CN110415250B (zh) Deep-learning-based overlapping chromosome segmentation method and device
WO2018082308A1 (fr) Image processing method and terminal
WO2021179822A1 (fr) Method and apparatus for detecting human body feature points, electronic device and storage medium
CN113011253B (zh) Facial expression recognition method, device, equipment, and storage medium based on a ResNeXt network
CN112927209B (zh) CNN-based saliency detection system and method
CN110807362A (zh) Image detection method and device, and computer-readable storage medium
CN110958469A (zh) Video processing method and device, electronic device, and storage medium
CN112308866A (zh) Image processing method and device, electronic device, and storage medium
CN111292334B (zh) Panoramic image segmentation method and device, and electronic device
CN112381061A (zh) Facial expression recognition method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21767556

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21767556

Country of ref document: EP

Kind code of ref document: A1