CN109598234B - Key point detection method and device - Google Patents



Publication number
CN109598234B
Authority
CN
China
Prior art keywords
feature
detection
human body
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811474069.2A
Other languages
Chinese (zh)
Other versions
CN109598234A (en)
Inventor
杨思远
曲晓超
姜浩
闫帅
张伟
Current Assignee
Shenzhen Meitu Innovation Technology Co ltd
Original Assignee
Shenzhen Meitu Innovation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Meitu Innovation Technology Co ltd filed Critical Shenzhen Meitu Innovation Technology Co ltd
Priority to CN201811474069.2A
Publication of CN109598234A
Application granted
Publication of CN109598234B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a key point detection method and a key point detection device. The key point detection method comprises the steps of: taking a current frame image in video information as the input of a human body detector, and calculating and outputting a human body detection frame for clipping the current frame image and a posture probability value in the current frame image; clipping the current frame image according to the human body detection frame to obtain a human body image block; and taking the posture probability value and the human body image block as the input of a feature detector to calculate and output key points in the current frame image. The invention can effectively solve the problem that human body feature detection is difficult to execute in real time on the mobile terminal, reduce the network complexity in the key point detection process, and improve detection precision.

Description

Key point detection method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for detecting key points.
Background
Existing human body key point detection methods based on deep learning mainly follow one of two model architecture designs: top-down and bottom-up. The top-down approach generally adopts a human body detection network to obtain a detection frame for each person, and then adopts a feature detection network to obtain the key points of each limb of the person within the frame; the bottom-up approach first detects all limb key points in the image and then connects the points into different people through certain connection rules. However, because human postures are rich and varied and are easily occluded by background objects, the person's own clothes, and the like, a relatively large neural network is often needed to complete the limb detection task, whether the approach is bottom-up or top-down; once the network lacks sufficient expressive capacity, it is difficult to handle all complex scenes. As a result, the data processing speed of deep-learning-based human key point detection methods is poor, and such methods are difficult to apply in real-time scenarios, especially on mobile terminals.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for detecting a key point, which can effectively solve the above problems.
In order to achieve the above object, a preferred embodiment of the present invention provides a method for detecting a key point, which is applied to a mobile terminal, where the method for detecting a key point includes a feature detection process, and the feature detection process includes:
taking a current frame image in video information as the input of a human body detector to calculate and output a human body detection frame for clipping the current frame image and a posture probability value in the current frame image;
clipping the current frame image according to the human body detection frame to obtain a human body image block;
and taking the posture probability value and the human body image block as input of a feature detector, so that the feature detector selects a feature detection network matched with the posture probability value to calculate and output key points in the current frame image.
In an option of a preferred embodiment of the present invention, the human body detector comprises a first feature extraction network, a regional suggestion network and a classification regression network; calculating and outputting a human body detection frame for clipping the current frame image and a posture probability value in the current frame image, wherein the steps comprise:
taking the current frame image as the input of the first feature extraction network to extract and output image features in the current frame image;
taking the extracted image features as input of the regional suggestion network to generate an initial detection frame, and cutting the image features in the current frame image according to the initial detection frame to obtain an initial image feature block;
and taking the initial image feature block as the input of the classification regression network to calculate a posture probability value for representing the posture category of the human body, and carrying out fine correction on the initial detection frame to obtain the human body detection frame.
In an alternative preferred embodiment of the invention, the feature detector comprises a second feature extraction network and a plurality of feature detection networks; the step of taking the posture probability value and the human body image block as the input of a feature detector to calculate and output key points in the current frame image comprises the following steps:
taking the human body image block as the input of the second feature extraction network to calculate and extract the human body features in the human body image block;
and selecting a corresponding feature detection network from the plurality of feature detection networks according to the posture probability value to serve as a target detection network, and using the human body feature as the input of the target detection network to detect the key points of the human body feature.
In a preferred embodiment of the present invention, before the step of taking the posture probability value as an input of the feature detector to select, from the plurality of feature detection networks, the feature detection network that best matches the posture probability value, the method further comprises:
acquiring a training data set, and dividing the training data set into a plurality of training subsets, wherein the training subsets correspond to the feature detection networks one to one;
aiming at each training subset, taking the training subset as the input of a corresponding feature detection network to calculate and output the test feature points of the training subset, and taking the training subset as the input of a regression network to calculate and output a test tracking value;
and calculating a loss function value of the feature detection network according to the test feature point and the test tracking value, and optimizing the feature detection network according to the loss function value until the output of the loss function value meets a preset requirement.
In an alternative preferred embodiment of the present invention, the Loss function value Loss is calculated as:

$$\mathrm{Loss} = \sum_{c=0}^{C-1}\left(\lVert O_c - H_c\rVert_2^2 + \lVert \delta X_c - \delta Y_c\rVert_2^2\right)$$

wherein $O_c$ represents the test feature points; $\delta X_c$ represents the test tracking value; $H_c$ represents the actual feature points; $\delta Y_c$ represents the actual tracking values; $C$ represents the number of feature detection networks; and $c$ represents the $c$-th training subset.
In a selection of a preferred embodiment of the present invention, the method for detecting a keypoint further includes a feature tracking process, where the feature tracking process includes:
and the human body detection frame is used as the input of the detection regression network so as to carry out fine correction on the human body detection frame, and the human body is tracked based on the corrected human body detection frame.
In the selection of the preferred embodiment of the present invention, a first thread and a second thread run in the mobile terminal;
the first thread is configured to execute the feature detection process, and the second thread is configured to execute the feature tracking process based on an operation result of the first thread, where the first thread and the second thread alternately operate according to a preset cycle.
The preferred embodiment of the present invention further provides a key point detecting device, which is applied to a mobile terminal, and the key point detecting device includes:
the posture probability calculation module is used for taking the current frame image in the video information as the input of the human body detector so as to calculate and output a human body detection frame for cutting the current frame image and a posture probability value in the current frame image;
the image cutting module is used for cutting the current frame image according to the human body detection frame to obtain a human body image block;
and the key point extraction module is used for taking the posture probability value and the human body image block as the input of the feature detector, so that the feature detector selects a feature detection network matched with the posture probability value to calculate and output the key points in the current frame image.
In an option of a preferred embodiment of the present invention, the human body detector comprises a first feature extraction network, a regional suggestion network and a classification regression network; the posture probability calculation module comprises:
a first feature extraction unit, configured to use the current frame image as an input of the first feature extraction network to extract and output image features in the current frame image;
the image cutting unit is used for taking the extracted image features as the input of the regional suggestion network to generate an initial detection frame, and cutting the image features in the current frame image according to the initial detection frame to obtain an initial image feature block;
and the posture probability calculation unit is used for taking the initial image feature block as the input of the classification regression network so as to calculate a posture probability value for representing the human posture category, and finely correcting the initial detection frame to obtain the human detection frame.
In an alternative embodiment of the present invention, the feature detector includes a second feature extraction network and a plurality of feature detection networks, and the key point extraction module includes:
the second feature extraction unit is used for taking the human body image block as the input of the second feature extraction network so as to calculate and extract human body features in the human body image block;
and the key point detection unit is used for selecting a corresponding feature detection network from the plurality of feature detection networks according to the posture probability value, using the feature detection network as a target detection network, and using the human body feature as the input of the target detection network to detect the key point of the human body feature.
Compared with the prior art, the embodiments of the invention provide a key point detection method and device in which the feature detector is composed of a plurality of small network models, each handling one posture. This effectively reduces the training difficulty of the detection model, improves the data processing speed, and allows each small network to achieve higher precision for its corresponding posture. Meanwhile, the invention can output the human body posture category synchronously while detecting the human body, and can select a suitable feature detection network for key point detection according to that category.
In addition, the invention adopts parallel detection logic to further improve the running speed of key point detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic block structure diagram of a mobile terminal according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a key point detection method according to an embodiment of the present invention.
Fig. 3 is a sub-flowchart of step S11 shown in fig. 2.
Fig. 4 is a sub-flowchart of step S13 shown in fig. 2.
Fig. 5 is a schematic diagram of a network structure of a feature detector according to an embodiment of the present invention.
Fig. 6 is another schematic flow chart of the keypoint detection method according to the embodiment of the present invention.
Fig. 7 is a functional block diagram of a keypoint detection apparatus according to an embodiment of the present invention.
Icon: 10-mobile terminal; 100-key point detection device; 110-posture probability calculation module; 1100-first feature extraction unit; 1101-image cropping unit; 1102-posture probability calculation unit; 120-image cropping module; 130-key point extraction module; 1300-second feature extraction unit; 1301-key point detection unit; 200-memory; 300-memory controller; 400-processor.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
As shown in fig. 1, which is a block diagram of a mobile terminal 10 according to an embodiment of the present invention, the mobile terminal 10 includes a key point detecting device 100, a memory 200, a memory controller 300, and a processor 400. The memory 200, the memory controller 300 and the processor 400 are electrically connected to each other, directly or indirectly, to realize data transmission and interaction; for example, the components may be electrically connected to each other through one or more communication buses or signal lines. The key point detecting device 100 includes at least one software function module, which may be stored in the memory 200 in the form of software or firmware or solidified in the operating system of the mobile terminal 10. The processor 400 accesses the memory 200 under the control of the memory controller 300, so as to execute the executable modules stored in the memory 200, such as the software function modules and computer programs included in the key point detecting device 100, thereby implementing the key point detection method in the embodiment of the present invention.
It should be understood herein that the structure of the mobile terminal 10 shown in FIG. 1 is merely illustrative, as the mobile terminal 10 may have more or fewer components than shown in FIG. 1, or a different configuration than shown in FIG. 1. Wherein the components shown in fig. 1 may be implemented by software, hardware, or a combination thereof.
Fig. 2 is a schematic flow chart of a method for detecting key points according to a preferred embodiment of the present invention, which is applied to the mobile terminal 10 shown in fig. 1. The detailed flow and steps of the key point detection method will be described with reference to fig. 2. It should be noted that the actual implementation of the key point detection method in this embodiment is not limited to the sequence shown in fig. 2.
Step S11, using the current frame image in the video information as the input of the human body detector to calculate and output the human body detection frame for clipping the current frame image and the posture probability value in the current frame image;
step S12, clipping the current frame image according to the human body detection frame to obtain a human body image block;
and step S13, the posture probability value and the human body image block are used as the input of a feature detector, so that the feature detector selects a feature detection network matched with the posture probability value to calculate and output key points in the current frame image.
The human body feature detection method given in steps S11 to S13 above is applied to the mobile terminal 10 to realize real-time detection of human body key points. Specifically, in order to balance the running speed and the precision of the detectors (such as the human body detector and the feature detector) in the human body key point detection process, the invention abandons the prior-art approach of detecting key points with a single large detection model, and instead adopts a multi-branch feature detector to realize a key point detection network based on the top-down design for human body key point detection. The feature detector in the key point detection network comprises a plurality of feature detection networks, each of which handles key point detection for one posture. This effectively increases the data processing speed, reduces the training difficulty of the feature detection networks, allows each small feature detection network to achieve higher detection precision for its corresponding posture, and at the same time ensures real-time detection of human key points on the mobile terminal 10.
In detail, in step S11, the classification of the human body postures is obtained by clustering. Assuming that there are N human body samples in the training set and L key points on each human body, the coordinates of the key points are normalized to the interval [0, 1], so that the normalized coordinates of the l-th point on the n-th person can be expressed as

$$p_l^n = \left(x_l^n,\ y_l^n\right), \qquad x_l^n,\ y_l^n \in [0, 1].$$

Then, for each human body pose, the coordinates of all of its points, concatenated as

$$v_n = \left(x_0^n,\ y_0^n,\ x_1^n,\ y_1^n,\ \ldots,\ x_{L-1}^n,\ y_{L-1}^n\right),$$

are taken as the feature vector, and all the postures are clustered with a hierarchical clustering algorithm, where the number of clusters is C and the linkage criterion adopted is maximum linkage; all the data in the training set can thus be divided into C classes. The recommended value of C is 6.
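The clustering step above can be sketched in plain NumPy — a minimal agglomerative implementation with maximum (complete) linkage; the function and variable names are illustrative, and a library routine such as SciPy's hierarchical clustering would serve equally well:

```python
import numpy as np

def cluster_poses(poses, num_clusters=6):
    """Agglomerative (hierarchical) clustering of normalized poses with
    the maximum-linkage criterion, as in the pose-classification step.
    poses: (N, L, 2) array of keypoint coordinates scaled to [0, 1].
    Returns an (N,) array of cluster labels in [0, num_clusters)."""
    feats = poses.reshape(len(poses), -1)   # one feature vector per pose
    dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > num_clusters:
        # Maximum linkage: inter-cluster distance is the largest pairwise
        # distance between members; merge the closest pair of clusters.
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = np.empty(len(feats), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```

The quadratic merge loop is fine for a sketch; a production system would use an optimized linkage routine.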
Specifically, in practical implementation, the human body detector may include a first feature extraction Network, a Region suggestion Network (RPN), and a classification regression Network, and then as shown in fig. 3, step S11 may be implemented by steps S110 to S112 as follows.
Step S110, using the current frame image as the input of the first feature extraction network to extract and output the image features in the current frame image;
step S111, the extracted image features are used as input of the regional suggestion network to generate an initial detection frame, and image features in the current frame image are cut according to the initial detection frame to obtain an initial image feature block;
and step S112, taking the initial image feature block as the input of the classification regression network to calculate a posture probability value for representing the posture category of the human body, and performing fine correction on the initial detection frame to obtain a human body detection frame.
In steps S110 to S112, the first feature extraction network is configured to extract image features (such as human body image features) in the current frame image. The area suggestion network is used for generating a rough detection frame, namely an initial detection frame, according to the image features obtained by the first feature extraction network, and then cutting the image features in the current frame image according to the initial detection frame to obtain an initial image feature block. The classification regression network may be a fully-connected network, and is configured to calculate and output three vectors according to an output result of the regional suggestion network, where the first vector is a vector for refining and correcting an initial bounding box to output a relatively accurate human detection box, the second vector is a foreground (e.g., human body) probability vector and a background probability vector, which may have a length of 2, for classification of the foreground and the background, and the third vector is an attitude probability vector for selecting a matching feature detection network.
It should be noted that the difference from a conventional Faster-RCNN (Faster Regions with Convolutional Neural Network features) lies in that the classification regression network in the present invention adds an output of the posture probability value for human posture classification; the training method of the whole human body detector network provided by the present invention is otherwise the same as the conventional Faster-RCNN training method, and is not repeated herein.
Further, in step S12, the shape, size, etc. of the human image block depends on the shape, size, etc. of the human detection frame, which is not limited herein. In practical implementation, it is assumed that the human body detection frame is a vector with a length of 4, such as [ x, y, w, h ], where x and y represent the coordinates of the upper left corner of the human body detection frame, and w and h represent the width and height of the frame, respectively. When the image characteristics are cut, rectangular areas at x to x + w abscissas and y to y + h ordinates of the current frame image can be extracted to finish cutting, and then the human body image block is obtained.
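Under the [x, y, w, h] box convention just described, the cropping step can be sketched as follows (the H × W × C array layout and the clamping behavior are illustrative assumptions):

```python
import numpy as np

def crop_human_block(frame, box):
    """Extract the human image block from the current frame given a
    detection box [x, y, w, h], where (x, y) is the top-left corner
    and w, h are the box width and height."""
    x, y, w, h = (int(round(v)) for v in box)
    frame_h, frame_w = frame.shape[:2]
    # Clamp to the frame so a box partially outside stays valid.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    # Rows span the y-range, columns span the x-range.
    return frame[y0:y1, x0:x1]
```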
Further, in step S13, the feature detector may include a second feature extraction network and a plurality of feature detection networks, the plurality of feature detection networks share one second feature extraction network (i.e., a basic network), and each feature detection network is responsible for detecting the key points in one human body posture. In the present invention, by adopting a feature detection structure of a basic network (second feature extraction network) and a multi-branch network (feature detection network), the number of network parameters can be greatly reduced, so that the model of the feature detector is not too bulky in the mobile terminal 10, and the training difficulty of the network model can be greatly reduced. In detail, as shown in fig. 4, step S13 may be implemented by steps S130 to S131 described below.
Step S130, using the human body image block as the input of the second feature extraction network to calculate and extract the human body features in the human body image block;
step S131, selecting a corresponding feature detection network from the plurality of feature detection networks according to the posture probability value and using the feature detection network as a target detection network, and using the human body feature as the input of the target detection network to detect the key point of the human body feature.
Alternatively, in the present embodiment, the spatial information of the key points may be expressed in the form of thermodynamic diagrams (heatmaps). Assuming that the input size of the feature detection network is H × W, the coordinate of the l-th point is $(x_l, y_l)$, and the ratio of input to output is s, the thermodynamic diagram of a human body posture is a three-dimensional matrix $\mathbf{H}$ of size

$$\frac{H}{s} \times \frac{W}{s} \times L,$$

where

$$\mathbf{H}(i, j, z) = \begin{cases} 1, & (i, j) = \left(\left\lfloor y_z / s \right\rfloor,\ \left\lfloor x_z / s \right\rfloor\right) \\ 0, & \text{otherwise,} \end{cases}$$

z represents the third dimension of the three-dimensional matrix $\mathbf{H}$, l represents the index of the human key point, and 0 ≤ l ≤ L − 1.
Further, based on the description of steps S11 to S13 above, the key point detection method may further include a feature tracking process. The feature tracking process includes using the human body detection frame as the input of a detection regression network to refine and correct the human body detection frame, and performing feature tracking based on the corrected human body detection frame.
It should be noted that, as shown in fig. 5, the present embodiment implements feature tracking by adding a detection regression network to the feature detector, and the detection regression network and a plurality of feature detection networks together form a plurality of branches of the feature detector so as to share the basic network of the second feature extraction network. In detail, the detection regression network is used for performing refining and correction on the human body detection frame of the current frame again, and realizing feature tracking based on the refined human body detection frame. In practical implementation, after the human body detector acquires the human body detection frame once, the branch of the detection regression network can process the displacement of the human body detection frame of a plurality of frames in the future, the pressure of the human body detector is released, and the running time is reduced.
As an embodiment, the present invention implements the foregoing feature detection process and feature tracking process in parallel. Specifically, a first thread and a second thread may run in the mobile terminal 10; the first thread is configured to execute the feature detection process, and the second thread is configured to execute the feature tracking process based on the operation result of the first thread, with the two threads alternating according to a preset cycle. For example, the feature detection process may be started when the camera of the mobile terminal 10 is turned on, and may be executed once every fixed number of frames. The human body detector obtains the human body detection frame with the highest probability and a posture probability value; the feature detector then cuts out a region of interest (such as the region where the human body is located) from the current frame image according to the human body detection frame, and selects a suitable feature detection network according to the posture probability value to obtain the key point detection result and the refinement information for the human body detection frame. The feature tracking process occurs between two feature detection processes. During the feature tracking process, the human body detector is kept in a dormant state to save power, and only the feature detector is in a working state; at this time, the detection regression network of the feature detector comes into play. After the key point detection of each frame is completed, the human body detection frame is refined to ensure that it stays aligned with the human body in that frame image, thereby reducing the accumulated error, and the refined frame is then used as the human body detection frame for the next frame.
The feature detection process ensures that detected persons are updated in time as people appear and disappear, while the feature tracking process ensures that the human body detection thread is not fully loaded, thereby reducing overall power consumption.
In this embodiment, the feature detection process and the feature tracking process are executed in parallel by two threads, so the feature detector necessarily works with the human body detection frame of the previous frame; however, because the time interval between frames is very short, the resulting error in the human body detection frame is negligible.
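The alternating detection/tracking schedule can be illustrated with a single-threaded sketch; the two-thread version interleaves the same calls, and `detect_fn`, `track_fn`, and the interval are illustrative names:

```python
def run_pipeline(num_frames, detect_interval, detect_fn, track_fn):
    """Single-threaded sketch of the alternating schedule: the full
    human detector runs every `detect_interval` frames; in between,
    only the lighter detection regression branch refines the box."""
    box, log = None, []
    for f in range(num_frames):
        if box is None or f % detect_interval == 0:
            box = detect_fn(f)            # full detection pass
            log.append(("detect", f))
        else:
            box = track_fn(f, box)        # refine previous frame's box
            log.append(("track", f))
    return log
```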
Further, according to actual requirements, as shown in fig. 6, before the step of taking the posture probability value as an input of the feature detector to select the feature detection network that best matches the posture probability value from the plurality of feature detection networks, the key point detection method may further train the feature detector through the following steps S14 to S16.
Step S14, acquiring a training data set, dividing the training data set into a plurality of training subsets, wherein the training subsets correspond to the feature detection networks one to one;
step S15, for each training subset, using the training subset as the input of the corresponding feature detection network to calculate and output the test feature points of the training subset, and using the training subset as the input of the regression network to calculate and output the test tracking value;
and step S16, calculating a loss function value of the feature detection network according to the test feature point and the test tracking value, and optimizing the feature detection network according to the loss function value until the output of the loss function value meets the preset requirement.
In steps S14-S16, the preset requirement is that the loss function value is minimized or has converged smoothly. In addition, assume the training data set is divided into C training subsets, and each subset is randomly cropped and displaced for augmentation. At each iteration, a batch of training data is taken from the C training subsets in turn and fed into the feature detector to obtain the output of the corresponding feature detection network and the output of the detection regression network. Let I_c be the training image taken from the c-th training subset, H_c its ground-truth heatmap, δY_c the difference between the original frame coordinates and the coordinates after random cropping and displacement, O_c the output of the c-th branch, and δX_c the output of the regression network. The loss function is then:
Loss = Σ_{c=1}^{C} ( ||O_c − H_c||² + ||δX_c − δY_c||² )
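A minimal numpy sketch of this combined loss is shown below. The equal weighting of the heatmap term and the tracking term is an assumption (the patent's equation image does not survive extraction), as is the squared-error form of both terms.

```python
# Combined loss over the C branches: squared heatmap error plus squared
# tracking (box offset) error, summed across branches. Weighting between
# the two terms is assumed to be 1:1.
import numpy as np

def detector_loss(O, H, dX, dY):
    """O, H   : lists of C heatmap arrays (predicted / ground truth)
       dX, dY : lists of C offset vectors (predicted / ground truth)"""
    loss = 0.0
    for O_c, H_c, dX_c, dY_c in zip(O, H, dX, dY):
        loss += np.sum((O_c - H_c) ** 2)    # heatmap regression term
        loss += np.sum((dX_c - dY_c) ** 2)  # detection-box regression term
    return float(loss)
```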
After the training of the feature detector is completed, the coordinates of each human body keypoint can be obtained from the position of the maximum value in each slice (channel) of the heatmap.
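Reading the coordinates off the heatmap can be sketched as a per-channel argmax; the shape convention (keypoints as the leading axis) is an assumption for illustration.

```python
# Each channel ("slice") of a K x H x W heatmap holds one keypoint,
# located at the channel's maximum response.
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """heatmaps: array of shape (K, H, W); returns a list of K (x, y) tuples."""
    K, H, W = heatmaps.shape
    coords = []
    for k in range(K):
        idx = int(np.argmax(heatmaps[k]))  # flat index of the peak
        y, x = divmod(idx, W)              # convert back to 2-D coordinates
        coords.append((x, y))
    return coords
```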
Based on the foregoing description of the keypoint detection method, it can be seen that the present invention effectively overcomes the difficulty, in the prior art, of executing deep-learning-based keypoint detection in real time on the mobile terminal 10. The speed problem of feature detection is solved by parallel execution (of the feature detection process and the feature tracking process) and by miniaturized networks; the weak expressive capacity of a small network is compensated through human posture classification and a multi-branch small network (the plurality of feature detection networks); and the excessive power consumption of executing multiple models in the prior art is effectively alleviated through the careful design of the several small feature detection networks and the detection frame regression network.
Further, referring to fig. 7, an embodiment of the invention further provides a key point detecting device 100, which is applied to the mobile terminal 10 shown in fig. 1. The keypoint detection apparatus 100 includes a pose probability calculation module 110, an image cropping module 120, and a keypoint extraction module 130.
The pose probability calculation module 110 is configured to use a current frame image in the video information as an input of the human body detector, so as to calculate and output a human body detection frame for clipping the current frame image and a pose probability value in the current frame image; in this embodiment, the description of the posture probability calculation module 110 may refer to the detailed description of step S11, that is, step S11 may be executed by the posture probability calculation module 110. Optionally, the pose probability calculation module 110 includes a first feature extraction unit 1100, an image cropping unit 1101, and a pose probability calculation unit 1102.
The first feature extraction unit 1100 is configured to use the current frame image as an input of the first feature extraction network to extract and output image features in the current frame image; in this embodiment, the description of the first feature extraction unit 1100 may specifically refer to the detailed description of step S110, that is, step S110 may be executed by the first feature extraction unit 1100.
The image cropping unit 1101 is configured to use the extracted image features as an input of the regional suggestion network to generate an initial detection frame, and crop the image features in the current frame image according to the initial detection frame to obtain an initial image feature block; in the present embodiment, the detailed description of step S111 may be referred to specifically for the description of the image cropping unit 1101, that is, step S111 may be performed by the image cropping unit 1101.
The pose probability calculation unit 1102 is configured to use the initial image feature block as an input of the classification regression network to calculate a pose probability value for representing a human pose category, and perform refinement and correction on the initial detection frame to obtain a human detection frame. In this embodiment, the description of the posture probability calculation unit 1102 may specifically refer to the detailed description of step S112, that is, step S112 may be executed by the posture probability calculation unit 1102.
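The patent does not specify how the "refinement and correction" of the detection frame is parameterized. Below is the common center/size delta scheme used by R-CNN-style detectors, shown only as one plausible form of the regression output; the function name and box convention are assumptions.

```python
# One plausible parameterization of detection-frame refinement: the
# regression head predicts (dx, dy, dw, dh) deltas relative to the box's
# center and size, as in R-CNN-family detectors.
import math

def refine_box(box, deltas):
    """box: (x, y, w, h) with (x, y) the top-left corner;
       deltas: (dx, dy, dw, dh) from the regression head."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    cx, cy = x + w / 2.0, y + h / 2.0
    cx, cy = cx + dx * w, cy + dy * h          # shift the center
    w, h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (cx - w / 2.0, cy - h / 2.0, w, h)
```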
The image clipping module 120 is configured to clip the current frame image according to the human body detection frame to obtain a human body image block; in this embodiment, the detailed description of step S12 may be referred to specifically for the description of the image cropping module 120, that is, step S12 may be executed by the image cropping module 120.
The keypoint extraction module 130 is configured to use the posture probability value and the human body image block as input of a feature detector, so that the feature detector selects a feature detection network matched with the posture probability value to calculate and output keypoints in the current frame image. In this embodiment, the detailed description of the step S13 may be referred to for the description of the keypoint extraction module 130, that is, the step S13 may be executed by the keypoint extraction module 130. Optionally, the keypoint extraction module 130 may include a second feature extraction unit 1300 and a keypoint detection unit 1301.
The second feature extraction unit 1300 is configured to use the human body image block as an input of the second feature extraction network to calculate and extract human body features in the human body image block; in this embodiment, the detailed description of step S130 may be referred to for the description of the second feature extraction unit 1300, that is, step S130 may be performed by the second feature extraction unit 1300.
The key point detecting unit 1301 is configured to select a corresponding feature detection network from the plurality of feature detection networks according to the posture probability value, use the feature detection network as a target detection network, and use the human body feature as an input of the target detection network to detect a key point of the human body feature. In this embodiment, the detailed description of step S131 may be specifically referred to for the description of the key point detecting unit 1301, that is, step S131 may be executed by the key point detecting unit 1301.
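The dispatch performed by the keypoint detection unit can be sketched as an argmax over the posture probability vector, picking which small branch network processes the human body feature. The callables standing in for the branch networks are illustrative, not the patent's models.

```python
# Branch dispatch: the posture probability vector selects one of the
# small feature detection networks (here, plain callables) to run on
# the extracted human body feature.

def select_branch(pose_probs, branches):
    """pose_probs: per-category probabilities; branches: one detection
    network (callable) per posture category. Returns the best match."""
    best = max(range(len(pose_probs)), key=lambda c: pose_probs[c])
    return branches[best]

def detect_keypoints(features, pose_probs, branches):
    net = select_branch(pose_probs, branches)  # target detection network
    return net(features)                       # keypoints from the chosen branch
```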
In summary, embodiments of the present invention provide a keypoint detection method and device. The feature detector adopted in the present invention is composed of multiple small network models, each handling one human posture category, which effectively reduces the difficulty of training the detection model, improves the data processing speed, and enables each small network to achieve relatively high accuracy for its corresponding posture. Meanwhile, the present invention can output the human posture category synchronously while detecting the human body, and can select a suitable feature detection network for keypoint detection according to that category.
In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. The system and method embodiments described above are merely illustrative, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A key point detection method is applied to a mobile terminal, and is characterized in that the key point detection method comprises a characteristic detection process, and the characteristic detection process comprises the following steps:
taking a current frame image in video information as the input of a human body detector to calculate and output a human body detection frame for clipping the current frame image and an attitude probability value in the current frame image;
clipping the current frame image according to the human body detection frame to obtain a human body image block;
the posture probability value and the human body image block are used as input of a feature detector, so that the feature detector selects a feature detection network matched with the posture probability value to calculate and output key points in the current frame image;
wherein the feature detector comprises a second feature extraction network and a plurality of feature detection networks; the step of taking the posture probability value and the human body image block as the input of a feature detector to calculate and output key points in the current frame image comprises the following steps:
taking the human body image block as the input of the second feature extraction network to calculate and extract the human body features in the human body image block;
and selecting a corresponding feature detection network from the plurality of feature detection networks according to the posture probability value, using the feature detection network as a target detection network, and using the human body feature as the input of the target detection network to detect the key points of the human body feature.
2. The keypoint detection method of claim 1, wherein said human body detector comprises a first feature extraction network, a regional suggestion network and a classification regression network; calculating and outputting a human body detection frame for clipping the current frame image and a posture probability value in the current frame image, wherein the steps comprise:
taking the current frame image as the input of the first feature extraction network to extract and output image features in the current frame image;
taking the extracted image features as input of the regional suggestion network to generate an initial detection frame, and cutting the image features in the current frame image according to the initial detection frame to obtain an initial image feature block;
and taking the initial image feature block as the input of the classification regression network to calculate a posture probability value for representing the posture category of the human body, and carrying out fine correction on the initial detection frame to obtain the human body detection frame.
3. The keypoint detection method of claim 1, wherein before the step of performing the pose probability value as an input to a feature detector to select a feature detection network from a plurality of feature detection networks that best matches the pose probability value, the method further comprises:
acquiring a training data set, and dividing the training data set into a plurality of training subsets, wherein the training subsets correspond to the feature detection networks one to one;
aiming at each training subset, taking the training subset as the input of a corresponding feature detection network to calculate and output the test feature points of the training subset, and taking the training subset as the input of a regression network to calculate and output a test tracking value;
and calculating a loss function value of the feature detection network according to the test feature point and the test tracking value, and optimizing the feature detection network according to the loss function value until the output of the loss function value meets a preset requirement.
4. The method of claim 3, wherein the step of calculating the Loss function value Loss comprises:
Loss = Σ_{c=1}^{C} ( ||O_c − H_c||² + ||δX_c − δY_c||² )
wherein O_c denotes the test feature points, δX_c the test tracking value, H_c the ground-truth feature points, δY_c the ground-truth tracking value, C the number of feature detection networks, and c the index of the c-th training subset.
5. The keypoint detection method of claim 1, further comprising a feature tracking process comprising:
and the human body detection frame is used as the input of the detection regression network so as to carry out fine correction on the human body detection frame, and the human body is tracked based on the corrected human body detection frame.
6. The method according to claim 5, wherein a first thread and a second thread run in the mobile terminal;
the first thread is configured to execute the feature detection process, and the second thread is configured to execute the feature tracking process based on an operation result of the first thread, where the first thread and the second thread alternately operate according to a preset cycle.
7. The utility model provides a key point detection device, is applied to mobile terminal which characterized in that, key point detection device includes:
the gesture probability calculation module is used for taking the current frame image in the video information as the input of the human body detector so as to calculate and output a human body detection frame for cutting the current frame image and a gesture probability value in the current frame image;
the image cutting module is used for cutting the current frame image according to the human body detection frame to obtain a human body image block;
the key point extraction module is used for taking the posture probability value and the human body image block as the input of a feature detector, so that the feature detector selects a feature detection network matched with the posture probability value to calculate and output key points in the current frame image;
wherein the feature detector comprises a second feature extraction network and a plurality of feature detection networks, and the keypoint extraction module comprises:
the second feature extraction unit is used for taking the human body image block as the input of the second feature extraction network so as to calculate and extract human body features in the human body image block;
and the key point detection unit is used for selecting a corresponding feature detection network from the plurality of feature detection networks according to the posture probability value, using the feature detection network as a target detection network, and using the human body feature as the input of the target detection network to detect the key point of the human body feature.
8. The keypoint detection device of claim 7, wherein the human body detector comprises a first feature extraction network, a regional suggestion network and a classification regression network; the posture probability calculation module comprises:
a first feature extraction unit, configured to use the current frame image as an input of the first feature extraction network to extract and output image features in the current frame image;
the image cutting unit is used for taking the extracted image features as the input of the regional suggestion network to generate an initial detection frame, and cutting the image features in the current frame image according to the initial detection frame to obtain an initial image feature block;
and the posture probability calculation unit is used for taking the initial image feature block as the input of the classification regression network so as to calculate a posture probability value for representing the human posture category, and finely correcting the initial detection frame to obtain the human detection frame.
CN201811474069.2A 2018-12-04 2018-12-04 Key point detection method and device Active CN109598234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811474069.2A CN109598234B (en) 2018-12-04 2018-12-04 Key point detection method and device

Publications (2)

Publication Number Publication Date
CN109598234A CN109598234A (en) 2019-04-09
CN109598234B true CN109598234B (en) 2021-03-23






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant