CN109299669A - Video human face critical point detection method and device based on double intelligent bodies - Google Patents
- Publication number: CN109299669A (application number CN201811007365.1A)
- Authority
- CN
- China
- Prior art keywords
- detection
- agent
- key point
- tracking
- point detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Abstract
The invention discloses a dual-agent-based video face key point detection method and device. The method includes: establishing a tracking agent and a key point detection agent, respectively, and connecting them through communication channels; outputting the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and obtaining the conditional probability distributions from the communication information between the two agents; establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent act through variable-length action sequences, simultaneously updating the positions of the detection box and the key points while interactively transmitting information, to obtain a detection result; and optimizing the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain the final result. The method outputs the face box detection result and the key point detection result in an interactive manner, with the advantages of improving the performance of the detection system and optimizing the detection result.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a dual-agent-based video face key point detection method and device.
Background
In the prior art, with the rapid development of face key point detection in the image domain, video face key point detection has received wide attention in the field of computer vision. In practical applications, the video setting better matches real requirements: a video not only provides more face frames than a single image, but also carries time-dimension information, which helps key point localization as well as subsequent face recognition and liveness detection. The goal of video face key point detection is: given a face video, detect a series of key points, such as facial parts and the face contour, in every frame of the video. In practice, because the captured face video comes from an unconstrained environment, in addition to the large poses, expression changes, and severe occlusion found in static images, illumination changes and motion blur in the video make face key point detection even more difficult.
Over the past decades, many methods have been studied for video face key point detection. Since it is very difficult to detect key points directly from a whole video frame without any prior, most methods adopt a frame-by-frame detection strategy that handles the video face key point problem in a serial fashion: they first generate a high-confidence face detection box for each frame, and then detect key points within the face region framed by that box. Although this strategy reduces the difficulty of key point detection by introducing the detection box as a prior, the resulting key point accuracy depends largely on the generated detection box. Fig. 1 illustrates the effect of the face detection box on key point detection: even a slight deviation of the detection box greatly affects the key point accuracy. This occurs because the detection box is generated without considering any pose or expression information of the face; especially when the face is in an extreme condition, the face region covered by the box often fails to contain all face key points, which ultimately limits the key point detection performance. Therefore, video face key point detection needs to make full use of the interactive information between the face detection box and the key points to guarantee accuracy. Because face key points effectively represent face motion across poses, they can provide additional useful information for generating an accurate detection box. However, most existing video face key point detection methods ignore this mutual information, resulting in low accuracy under extreme conditions.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one objective of the present invention is to provide a dual-agent-based video face key point detection method, which has the advantages of improving the performance of the detection system and optimizing the detection result.
Another objective of the invention is to provide a dual-agent-based video face key point detection device.
In order to achieve the above object, an embodiment of the present invention provides a dual-agent-based video face key point detection method, including the following steps: establishing a tracking agent and a key point detection agent, respectively, and connecting them through communication channels; outputting the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and obtaining the conditional probability distributions from the communication information between the tracking agent and the key point detection agent; establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent simultaneously update the positions of the detection box and the key points through variable-length action sequences and interactively transmit information to obtain a detection result; and optimizing the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain the final result.
According to the dual-agent-based video face key point detection method of the embodiment of the invention, the tracking agent and the key point detection agent are established respectively, the video face key point detection problem is analyzed in a probabilistic manner according to a Bayesian model, the positions of the detection box and the key points are updated simultaneously through the Markov decision model, and the detection results of the face box and the key points are output in an interactive manner, thereby improving the performance of the detection system and optimizing the detection result.
In addition, the video face key point detection method based on the double agents according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, establishing the tracking agent and the key point detection agent respectively and connecting them through a communication channel further includes: establishing the tracking agent based on a VGG-M model followed by a single-layer Q-network; establishing the key point detection agent as the combination of a cascaded hourglass network and a confidence network; and connecting the two agents through two communication channels encoded by a deconvolution layer and a long short-term memory unit, respectively.
Further, in an embodiment of the present invention, establishing the Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent perform variable-length action sequences, update the positions of the detection box and the key points, and interactively transmit information to obtain a detection result, further includes: the tracking agent changes the currently observed region through movement actions, where the movement actions include moving left, right, up, and down, zooming in, and zooming out; and the key point detection agent decides whether the iteration stops by generating a stop or continue action.
Further, in an embodiment of the present invention, the normalized coordinates of the detected key points are used as a representation of the three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
Further, in an embodiment of the present invention, the supervised learning training function and the reinforcement learning training function are given by formulas that appear as images in the original publication and are not reproduced in this text.
In order to achieve the above object, an embodiment of the present invention provides a dual-agent-based video face key point detection device, including: an establishing module, used to establish a tracking agent and a key point detection agent, respectively, and connect them through communication channels; a probability distribution acquisition module, used to output the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model and to obtain the conditional probability distributions from the communication information between the tracking agent and the key point detection agent; a detection interaction module, used to establish a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent perform variable-length action sequences, simultaneously update the positions of the detection box and the key points, and interactively transmit information to obtain a detection result; and an optimization module, used to optimize the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain the final result.
According to the dual-agent-based video face key point detection device of the embodiment of the invention, the tracking agent and the key point detection agent are established respectively, video face key point detection is analyzed in a probabilistic manner according to the Bayesian model, the positions of the detection box and the key points are updated simultaneously through the Markov decision model, and the detection results of the face box and the key points are output in an interactive manner, thereby improving the performance of the detection system and optimizing the detection result.
In addition, the video face key point detection device based on the dual agents according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the establishing module further includes: establishing the tracking agent based on a VGG-M model followed by a single-layer Q-network; establishing the key point detection agent as the combination of a cascaded hourglass network and a confidence network; and connecting the two agents through two communication channels encoded by a deconvolution layer and a long short-term memory unit, respectively.
Further, in an embodiment of the present invention, the detection interaction module further includes: the tracking agent changes the currently observed region through movement actions, where the movement actions include moving left, right, up, and down, zooming in, and zooming out; and the key point detection agent decides whether the iteration stops by generating a stop or continue action.
Further, in an embodiment of the present invention, the normalized coordinates of the detected key points are used as a representation of the three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
Further, in an embodiment of the present invention, the supervised learning training function and the reinforcement learning training function are given by formulas that appear as images in the original publication and are not reproduced in this text.
additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram illustrating the effect of a face detection box on keypoint detection;
FIG. 2 is a flow chart of a method for detecting key points of a video face based on dual agents according to an embodiment of the invention;
fig. 3 is a schematic diagram illustrating the interactive output of the detection of the face frame and the detection of the key points in the video face key point detection method based on the dual agents according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of partial results on the challenge subset of the public face database 300VW for the dual-agent-based video face keypoint detection method according to one embodiment of the present invention; and
fig. 5 is a schematic structural diagram of a video face key point detection apparatus based on dual agents according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a method and an apparatus for detecting video face key points based on dual agents according to an embodiment of the present invention with reference to the accompanying drawings, and first, a method for detecting video face key points based on dual agents according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 2 is a flowchart of a method for detecting key points of a video face based on dual agents according to an embodiment of the present invention.
As shown in fig. 2, the method for detecting video face key points based on dual agents includes the following steps:
in step S101, a tracking agent and a key point detection agent are respectively established and connected via a communication channel.
Specifically, the tracking agent is established based on a VGG-M model followed by a single-layer Q-network, the key point detection agent is established as the combination of a cascaded hourglass network and a confidence network, and the two agents are connected through two communication channels encoded by a deconvolution layer and a long short-term memory unit, respectively.
In one embodiment of the invention, the network structure comprises three parts: the tracking agent structure, the key point detection agent structure, and the communication channels. The tracking agent structure is based on a VGG-M model followed by a single-layer Q-network, and the key point detection agent structure is designed as the combination of a cascaded hourglass network and a confidence network. The two communication channels are encoded by a deconvolution layer and a long short-term memory (LSTM) unit, respectively.
The communication information explicitly encodes the synergy between the two agents. The information passed from the tracking agent to the key point detection agent is intended to provide additional texture information as a prior, improving the robustness of key point detection. The feature map of the tracking agent's third convolutional layer, conv3, is chosen as the message and concatenated in the depth dimension with the first-stage network of the hourglass network used for key point detection. Because the selected feature map of the tracking agent and the feature map of the first-stage network of the key point detection agent differ in size, their scales must be unified before they can be fed into the subsequent network together.
This embodiment adopts a deconvolution operation to enlarge the feature map to the required scale; moreover, since the deconvolution layer contains learnable parameters, training allows the transmitted information to be encoded more appropriately.
The information passed from the key point detection agent to the tracking agent provides additional three-dimensional pose information for detection box tracking, aiming to supply prior knowledge of the face pose for accurate box tracking. To this end, this embodiment uses the normalized coordinates of the detected key points as a representation of the three-dimensional pose information, and uses a long short-term memory (LSTM) unit to memorize pose changes along the time dimension. For stable training, the internal state of the LSTM is updated when a Markov decision process terminates.
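The two channels described above can be sketched numerically. The following is a minimal NumPy illustration with toy sizes and weights (all assumptions; the real model uses a learnable multi-channel deconvolution layer and feeds the normalized coordinates into an LSTM): a stride-2 transposed convolution doubles the spatial scale of the tracking agent's conv3 feature map so it can be stacked with the first-stage hourglass map, and key point coordinates are normalized against the detection box before being passed back to the tracking agent.

```python
import numpy as np

def deconv2x(feat, kernel):
    """Stride-2 transposed convolution on a single-channel (H, W) map:
    produces a (2H, 2W) map so it can be concatenated, in the depth
    dimension, with the larger first-stage hourglass feature map."""
    h, w = feat.shape
    out = np.zeros((2 * h, 2 * w))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += feat[i, j] * kernel
    return out

def normalize_keypoints(kps, box):
    """Map absolute key point coordinates into the unit square of the
    detection box (x, y, w, h) before feeding them to the LSTM channel."""
    x, y, w, h = box
    return (kps - np.array([x, y])) / np.array([w, h])

conv3 = np.ones((7, 7))                      # toy-size conv3 map of the tracking agent
up = deconv2x(conv3, np.full((2, 2), 0.25))  # 2x2 kernel is learnable in the real model
assert up.shape == (14, 14)                  # now matches a 14x14 first-stage map

kps = np.array([[120.0, 80.0], [160.0, 80.0]])
assert np.allclose(normalize_keypoints(kps, (100, 60, 80, 80)),
                   [[0.25, 0.25], [0.75, 0.25]])
```

All feature-map sizes and the kernel value here are illustrative only; the point is the shape arithmetic that makes depth-wise concatenation possible.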
In step S102, the marginal distribution probabilities of the tracking agent and the key point detection agent are output according to the Bayesian model, and the conditional probability distributions are obtained from the communication information between the tracking agent and the key point detection agent.
In one embodiment of the invention, the video face keypoint detection problem is analyzed in a probabilistic manner according to a Bayesian model. The final output of the video face key points can be regarded as a joint probability distribution, as follows:
p(B_k, V_k | I_k, B_{k-1}, V_{k-1}) = p(B_k | I_k, B_{k-1}, V_{k-1}) p(V_k | B_k, I_k, B_{k-1}, V_{k-1}),
wherein B_k denotes the face detection box of frame k, V_k denotes the face key point coordinates of frame k, and I_k denotes the image of frame k.
according to bayes' theorem, the joint probability distribution can be expressed in two ways, namely:
p(Bk|Ik)p(Vk|Bk,Ik)=p(Vk|Ik)p(Bk|Vk,Ik)
wherein,
in this embodiment, two agents, namely, the tracking agent and the key point detecting agent, are defined, and the marginal probability distribution in the above formula is output respectively, and the other two conditional probability distributions are represented by communication information between the two agents, so that the above equation constraint is ensured in an explicit manner through interaction between the two agents.
In step S103, a Markov decision model is established according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent act through variable-length action sequences, simultaneously update the positions of the detection box and the key points, and interactively transmit information to obtain a detection result.
In particular, the tracking agent changes the currently observed region by a movement action, wherein the movement action comprises a left, right, up, down, zoom in and zoom out; the key point detection agent decides whether the iteration stops by generating a stop or continue action.
In one embodiment of the invention, the video face keypoint detection problem is modeled as a Markov decision process, and the following explains the key definitions in the Markov decision process:
The state: the face image patch framed by the detection box, obtained by cropping:
s_t = φ(B, I),
wherein φ(B, I) denotes cropping, from the image I, the face region framed by the detection box B.
The actions: the tracking agent generates movement actions to change the currently observed region; specifically, the movement actions are defined as moving left, right, up, and down, zooming in, and zooming out.
The key point detection agent generates a stop/continue action to decide whether the iteration should stop.
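As an illustration of these definitions, the sketch below applies a movement action to a detection box and crops the corresponding state patch. The step ratio ALPHA and the (x, y, w, h) box convention are assumptions for illustration; the patent text does not specify them.

```python
import numpy as np

ALPHA = 0.1  # assumed step ratio; not given in the patent text

def apply_action(box, action):
    """One tracking-agent movement action on a detection box (x, y, w, h)."""
    x, y, w, h = box
    moves = {
        "left":     (x - ALPHA * w, y, w, h),
        "right":    (x + ALPHA * w, y, w, h),
        "up":       (x, y - ALPHA * h, w, h),
        "down":     (x, y + ALPHA * h, w, h),
        "zoom_in":  (x + ALPHA * w / 2, y + ALPHA * h / 2,
                     (1 - ALPHA) * w, (1 - ALPHA) * h),
        "zoom_out": (x - ALPHA * w / 2, y - ALPHA * h / 2,
                     (1 + ALPHA) * w, (1 + ALPHA) * h),
    }
    return moves[action]

def crop(image, box):
    """phi(B, I): the state is the face patch framed by the detection box."""
    x, y, w, h = (int(round(v)) for v in box)
    return image[y:y + h, x:x + w]

image = np.arange(10000).reshape(100, 100)
box = (20, 20, 40, 40)
box = apply_action(box, "right")          # box shifts right by ALPHA * w
assert box == (24.0, 20, 40, 40)
assert crop(image, box).shape == (40, 40)
# The key point detection agent would keep emitting "continue" until its
# confidence network judges the estimate stable, then "stop" ends the episode.
```

The zoom actions keep the box centered while scaling it, which is one plausible convention; the episode length is variable because the stop/continue decision, not a fixed step count, terminates the sequence.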
The reward: the tracking agent and the key point detection agent each have a reward function, given by formulas that appear as images in the original publication and are not reproduced in this text.
In step S104, the detection result is optimized by establishing a supervised learning training function and a reinforcement learning training function to obtain the final result.
In one embodiment of the invention, a two-stage training method is adopted: supervised training followed by reinforcement learning training. The supervised learning training objective function and the reinforcement learning training objective function are given by formulas that appear as images in the original publication and are not reproduced in this text.
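Since the patent's exact objective formulas are not reproduced in the text, the following is only a generic illustration of the second, reinforcement-learning stage: a minimal REINFORCE sketch on a toy two-action problem with an assumed reward, showing how an action that earns reward becomes more probable under a softmax policy.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)  # logits of a softmax policy over two toy actions

def policy(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

# Assumed toy reward: action 0 is the "good" move (e.g. it improves the box).
rewards = np.array([1.0, 0.0])
lr = 0.1
for _ in range(500):
    p = policy(theta)
    a = rng.choice(2, p=p)             # sample an action from the policy
    grad = -p
    grad[a] += 1.0                     # gradient of log pi(a | theta)
    theta += lr * rewards[a] * grad    # REINFORCE ascent step

assert policy(theta)[0] > 0.8          # the rewarded action now dominates
```

In the actual method the policy networks are the two agents' Q/confidence heads and the rewards are the patent's (unreproduced) formulas; this sketch only conveys the policy-gradient mechanics of the second training stage.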
As shown in fig. 4, good performance can be observed from the partial results on the challenge subset of the public face database 300VW.
According to the dual-agent-based video face key point detection method of the embodiment of the invention, the tracking agent and the key point detection agent are established respectively, the video face key point detection problem is analyzed in a probabilistic manner according to a Bayesian model, the positions of the detection box and the key points are updated simultaneously through the Markov decision model, and the detection results of the face box and the key points are output in an interactive manner, thereby improving the performance of the detection system and optimizing the detection result.
Fig. 5 is a schematic structural diagram of a dual-agent-based video face keypoint detection apparatus according to an embodiment of the present invention.
As shown in fig. 5, the dual agent-based video face keypoint detection apparatus 10 includes: the system comprises a building module 100, a probability distribution obtaining module 200, a detection interaction module 300 and an optimization module 400.
The establishing module 100 is used to establish the tracking agent and the key point detection agent, respectively, and connect them through communication channels. The probability distribution acquisition module 200 is used to output the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and to obtain the conditional probability distributions from the communication information between the two agents. The detection interaction module 300 is used to establish a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent perform variable-length action sequences, simultaneously update the positions of the detection box and the key points, and interactively transmit information to obtain a detection result. The optimization module 400 is used to optimize the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain the final result. The dual-agent-based video face key point detection device has the advantages of improving the performance of the detection system and optimizing the detection result.
Further, in an embodiment of the present invention, the establishing module 100 further includes: establishing the tracking agent based on a VGG-M model followed by a single-layer Q-network; establishing the key point detection agent as the combination of a cascaded hourglass network and a confidence network; and connecting the two agents through two communication channels encoded by a deconvolution layer and a long short-term memory unit, respectively.
Further, in an embodiment of the present invention, the detection interaction module 300 further includes: the tracking agent changes the currently observed region through movement actions, where the movement actions include moving left, right, up, and down, zooming in, and zooming out; and the key point detection agent decides whether the iteration stops by generating a stop or continue action.
Further, in an embodiment of the present invention, the normalized coordinates of the detected key points are used as a representation of the three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
Further, in an embodiment of the present invention, the supervised learning training function and the reinforcement learning training function are given by formulas that appear as images in the original publication and are not reproduced in this text.
it should be noted that the explanation of the embodiment of the method for detecting key points of a video face based on dual agents is also applicable to the apparatus for detecting key points of a video face based on dual agents in this embodiment, and is not repeated here.
According to the dual-agent-based video face key point detection device of the embodiment of the invention, the tracking agent and the key point detection agent are established respectively, video face key point detection is analyzed in a probabilistic manner according to the Bayesian model, the positions of the detection box and the key points are updated simultaneously through the Markov decision model, and the detection results of the face box and the key points are output in an interactive manner, thereby improving the performance of the detection system and optimizing the detection result.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A dual-agent-based video face keypoint detection method, characterized by comprising the following steps:
establishing a tracking agent and a keypoint detection agent, respectively, and connecting the tracking agent and the keypoint detection agent through communication channels;
outputting marginal probability distributions of the tracking agent and the keypoint detection agent, respectively, according to a Bayesian model, and obtaining conditional probability distributions according to communication information exchanged between the tracking agent and the keypoint detection agent;
establishing a Markov decision model according to the marginal probability distributions and the conditional probability distributions, wherein the tracking agent and the keypoint detection agent simultaneously update the positions of a detection box and keypoints through variable-length action sequences and interactively transmit information to obtain a detection result; and
optimizing the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
2. The dual-agent-based video face keypoint detection method of claim 1, wherein establishing the tracking agent and the keypoint detection agent, respectively, and connecting them through communication channels further comprises:
the tracking agent is built on a VGG-M model followed by a single-layer Q-network; the keypoint detection agent is built by combining a cascaded hourglass network with a confidence network; and the two agents are connected through two communication channels based on a deconvolution layer and long short-term memory (LSTM) encodings.
3. The method of claim 1, wherein establishing the Markov decision model according to the marginal probability distributions and the conditional probability distributions, with the tracking agent and the keypoint detection agent simultaneously updating the positions of the detection box and the keypoints through variable-length action sequences and interactively transmitting information to obtain the detection result, further comprises:
the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise left, right, up, down, zoom-in, and zoom-out; and
the keypoint detection agent decides whether the iteration stops by generating a stop or continue action.
4. The dual-agent-based video face keypoint detection method of claim 1, wherein normalized detections are used to obtain keypoint coordinates as a representation of three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes in the time dimension.
5. The dual-agent-based video face keypoint detection method of claim 1, wherein the supervised learning training function and the reinforcement learning training function further comprise:
the supervised learning training function is:
the reinforcement learning training function is as follows:
6. A dual-agent-based video face keypoint detection apparatus, characterized by comprising:
an establishing module, configured to establish a tracking agent and a keypoint detection agent, respectively, and to connect them through communication channels;
a probability distribution acquisition module, configured to output marginal probability distributions of the tracking agent and the keypoint detection agent, respectively, according to a Bayesian model, and to obtain conditional probability distributions according to communication information exchanged between the two agents;
a detection interaction module, configured to establish a Markov decision model according to the marginal probability distributions and the conditional probability distributions, wherein the tracking agent and the keypoint detection agent simultaneously update the positions of a detection box and keypoints through variable-length action sequences and interactively transmit information to obtain a detection result; and
an optimization module, configured to optimize the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
7. The dual-agent-based video face keypoint detection apparatus of claim 6, wherein the establishing module further comprises:
the tracking agent is built on a VGG-M model followed by a single-layer Q-network; the keypoint detection agent is built by combining a cascaded hourglass network with a confidence network; and the two agents are connected through two communication channels based on a deconvolution layer and long short-term memory (LSTM) encodings.
8. The dual-agent-based video face keypoint detection apparatus of claim 6, wherein the detection interaction module further comprises:
the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise left, right, up, down, zoom-in, and zoom-out; and
the keypoint detection agent decides whether the iteration stops by generating a stop or continue action.
9. The dual-agent-based video face keypoint detection apparatus of claim 6, wherein normalized detections are used to obtain keypoint coordinates as a representation of three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes in the time dimension.
10. The dual-agent-based video face keypoint detection apparatus of claim 6, wherein the supervised learning training function and the reinforcement learning training function further comprise:
the supervised learning training function is:
the reinforcement learning training function is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811007365.1A CN109299669B (en) | 2018-08-30 | 2018-08-30 | Video face key point detection method and device based on double intelligent agents |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109299669A true CN109299669A (en) | 2019-02-01 |
CN109299669B CN109299669B (en) | 2020-11-13 |
Family
ID=65166024
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811007365.1A Active CN109299669B (en) | 2018-08-30 | 2018-08-30 | Video face key point detection method and device based on double intelligent agents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109299669B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130150160A1 (en) * | 2007-02-08 | 2013-06-13 | Edge 3 Technologies, Inc. | Method and Apparatus for Tracking of a Plurality of Subjects in a Video Game |
CN101763636A (en) * | 2009-09-23 | 2010-06-30 | 中国科学院自动化研究所 | Method for tracing position and pose of 3D human face in video sequence |
CN106407958A (en) * | 2016-10-28 | 2017-02-15 | 南京理工大学 | Double-layer-cascade-based facial feature detection method |
CN107748858A (en) * | 2017-06-15 | 2018-03-02 | 华南理工大学 | A kind of multi-pose eye locating method based on concatenated convolutional neutral net |
CN107423707A (en) * | 2017-07-25 | 2017-12-01 | 深圳帕罗人工智能科技有限公司 | A kind of face Emotion identification method based under complex environment |
CN107784284A (en) * | 2017-10-24 | 2018-03-09 | 哈尔滨工业大学深圳研究生院 | Face identification method and system |
Non-Patent Citations (2)
Title |
---|
Stefan Duffner and Jean-Marc Odobez: "Track Creation and Deletion Framework for", IEEE Transactions on Image Processing * |
Wang Xiaoxiao: "Research on Face Image Feature Extraction and Recognition Based on Topological Structure", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188769A (en) * | 2019-05-14 | 2019-08-30 | 广州虎牙信息科技有限公司 | Checking method, device, equipment and the storage medium of key point mark |
CN110188769B (en) * | 2019-05-14 | 2023-09-05 | 广州虎牙信息科技有限公司 | Method, device, equipment and storage medium for auditing key point labels |
CN110569724A (en) * | 2019-08-05 | 2019-12-13 | 湖北工业大学 | Face alignment method based on residual hourglass network |
CN110569724B (en) * | 2019-08-05 | 2021-06-04 | 湖北工业大学 | Face alignment method based on residual hourglass network |
CN111625098A (en) * | 2020-06-01 | 2020-09-04 | 广州市大湾区虚拟现实研究院 | Intelligent virtual avatar interaction method and device based on multi-channel information fusion |
CN112926475A (en) * | 2021-03-08 | 2021-06-08 | 电子科技大学 | Human body three-dimensional key point extraction method |
CN112926475B (en) * | 2021-03-08 | 2022-10-21 | 电子科技大学 | Human body three-dimensional key point extraction method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299669B (en) | Video face key point detection method and device based on double intelligent agents | |
Dockstader et al. | Multiple camera fusion for multi-object tracking | |
CN111445476B (en) | Monocular depth estimation method based on multi-mode unsupervised image content decoupling | |
Pillai et al. | Towards visual ego-motion learning in robots | |
Chowdhury et al. | 3D face reconstruction from video using a generic model | |
CN103003846B (en) | Articulation region display device, joint area detecting device, joint area degree of membership calculation element, pass nodular region affiliation degree calculation element and joint area display packing | |
US20210233244A1 (en) | System and method for image segmentation using a joint deep learning model | |
WO2022206020A1 (en) | Method and apparatus for estimating depth of field of image, and terminal device and storage medium | |
CN110770758A (en) | Determining the position of a mobile device | |
CN111902826A (en) | Positioning, mapping and network training | |
CN113158861B (en) | Motion analysis method based on prototype comparison learning | |
CN114663496A (en) | Monocular vision odometer method based on Kalman pose estimation network | |
CN116381753B (en) | Neural network assisted navigation method of GNSS/INS integrated navigation system during GNSS interruption | |
Goldenstein et al. | Statistical cue integration in dag deformable models | |
Setiyadi et al. | Human Activity Detection Employing Full-Type 2D Blazepose Estimation with LSTM | |
EP1071021B1 (en) | Method for inferring target paths from related cue paths | |
CN112989952B (en) | Crowd density estimation method and device based on mask guidance | |
Zhang et al. | Human trajectory forecasting using a flow-based generative model | |
Porta et al. | Appearance-based concurrent map building and localization | |
CN110647917B (en) | Model multiplexing method and system | |
CN112348854A (en) | Visual inertial mileage detection method based on deep learning | |
CN117058235A (en) | Visual positioning method crossing various indoor scenes | |
Liu et al. | Joint estimation of pose, depth, and optical flow with a competition–cooperation transformer network | |
Xing et al. | Simultaneous localization and mapping algorithm based on the asynchronous fusion of laser and vision sensors | |
Xu et al. | Real-time robust and precise kernel learning for indoor localization under the internet of things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |