CN109299669A - Video face key point detection method and device based on dual agents - Google Patents

Video face key point detection method and device based on dual agents

Info

Publication number
CN109299669A
Authority
CN
China
Prior art keywords
detection
agent
key point
tracking
point detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811007365.1A
Other languages
Chinese (zh)
Other versions
CN109299669B (en)
Inventor
鲁继文
周杰
郭明皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201811007365.1A priority Critical patent/CN109299669B/en
Publication of CN109299669A publication Critical patent/CN109299669A/en
Application granted granted Critical
Publication of CN109299669B publication Critical patent/CN109299669B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video face key point detection method and device based on dual agents. The method includes: establishing a tracking agent and a key point detection agent, respectively, and connecting them through a communication channel; outputting the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and obtaining the conditional probability distributions from the communication information between them; establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent simultaneously update the positions of the detection box and the key points through variable-length action sequences while interactively transmitting information, to obtain a detection result; and optimizing the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result. The method outputs the face box detection result and the key point detection result in an interactive manner, and has the advantages of improving the performance of the detection system and optimizing the detection result.

Description

Video face key point detection method and device based on dual agents
Technical Field
The invention relates to the technical field of computer vision, in particular to a video face key point detection method and device based on dual agents.
Background
In the prior art, video face key point detection has attracted wide attention in the field of computer vision, building on the rapid development of face key point detection in still images. In practical applications, video scenes better match real requirements: a video not only provides more face frames than a single image but also carries time-dimension information, which helps key point localization as well as subsequent face recognition and liveness detection. The purpose of video face key point detection is as follows: given a face video, detect a series of key points, such as facial parts and face contours, in every frame of the video. In practice, because the captured face videos come from unconstrained environments, besides the large poses, expression changes and severe occlusions already present in still images, illumination changes and motion blur in videos make face key point detection even more difficult.
Over the past decades, many research methods for video face key point detection have been proposed. Since it is very difficult to detect key points directly from a whole video frame without any prior, most methods adopt a frame-by-frame detection strategy that processes the video face key point problem in a serial manner. Specifically, these methods first generate a high-confidence face detection box for each frame, and then perform key point detection on the face region framed by that box. Although this strategy reduces the difficulty of key point detection by introducing the detection box as a prior, the accuracy of the resulting key points depends heavily on the generated detection box. Fig. 1 shows the effect of the face detection box on key point detection: even a slight deviation of the detection box greatly affects the accuracy of the detected key points. This phenomenon arises because the generation of the detection box does not take any pose or expression information of the face into account; especially when the face is in an extreme condition, the face region covered by the detection box often fails to contain all the face key points, which ultimately limits the detection of the face key points. Therefore, video face key point detection needs to fully exploit the interactive information between the face detection box and the key points to guarantee precision. Because face key points can effectively represent the motion of the face across poses, they can provide additional useful information for generating an accurate detection box. However, most existing video face key point detection methods ignore the mutual information between the two, which results in low precision under extreme conditions.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one objective of the present invention is to provide a method for detecting video face key points based on dual agents, which has the advantages of improving the performance of a detection system and optimizing the detection result.
The invention also aims to provide a video face key point detection device based on dual agents.
In order to achieve the above object, an embodiment of the present invention provides a video face key point detection method based on dual agents, including the following steps: respectively establishing a tracking agent and a key point detection agent, and connecting the tracking agent and the key point detection agent through a communication channel; respectively outputting the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and respectively acquiring conditional probability distributions according to communication information between the tracking agent and the key point detection agent; establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent simultaneously update the positions of a detection box and key points through variable-length action sequences and interactively transmit information to obtain a detection result; and optimizing the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
According to the dual-agent video face key point detection method of the embodiment of the present invention, a tracking agent and a key point detection agent are respectively established, video face key point detection is analyzed in a probabilistic manner according to a Bayesian model, the positions of the detection box and the key points are simultaneously updated through a Markov decision model, and the detection results of the face box and the key points are output in an interactive manner.
In addition, the dual-agent-based video face key point detection method according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the respectively establishing a tracking agent and a key point detection agent connected through a communication channel further includes: the tracking agent is established based on a VGG-M model followed by a single-layer Q network, the key point detection agent is established by combining a cascaded hourglass network with a confidence network, and the two agents are connected through two communication channels encoded by a deconvolution layer and long short-term memory units, respectively.
Further, in an embodiment of the present invention, the establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, with the tracking agent and the key point detection agent performing variable-length action sequences, updating the positions of the detection box and the key points, and interactively transmitting information to obtain a detection result, further includes: the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise moving left, right, up and down, zooming in and zooming out; the key point detection agent decides whether the iteration stops by generating a stop or continue action.
Further, in an embodiment of the present invention, the normalized detected key point coordinates are used as a representation of the three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
Further, in an embodiment of the present invention, the supervised learning training function and the reinforcement learning training function further include:
the supervised learning training function is: (equation not reproduced in the source text)
the reinforcement learning training function is: (equation not reproduced in the source text)
In order to achieve the above object, an embodiment of the present invention provides a video face key point detection device based on dual agents, including: an establishing module, configured to respectively establish a tracking agent and a key point detection agent connected through a communication channel; a probability distribution acquisition module, configured to respectively output the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and respectively acquire conditional probability distributions according to communication information between the tracking agent and the key point detection agent; a detection interaction module, configured to establish a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent act through variable-length action sequences, simultaneously updating the positions of the detection box and the key points and interactively transmitting information to obtain a detection result; and an optimization module, configured to optimize the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
According to the dual-agent video face key point detection device of the embodiment of the present invention, a tracking agent and a key point detection agent are respectively established, video face key point detection is analyzed in a probabilistic manner according to a Bayesian model, the positions of the detection box and the key points are simultaneously updated through a Markov decision model, and the detection results of the face box and the key points are output in an interactive manner.
In addition, the dual-agent-based video face key point detection device according to the above embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, the establishing module further includes: the tracking intelligent agent is established based on a VGG-M model, a single-layer Q network is accessed, the key point detection intelligent agent is established through the combination of a cascade hourglass network and a confidence coefficient network, and the tracking intelligent agent is connected through two communication information channels based on a deconvolution layer and long-short term memory unit codes.
Further, in an embodiment of the present invention, the detection interaction module further includes: the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise moving left, right, up and down, zooming in and zooming out; the key point detection agent decides whether the iteration stops by generating a stop or continue action.
Further, in an embodiment of the present invention, the normalized detected key point coordinates are used as a representation of the three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
Further, in an embodiment of the present invention, the supervised learning training function and the reinforcement learning training function further include:
the supervised learning training function is: (equation not reproduced in the source text)
the reinforcement learning training function is: (equation not reproduced in the source text)
additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram illustrating the effect of a face detection box on keypoint detection;
FIG. 2 is a flow chart of a method for detecting key points of a video face based on dual agents according to an embodiment of the invention;
FIG. 3 is a schematic diagram illustrating the interactive output of face box detection and key point detection in the dual-agent-based video face key point detection method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of partial results of the dual-agent-based video face key point detection method on the challenge subset of the public face database 300VW according to an embodiment of the present invention; and
FIG. 5 is a schematic structural diagram of a video face key point detection apparatus based on dual agents according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a method and an apparatus for detecting video face key points based on dual agents according to an embodiment of the present invention with reference to the accompanying drawings, and first, a method for detecting video face key points based on dual agents according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 2 is a flowchart of a method for detecting key points of a video face based on dual agents according to an embodiment of the present invention.
As shown in fig. 2, the method for detecting video face key points based on dual agents includes the following steps:
in step S101, a tracking agent and a key point detection agent are respectively established and connected via a communication channel.
Specifically, a tracking agent is established based on a VGG-M model followed by a single-layer Q network; a key point detection agent is established by combining a cascaded hourglass network with a confidence network; and the two agents are connected through two communication channels encoded by a deconvolution layer and long short-term memory units.
In one embodiment of the invention, the network structure comprises three parts: the tracking agent structure, the key point detection agent structure, and the communication channels. The tracking agent structure is based on a VGG-M model followed by a single-layer Q network, and the key point detection agent structure is designed as the combination of a cascaded hourglass network and a confidence network. The two communication channels are encoded by a deconvolution layer and a long short-term memory (LSTM) unit, respectively.
The communication information between the two agents explicitly encodes their synergy. The information passed from the tracking agent to the key point detection agent is intended to provide additional texture prior information to improve the robustness of key point detection. We choose as this information the feature map of the third convolutional layer conv3 of the tracking agent, and concatenate it, in the depth dimension, with the first-stage network of the hourglass network used for key point detection. Because the feature map selected from the tracking agent and the feature map of the first-stage network of the key point detection agent differ in size, their scales must be unified before they can be fed in parallel into the subsequent network.
This embodiment adopts a deconvolution operation to enlarge the scale of the feature map to meet this requirement; moreover, since the deconvolution layer contains learnable parameters, the transmitted information can be encoded more appropriately through training.
The information transferred from the key point detection agent to the tracking agent provides additional three-dimensional pose information for detection box tracking, aiming to supply prior knowledge of the face pose for accurate box tracking. To achieve this, the present embodiment uses the normalized coordinates of the detected key points as a representation of the three-dimensional pose information, and uses a long short-term memory (LSTM) unit to memorize pose changes along the time dimension. For stable training, the intermediate state of the LSTM is updated when a Markov decision process terminates.
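To make the structure described above concrete, the following is a minimal PyTorch-style sketch of the two agents and the two communication channels. All concrete sizes (the 64x64 input patch, channel counts, the 512-unit LSTM, the 68 key points, and the 6-action Q head) are illustrative assumptions of this sketch, not values taken from the patent:

    import torch
    import torch.nn as nn

    class TrackingAgent(nn.Module):
        """VGG-M-style backbone followed by a single-layer Q network."""
        def __init__(self, num_actions=6, hidden=512):
            super().__init__()
            self.conv1 = nn.Sequential(nn.Conv2d(3, 96, 7, stride=2), nn.ReLU(), nn.MaxPool2d(3, 2))
            self.conv2 = nn.Sequential(nn.Conv2d(96, 256, 5, stride=2), nn.ReLU(), nn.MaxPool2d(3, 2))
            self.conv3 = nn.Sequential(nn.Conv2d(256, hidden, 3, padding=1), nn.ReLU())
            self.q_head = nn.Linear(hidden, num_actions)          # single-layer Q network

        def forward(self, patch, pose_code):
            f3 = self.conv3(self.conv2(self.conv1(patch)))        # conv3 map is the message source
            q = self.q_head(f3.mean(dim=(2, 3)) + pose_code)      # pose prior biases the Q values
            return q, f3

    class DetectionAgent(nn.Module):
        """Stand-in for the cascaded hourglass plus confidence network."""
        def __init__(self, num_points=68):
            super().__init__()
            self.stage0 = nn.Conv2d(3, 256, 8, stride=8)          # first-stage features (8x8)
            self.stage1 = nn.Conv2d(512, 256, 3, padding=1)       # consumes the fused features
            self.heatmaps = nn.Conv2d(256, num_points, 1)
            self.confidence = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                            nn.Linear(num_points, 1), nn.Sigmoid())

        def forward(self, patch, message):
            fused = torch.cat([self.stage0(patch), message], dim=1)  # concatenate in depth
            maps = self.heatmaps(torch.relu(self.stage1(fused)))
            conf = self.confidence(maps)                          # drives the stop/continue action
            return maps, conf

    class DualAgentNet(nn.Module):
        """Ties the two agents together through the two communication channels."""
        def __init__(self, num_points=68):
            super().__init__()
            self.tracker = TrackingAgent()
            self.detector = DetectionAgent(num_points)
            # Channel 1 (tracker -> detector): a learnable deconvolution rescales the
            # tracker's 2x2 conv3 map to the detector's 8x8 first-stage resolution.
            self.deconv = nn.ConvTranspose2d(512, 256, 4, stride=4)
            # Channel 2 (detector -> tracker): an LSTM memorizes normalized key point
            # coordinates over time as a three-dimensional pose prior for box tracking.
            self.pose_lstm = nn.LSTM(2 * num_points, 512, batch_first=True)

        def forward(self, patch, coords_seq):
            # patch: (B, 3, 64, 64) face region cropped by the current detection box
            # coords_seq: (B, T, 2*num_points) normalized key points from earlier steps
            _, (h, _) = self.pose_lstm(coords_seq)
            q_values, f3 = self.tracker(patch, h[-1])
            maps, conf = self.detector(patch, self.deconv(f3))
            return q_values, maps, conf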
In step S102, the marginal distribution probabilities of the tracking agent and the key point detection agent are respectively output according to the Bayesian model, and the conditional probability distributions are respectively obtained according to the communication information between the tracking agent and the key point detection agent.
In one embodiment of the invention, the video face keypoint detection problem is analyzed in a probabilistic manner according to a Bayesian model. The final output of the video face key points can be regarded as a joint probability distribution, as follows:
p(B_k, V_k | I_k, B_{k-1}, V_{k-1}) = p(B_k | I_k, B_{k-1}, V_{k-1}) p(V_k | B_k, I_k, B_{k-1}, V_{k-1}),
where B_k denotes the face detection box of the k-th frame, V_k the key point positions of the k-th frame, and I_k the image of the k-th frame.
According to Bayes' theorem, the joint probability distribution can be factorized in two equivalent ways, namely:
p(B_k | I_k) p(V_k | B_k, I_k) = p(V_k | I_k) p(B_k | V_k, I_k),
where the conditioning on B_{k-1} and V_{k-1} is left implicit for brevity.
In this embodiment, two agents, namely the tracking agent and the key point detection agent, are defined to output the marginal probability distributions p(B_k | I_k) and p(V_k | I_k) in the above formula, respectively, while the other two conditional probability distributions are represented by the communication information between the two agents, so that the above equality constraint is enforced in an explicit manner through the interaction between the two agents.
In step S103, a Markov decision model is established according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent act through variable-length action sequences, simultaneously updating the positions of the detection box and the key points while interactively transmitting information, to obtain a detection result.
In particular, the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise moving left, right, up and down, zooming in and zooming out; the key point detection agent decides whether the iteration stops by generating a stop or continue action.
In one embodiment of the invention, the video face keypoint detection problem is modeled as a Markov decision process, and the following explains the key definitions in the Markov decision process:
the state is as follows: the face image blocks framed by the cutting detection frame are obtained by:
st=φ(B,I),
wherein,
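A minimal sketch of the state function φ, assuming the image is a NumPy array and the box is given in pixel coordinates (x1, y1, x2, y2); the fixed 64x64 output size is an assumption of this sketch:

    import cv2

    def phi(box, image, out_size=64):
        """State s_t = phi(B, I): crop the face patch framed by box B from image I."""
        x1, y1, x2, y2 = [int(round(v)) for v in box]
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, image.shape[1]), min(y2, image.shape[0])
        patch = image[y1:y2, x1:x2]
        return cv2.resize(patch, (out_size, out_size))  # fixed-size network input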
the actions are as follows: for a tracking agent, the tracking agent generates movement actions to change the currently observed region, specifically, movement actions are defined as left, right, up, down, zoom in, zoom out.
For a keypoint detection agent, the keypoint detection generates a stop/continue action to decide whether the iteration should stop.
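How such movement actions update the detection box can be sketched as follows; the per-step ratio ALPHA is an illustrative assumption, since the text does not state the step size:

    ALPHA = 0.1  # assumed fraction of the box width/height moved per action

    def apply_action(box, action):
        """Apply one tracking-agent action to a box (x1, y1, x2, y2)."""
        x1, y1, x2, y2 = box
        dx, dy = ALPHA * (x2 - x1), ALPHA * (y2 - y1)
        moves = {
            "left":     (x1 - dx, y1, x2 - dx, y2),
            "right":    (x1 + dx, y1, x2 + dx, y2),
            "up":       (x1, y1 - dy, x2, y2 - dy),
            "down":     (x1, y1 + dy, x2, y2 + dy),
            "zoom_in":  (x1 + dx, y1 + dy, x2 - dx, y2 - dy),  # shrink around center
            "zoom_out": (x1 - dx, y1 - dy, x2 + dx, y2 + dy),  # enlarge around center
        }
        return moves[action]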
Reward: for the tracking agent, the reward is: (equation not reproduced in the source text)
For the key point detection agent, the reward is: (equation not reproduced in the source text)
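Because the reward formulas are rendered as images in the source and are not reproduced above, the following sketch is purely illustrative: it assumes the tracking reward measures the improvement in overlap (IoU) between the detection box and the ground truth, and the detection reward measures the decrease in normalized mean key point error, which is a common design in reinforcement-learning-based trackers but is not confirmed by this text:

    import numpy as np

    def iou(a, b):
        """Intersection over union of two boxes (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-8)

    def tracking_reward(prev_box, new_box, gt_box):
        # Positive when the action moved the box closer to the ground truth.
        return iou(new_box, gt_box) - iou(prev_box, gt_box)

    def detection_reward(prev_pts, new_pts, gt_pts, interocular):
        # Positive when the normalized mean point-to-point error decreased.
        nme = lambda p: np.linalg.norm(p - gt_pts, axis=1).mean() / interocular
        return nme(prev_pts) - nme(new_pts)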
In step S104, the detection result is optimized by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
In one embodiment of the invention, a two-stage training method is adopted during training: supervised training and reinforcement learning training.
Wherein, the supervised learning training objective function is as follows:
wherein,
the reinforcement learning training objective function is:
wherein,
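Since the two objective functions are likewise not reproduced in this text, the following sketches a generic two-stage recipe consistent with the description: a supervised stage that regresses key point heatmaps (an L2 loss is assumed here), followed by a temporal-difference update for the tracking agent's Q network. Every concrete choice (loss functions, discount factor, replay buffer) is an assumption of the sketch rather than the patent's stated method:

    import torch
    import torch.nn.functional as F

    def supervised_step(model, optimizer, patch, coords_seq, target_maps):
        """Stage 1: supervised training of the key point heatmaps (assumed L2 loss)."""
        _, maps, _ = model(patch, coords_seq)   # model as in the DualAgentNet sketch
        loss = F.mse_loss(maps, target_maps)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def td_step(q_net, target_net, optimizer, batch, gamma=0.99):
        """Stage 2: DQN-style temporal-difference update for the Q network.
        q_net maps a batch of states to per-action values (B, num_actions)."""
        s, a, r, s_next, done = batch           # tensors sampled from a replay buffer
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():                   # bootstrap from a frozen target network
            target = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
        loss = F.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()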
As shown in FIG. 4, the partial results on the challenge subset of the public face database 300VW show that the method achieves a good effect.
According to the dual-agent video face key point detection method of the embodiment of the present invention, a tracking agent and a key point detection agent are respectively established, video face key point detection is analyzed in a probabilistic manner according to a Bayesian model, the positions of the detection box and the key points are simultaneously updated through a Markov decision model, and the detection results of the face box and the key points are output in an interactive manner.
Fig. 5 is a schematic structural diagram of a dual-agent-based video face keypoint detection apparatus according to an embodiment of the present invention.
As shown in fig. 5, the dual agent-based video face keypoint detection apparatus 10 includes: the system comprises a building module 100, a probability distribution obtaining module 200, a detection interaction module 300 and an optimization module 400.
The establishing module 100 is configured to respectively establish a tracking agent and a key point detection agent connected through a communication channel. The probability distribution acquisition module 200 is configured to respectively output the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and to respectively obtain the conditional probability distributions according to the communication information between the tracking agent and the key point detection agent. The detection interaction module 300 is configured to establish a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent act through variable-length action sequences, simultaneously updating the positions of the detection box and the key points and interactively transmitting information to obtain a detection result. The optimization module 400 is configured to optimize the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result. The dual-agent-based video face key point detection device has the advantages of improving the performance of the detection system and optimizing the detection result.
Further, in an embodiment of the present invention, the establishing module 100 further includes: the tracking agent is established based on a VGG-M model followed by a single-layer Q network, the key point detection agent is established by combining a cascaded hourglass network with a confidence network, and the two agents are connected through two communication channels encoded by a deconvolution layer and long short-term memory units, respectively.
Further, in an embodiment of the present invention, the detection interaction module 300 further includes: the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise moving left, right, up and down, zooming in and zooming out; the key point detection agent decides whether the iteration stops by generating a stop or continue action.
Further, in an embodiment of the present invention, the normalized detected key point coordinates are used as a representation of the three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
Further, in an embodiment of the present invention, the supervised learning training function and the reinforcement learning training function further comprise:
the supervised learning training function is: (equation not reproduced in the source text)
the reinforcement learning training function is: (equation not reproduced in the source text)
It should be noted that the foregoing explanation of the embodiment of the dual-agent-based video face key point detection method is also applicable to the dual-agent-based video face key point detection apparatus of this embodiment, and details are not repeated here.
According to the dual-agent video face key point detection device of the embodiment of the present invention, a tracking agent and a key point detection agent are respectively established, video face key point detection is analyzed in a probabilistic manner according to a Bayesian model, the positions of the detection box and the key points are simultaneously updated through a Markov decision model, and the detection results of the face box and the key points are output in an interactive manner.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A dual-agent-based video face key point detection method, characterized by comprising the following steps:
respectively establishing a tracking agent and a key point detection agent, and connecting the tracking agent and the key point detection agent through a communication channel;
respectively outputting marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and respectively acquiring conditional probability distributions according to communication information between the tracking agent and the key point detection agent;
establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent simultaneously update positions of a detection box and key points through variable-length action sequences and interactively transmit information to obtain a detection result; and
optimizing the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
2. The dual-agent-based video face key point detection method according to claim 1, wherein the respectively establishing a tracking agent and a key point detection agent connected through a communication channel further comprises:
the tracking agent is established based on a VGG-M model followed by a single-layer Q network, the key point detection agent is established by combining a cascaded hourglass network with a confidence network, and the two agents are connected through two communication channels encoded by a deconvolution layer and long short-term memory units.
3. The dual-agent-based video face key point detection method according to claim 1, wherein the establishing a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, with the tracking agent and the key point detection agent simultaneously updating the positions of a detection box and key points through variable-length action sequences and interactively transmitting information to obtain a detection result, further comprises:
the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise moving left, right, up and down, zooming in and zooming out;
the key point detection agent decides whether the iteration stops by generating a stop or continue action.
4. The dual-agent-based video face key point detection method according to claim 1, wherein the normalized detected key point coordinates are used as a representation of three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
5. The dual agent-based video face keypoint detection method of claim 1, wherein the supervised learning training function and the reinforcement learning training function further comprise:
the supervised learning training function is: (equation not reproduced in the source text)
the reinforcement learning training function is: (equation not reproduced in the source text)
6. A dual-agent-based video face key point detection device, characterized by comprising:
an establishing module, configured to respectively establish a tracking agent and a key point detection agent connected through a communication channel;
a probability distribution acquisition module, configured to respectively output the marginal distribution probabilities of the tracking agent and the key point detection agent according to a Bayesian model, and respectively acquire conditional probability distributions according to communication information between the tracking agent and the key point detection agent;
a detection interaction module, configured to establish a Markov decision model according to the marginal distribution probabilities and the conditional probability distributions, wherein the tracking agent and the key point detection agent act through variable-length action sequences, simultaneously updating the positions of a detection box and key points and interactively transmitting information to obtain a detection result; and
an optimization module, configured to optimize the detection result by establishing a supervised learning training function and a reinforcement learning training function to obtain a final result.
7. The dual agent-based video face keypoint detection apparatus of claim 6, wherein the establishment module further comprises:
the tracking agent is established based on a VGG-M model followed by a single-layer Q network, the key point detection agent is established by combining a cascaded hourglass network with a confidence network, and the two agents are connected through two communication channels encoded by a deconvolution layer and long short-term memory units.
8. The dual-agent based video face keypoint detection apparatus according to claim 6, wherein said detection interaction module further comprises:
the tracking agent changes the currently observed region through movement actions, wherein the movement actions comprise moving left, right, up and down, zooming in and zooming out;
the key point detection agent decides whether the iteration stops by generating a stop or continue action.
9. The dual-agent-based video face key point detection apparatus according to claim 6, wherein the normalized detected key point coordinates are used as a representation of three-dimensional pose information, and a long short-term memory (LSTM) unit is used to memorize pose changes along the time dimension.
10. The dual agent-based video face keypoint detection device of claim 6, wherein the supervised learning training function and the reinforcement learning training function further comprise:
the supervised learning training function is: (equation not reproduced in the source text)
the reinforcement learning training function is: (equation not reproduced in the source text)
CN201811007365.1A 2018-08-30 2018-08-30 Video face key point detection method and device based on dual agents Active CN109299669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811007365.1A CN109299669B (en) 2018-08-30 2018-08-30 Video face key point detection method and device based on dual agents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811007365.1A CN109299669B (en) 2018-08-30 2018-08-30 Video face key point detection method and device based on dual agents

Publications (2)

Publication Number Publication Date
CN109299669A true CN109299669A (en) 2019-02-01
CN109299669B CN109299669B (en) 2020-11-13

Family

ID=65166024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811007365.1A Active CN109299669B (en) 2018-08-30 2018-08-30 Video face key point detection method and device based on dual agents

Country Status (1)

Country Link
CN (1) CN109299669B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188769A (en) * 2019-05-14 2019-08-30 广州虎牙信息科技有限公司 Checking method, device, equipment and the storage medium of key point mark
CN110569724A (en) * 2019-08-05 2019-12-13 湖北工业大学 Face alignment method based on residual hourglass network
CN111625098A (en) * 2020-06-01 2020-09-04 广州市大湾区虚拟现实研究院 Intelligent virtual avatar interaction method and device based on multi-channel information fusion
CN112926475A (en) * 2021-03-08 2021-06-08 电子科技大学 Human body three-dimensional key point extraction method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763636A (en) * 2009-09-23 2010-06-30 中国科学院自动化研究所 Method for tracing position and pose of 3D human face in video sequence
US20130150160A1 (en) * 2007-02-08 2013-06-13 Edge 3 Technologies, Inc. Method and Apparatus for Tracking of a Plurality of Subjects in a Video Game
CN106407958A (en) * 2016-10-28 2017-02-15 南京理工大学 Double-layer-cascade-based facial feature detection method
CN107423707A (en) * 2017-07-25 2017-12-01 深圳帕罗人工智能科技有限公司 A kind of face Emotion identification method based under complex environment
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A kind of multi-pose eye locating method based on concatenated convolutional neutral net
CN107784284A (en) * 2017-10-24 2018-03-09 哈尔滨工业大学深圳研究生院 Face identification method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130150160A1 (en) * 2007-02-08 2013-06-13 Edge 3 Technologies, Inc. Method and Apparatus for Tracking of a Plurality of Subjects in a Video Game
CN101763636A (en) * 2009-09-23 2010-06-30 中国科学院自动化研究所 Method for tracing position and pose of 3D human face in video sequence
CN106407958A (en) * 2016-10-28 2017-02-15 南京理工大学 Double-layer-cascade-based facial feature detection method
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A kind of multi-pose eye locating method based on concatenated convolutional neutral net
CN107423707A (en) * 2017-07-25 2017-12-01 深圳帕罗人工智能科技有限公司 A kind of face Emotion identification method based under complex environment
CN107784284A (en) * 2017-10-24 2018-03-09 哈尔滨工业大学深圳研究生院 Face identification method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STEFAN DUFFNER AND JEAN-MARC ODOBEZ: "Track Creation and Deletion Framework for", IEEE Transactions on Image Processing *
王晓晓: "Research on Face Image Feature Extraction and Recognition Based on Topological Structure", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188769A (en) * 2019-05-14 2019-08-30 广州虎牙信息科技有限公司 Checking method, device, equipment and the storage medium of key point mark
CN110188769B (en) * 2019-05-14 2023-09-05 广州虎牙信息科技有限公司 Method, device, equipment and storage medium for auditing key point labels
CN110569724A (en) * 2019-08-05 2019-12-13 湖北工业大学 Face alignment method based on residual hourglass network
CN110569724B (en) * 2019-08-05 2021-06-04 湖北工业大学 Face alignment method based on residual hourglass network
CN111625098A (en) * 2020-06-01 2020-09-04 广州市大湾区虚拟现实研究院 Intelligent virtual avatar interaction method and device based on multi-channel information fusion
CN112926475A (en) * 2021-03-08 2021-06-08 电子科技大学 Human body three-dimensional key point extraction method
CN112926475B (en) * 2021-03-08 2022-10-21 电子科技大学 Human body three-dimensional key point extraction method

Also Published As

Publication number Publication date
CN109299669B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN109299669B (en) Video face key point detection method and device based on dual agents
Dockstader et al. Multiple camera fusion for multi-object tracking
CN111445476B (en) Monocular depth estimation method based on multi-mode unsupervised image content decoupling
Pillai et al. Towards visual ego-motion learning in robots
Chowdhury et al. 3D face reconstruction from video using a generic model
CN103003846B (en) Articulation region display device, joint area detecting device, joint area degree of membership calculation element, pass nodular region affiliation degree calculation element and joint area display packing
US20210233244A1 (en) System and method for image segmentation using a joint deep learning model
WO2022206020A1 (en) Method and apparatus for estimating depth of field of image, and terminal device and storage medium
CN110770758A (en) Determining the position of a mobile device
CN111902826A (en) Positioning, mapping and network training
CN113158861B (en) Motion analysis method based on prototype comparison learning
CN114663496A (en) Monocular vision odometer method based on Kalman pose estimation network
CN116381753B (en) Neural network assisted navigation method of GNSS/INS integrated navigation system during GNSS interruption
Goldenstein et al. Statistical cue integration in dag deformable models
Setiyadi et al. Human Activity Detection Employing Full-Type 2D Blazepose Estimation with LSTM
EP1071021B1 (en) Method for inferring target paths from related cue paths
CN112989952B (en) Crowd density estimation method and device based on mask guidance
Zhang et al. Human trajectory forecasting using a flow-based generative model
Porta et al. Appearance-based concurrent map building and localization
CN110647917B (en) Model multiplexing method and system
CN112348854A (en) Visual inertial mileage detection method based on deep learning
CN117058235A (en) Visual positioning method crossing various indoor scenes
Liu et al. Joint estimation of pose, depth, and optical flow with a competition–cooperation transformer network
Xing et al. Simultaneous localization and mapping algorithm based on the asynchronous fusion of laser and vision sensors
Xu et al. Real-time robust and precise kernel learning for indoor localization under the internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant