CN110210306B - Face tracking method and camera - Google Patents

Face tracking method and camera

Info

Publication number
CN110210306B
CN110210306B (application CN201910361317.0A)
Authority
CN
China
Prior art keywords
facial feature
face
frame image
feature point
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910361317.0A
Other languages
Chinese (zh)
Other versions
CN110210306A (en)
Inventor
刘子伟
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd filed Critical Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN201910361317.0A priority Critical patent/CN110210306B/en
Publication of CN110210306A publication Critical patent/CN110210306A/en
Application granted granted Critical
Publication of CN110210306B publication Critical patent/CN110210306B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face tracking method and a camera. The method comprises the following steps: acquiring the initial positions of the facial feature points of a first frame image in an acquired image frame sequence; taking the initial positions of the facial feature points as the input of a convolutional neural network, which outputs a position probability heat map of the facial feature points, and performing iterative regression on the position probability heat map with a point distribution model to obtain the pixel positions of the facial feature points in the first frame image; and acquiring the initial positions of the facial feature points of the second frame image in the image frame sequence from the pixel positions of the facial feature points in the first frame image, thereby realizing face tracking. The method and the device realize inter-frame face alignment tracking, avoid instability and jitter of the facial feature points between frames, and improve tracking accuracy and speed.

Description

Face tracking method and camera
Technical Field
The invention relates to the technical field of machine learning, in particular to a face tracking method and a camera.
Background
Face alignment detects a face in an image and labels each specific feature point. Face alignment techniques are commonly used in video processing applications such as live streaming and short video.
Real-time tracking of face alignment greatly helps video processing. However, while face alignment itself is actively researched, its real-time tracking has received far less attention, and the point positions determined by most tracking algorithms jitter between frames, so the tracking result shows obvious distortion.
Disclosure of Invention
The present invention provides a face tracking method and camera to at least partially solve the above problems.
In a first aspect, the present invention provides a face tracking method, including: acquiring an initial position of a facial feature point of a first frame image in an acquired image frame sequence; taking the initial positions of the facial feature points as the input of a convolutional neural network, outputting a position probability heat map of the facial feature points by the convolutional neural network, and performing iterative regression processing on the position probability heat map of the facial feature points by adopting a point distribution model to obtain the pixel positions of the facial feature points in the first frame image; wherein the location probability heatmap represents a probability that the facial feature point is at a pixel location in the first frame image; and acquiring the initial position of the facial feature point of the second frame image in the image frame sequence by utilizing the pixel position of the facial feature point in the first frame image, thereby realizing face tracking.
In some embodiments, obtaining the initial position of the facial feature point of the second frame image in the sequence of image frames using the pixel position of the facial feature point in the first frame image comprises: performing face detection on the second frame image by using a multi-task cascaded convolutional network or the machine learning tool Dlib; and when a face is detected in the second frame image, obtaining the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and a preset relaxation amount, wherein the preset relaxation amount represents the position change of the same feature point between adjacent frame images.
In some embodiments, obtaining the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and the preset relaxation amount comprises: acquiring the initial position of the facial feature point in the second frame image according to

x_i^(2) = x_i^(1) + α · dx_mon

where x_i^(2) is the initial position of facial feature point i in the second frame image, x_i^(1) is the position information of facial feature point i in the first frame image, i is a natural number indexing the feature points, α is a preset adjustment factor with 0 < α < 1, and dx_mon is the preset relaxation amount.
In some embodiments, the performing face detection on the second frame image by using a multi-task cascaded convolutional network or the machine learning tool Dlib further includes: when no face is detected in the second frame image, extracting facial feature points from a pre-constructed average face image; and determining the point locations of the facial feature points on the average face as the initial positions of the facial feature points in the second frame image.
In some embodiments, obtaining the initial position of the facial feature point of the first frame image in the sequence of image frames comprises: extracting facial feature points from a pre-constructed average face image; and determining the point locations of the facial feature points on the average face as the initial positions of the facial feature points in the first frame image.
In some embodiments, extracting facial feature points from the pre-constructed average face image comprises: acquiring facial feature points of each face training sample in a face training sample set, wherein the facial feature points of each face training sample in the face training sample set are calibrated; constructing a mapping matrix from the face training samples to an average face model according to the face feature points of each face training sample; and respectively superposing the facial feature points of all the face training samples to the average face model according to the mapping matrix to obtain the average face image, and determining the superposed facial feature points as the facial feature points of the average face image.
In a second aspect, the present invention provides a camera comprising: a camera and a processor; the camera collects an image frame sequence of the face of the user and sends the image frame sequence to the processor; the processor is used for acquiring the initial position of the facial feature point of the first frame image in the image frame sequence; taking the initial positions of the facial feature points as the input of a convolutional neural network, outputting a position probability heat map of the facial feature points by the convolutional neural network, and performing iterative regression processing on the position probability heat map of the facial feature points by adopting a point distribution model to obtain position information of the facial feature points in the first frame image; wherein the location probability heat map represents a probability that the facial feature point is at each pixel location in the first frame image; and acquiring the initial position of the facial feature point of the second frame image in the image frame sequence by utilizing the position information of the facial feature point in the first frame image, thereby realizing the face tracking.
In some embodiments, the processor further performs face detection on the second frame image using a multitask cascaded convolutional network or using a machine learning tool Dlib; when a face is detected in the second frame image, the initial position of the facial feature point in the second frame image is obtained according to the position information of the facial feature point in the first frame image and a preset relaxation amount, wherein the preset relaxation amount represents the position change of the same feature point in the adjacent frame image.
In some embodiments, the processor extracts facial feature points from a pre-constructed average face image when a face in the second frame image is not detected; determining a point location of a facial feature point on the average face as an initial location of the facial feature point in the second frame image.
In some embodiments, the processor obtains facial feature points of each face training sample in a face training sample set, wherein the facial feature points of each face training sample in the face training sample set are calibrated; constructing a mapping matrix from the face training samples to an average face model according to the face feature points of each face training sample; and respectively superposing the facial feature points of all the face training samples to the average face model according to the mapping matrix to obtain the average face image, and determining the superposed facial feature points as the facial feature points of the average face image.
The method and the device predict the initial positions of the facial feature points of the second frame image from the facial feature points of the first frame image and, once the initial positions in each frame image are obtained, identify the specific pixel positions of the facial feature points by combining a convolutional neural network with a PDM (point distribution model), thereby realizing inter-frame face alignment tracking, avoiding instability and jitter of the facial feature points between frames, and improving tracking accuracy and speed.
Drawings
FIG. 1 is a flow chart of a face tracking method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a camera according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Thus, the techniques of the present invention may be implemented in hardware and/or in software (including firmware, microcode, etc.). Furthermore, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system. In the context of the present invention, a computer-readable storage medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer-readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
Face tracking predicts the positions of the facial feature points in the current frame from the information of the previous frame. This matters greatly for per-frame face alignment: it reduces the search range in the current frame and improves tracking accuracy. This embodiment provides a jitter-free inter-frame tracking method that addresses the distortion of face alignment caused by inter-frame tracking jitter.
Fig. 1 is a flowchart of a face tracking method according to an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment includes:
s110, acquiring the initial position of the facial feature point of the first frame image in the acquired image frame sequence.
The facial feature points include feature points for identifying eyes, nose, mouth, eyebrows, face contours and the like.
S120, taking the initial positions of the facial feature points as the input of a convolutional neural network, outputting a position probability heat map of the facial feature points by the convolutional neural network, and performing iterative regression processing on the position probability heat map of the facial feature points by adopting a point distribution model to obtain the pixel positions of the facial feature points in the first frame image; wherein the location probability heat map represents a probability that the facial feature point is at each pixel location in the first frame image.
In this embodiment, the positions of the facial feature points are first processed by a convolutional neural network; a Point Distribution Model (PDM) then performs iterative regression on the position probabilities output by the network to obtain the specific pixel position information of the facial feature points. Combining the convolutional neural network with the PDM improves the recognition accuracy of the facial feature points.
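The patent does not disclose the network architecture or the PDM parameters. As a rough illustration only, the following Python sketch shows the shape of this stage: per-landmark heat maps are reduced to candidate positions, which are then regressed onto a point distribution model (mean shape plus PCA basis). Here `heatmaps`, `mean_shape`, `basis`, and `stds` are hypothetical names standing in for the trained network's output and offline-learned PDM parameters.

```python
import numpy as np

def soft_argmax(heatmap):
    """Expected (x, y) pixel position under one landmark's probability heat map."""
    h, w = heatmap.shape
    p = heatmap / (heatmap.sum() + 1e-12)          # normalize to a probability map
    ys, xs = np.mgrid[0:h, 0:w]
    return np.array([(xs * p).sum(), (ys * p).sum()])

def pdm_regress(points, mean_shape, basis, stds, n_iters=5, clip=3.0):
    """Iteratively regress raw landmark estimates onto the point distribution
    model x = mean_shape + basis @ b, clipping each shape mode to +/- clip
    standard deviations so the result stays a plausible face shape."""
    x = points.reshape(-1)                          # flatten to a (2N,) shape vector
    for _ in range(n_iters):
        b = basis.T @ (x - mean_shape)              # project onto the shape modes
        b = np.clip(b, -clip * stds, clip * stds)   # constrain implausible modes
        x = mean_shape + basis @ b                  # reconstruct the constrained shape
    return x.reshape(-1, 2)

heatmaps = np.random.rand(68, 64, 64)               # stand-in for the CNN's output
pts = np.stack([soft_argmax(h) for h in heatmaps])
# pixel_positions = pdm_regress(pts, mean_shape, basis, stds)  # PDM parameters from training
```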
S130, acquiring the initial position of the facial feature point of the second frame image in the image frame sequence by using the pixel position of the facial feature point in the first frame image, and realizing face tracking.
The initial positions of the facial feature points in the second frame image are predicted from their pixel positions in the first frame image, and the pixel positions in the second frame image are then obtained by the same combination of convolutional neural network and PDM. The pixel positions of the facial feature points in each subsequent frame of the image frame sequence are predicted in turn, realizing tracking and identification of the human face.
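Putting steps S110 to S130 together, the per-frame control flow might look like the following sketch, where `refine` stands for the CNN-plus-PDM stage, `predict_initial` for the relaxation-based prediction described below, and `average_face_points` for the average-face initialization; all three are hypothetical helper names, not the patent's own.

```python
def track(frames, refine, predict_initial, average_face_points):
    """Sequential face tracking: the first frame starts from the average face,
    every later frame starts from the previous frame's refined result."""
    results = []
    prev = None
    for frame in frames:
        init = average_face_points() if prev is None else predict_initial(prev)
        prev = refine(frame, init)      # CNN heat maps + PDM iterative regression
        results.append(prev)
    return results
```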
In this embodiment, the initial positions of the facial feature points of the second frame image are predicted from the facial feature points of the first frame image, and once the initial positions in each frame image are obtained, the specific pixel positions of the facial feature points are identified by combining a convolutional neural network with a PDM (point distribution model). This realizes inter-frame face alignment tracking, avoids instability and jitter of the facial feature points between frames, and improves tracking accuracy and speed.
The above steps S110 to S130 will be described in detail.
First, step S110 is performed, i.e., an initial position of a facial feature point of a first frame image in the captured image frame sequence is acquired.
In some embodiments, the initial position of the facial feature point in the first frame image is obtained as follows: extracting face characteristic points from a pre-constructed average face image; determining a point location of a facial feature point on the average face as an initial location of the facial feature point in the first frame image.
The average face image is constructed as follows: acquire the facial feature points of each face training sample in a face training sample set, where the facial feature points of every sample are pre-calibrated; construct a mapping matrix from each face training sample to an average face model according to its facial feature points; and superpose the facial feature points of all the training samples onto the average face model according to the mapping matrices to obtain the average face image, taking the superposed feature points as the facial feature points of the average face image.
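The patent calls for a "mapping matrix" from each training sample to the average face model without specifying its form; a common choice is a least-squares similarity transform (scale, rotation, translation). The sketch below works under that assumption, iterating the superposition a few times so the mean shape stabilizes.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmarks onto dst; the reflection case is ignored for faces."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, s, vt = np.linalg.svd(src_c.T @ dst_c)
    r = (u @ vt).T                                   # optimal 2x2 rotation
    scale = s.sum() / (src_c ** 2).sum()
    t = dst.mean(0) - scale * src.mean(0) @ r.T
    return scale, r, t

def build_average_face(samples, n_iters=3):
    """Superpose every sample's calibrated landmarks onto a running mean shape."""
    mean = samples[0].astype(float)
    for _ in range(n_iters):
        aligned = []
        for pts in samples:                          # one mapping matrix per sample
            scale, r, t = similarity_transform(pts, mean)
            aligned.append(scale * pts @ r.T + t)
        mean = np.mean(aligned, axis=0)              # updated average face landmarks
    return mean
```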
After acquiring the initial positions of the facial feature points of the first frame image, continuing to execute step S120, that is, taking the initial positions of the facial feature points as the input of a convolutional neural network, wherein the convolutional neural network outputs a position probability heat map of the facial feature points, and performing iterative regression processing on the position probability heat map of the facial feature points by using a point distribution model to obtain position information of the facial feature points in the first frame image; wherein the location probability heat map represents a probability that the facial feature point is at each pixel location in the first frame image.
This embodiment first uses a convolutional neural network to compute a position probability heat map of the facial feature points; the convolutional neural network improves computation speed. The PDM is a model that obtains the specific position of each point by iterative regression over its position probability heat map. After the convolutional neural network computes the position probability heat map of the facial feature points, the PDM performs iterative regression on it to determine the specific pixel positions of the facial feature points.
After the position information of the facial feature points in the first frame image is obtained, step S130 is continuously executed, that is, the initial positions of the facial feature points in the second frame image in the image frame sequence are obtained by using the position information of the facial feature points in the first frame image, so as to implement face tracking.
In some embodiments, the initial positions of the facial feature points in the second frame image are obtained as follows: perform face detection on the second frame image using a multi-task cascaded convolutional network or the machine learning tool Dlib; when a face is detected in the second frame image, obtain the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and a preset relaxation amount, where the preset relaxation amount represents the position change of the same feature point between adjacent frame images.
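On the Dlib path, the detection step maps directly onto Dlib's stock frontal face detector (an MTCNN detector would be a drop-in alternative). A sketch follows, with the two fallback helpers being hypothetical names for the steps described in this section.

```python
import dlib

detector = dlib.get_frontal_face_detector()        # Dlib's HOG-based face detector

def initial_points(frame_gray, prev_positions, predict_initial, average_face_points):
    """Use the previous frame's landmarks when a face is detected,
    otherwise fall back to the average-face initialization."""
    rects = detector(frame_gray, 1)                 # upsample once for small faces
    if len(rects) > 0:
        return predict_initial(prev_positions)      # relaxation-based prediction
    return average_face_points()                    # average-face fallback
```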
In one example of this embodiment, the initial positions of the facial feature points in the second frame image may be acquired according to

x_i^(2) = x_i^(1) + α · dx_mon

where x_i^(2) is the initial position of facial feature point i in the second frame image, x_i^(1) is the position information of facial feature point i in the first frame image, i is a natural number indexing the feature points, α is a preset adjustment factor with 0 < α < 1, and dx_mon is the preset relaxation amount.
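In code, the formula is a one-line vector update. A sketch, with the α value and the shape of dx_mon as assumed placeholders, since the patent leaves both preset but unspecified:

```python
import numpy as np

ALPHA = 0.5                  # preset adjustment factor, 0 < alpha < 1 (assumed value)
DX_MON = np.zeros((68, 2))   # preset relaxation amount per feature point (assumed shape)

def predict_initial(prev_positions):
    """Initial positions for the next frame: the previous frame's pixel
    positions nudged by the preset relaxation term."""
    return prev_positions + ALPHA * DX_MON
```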
If no face is detected in the second frame image during face detection with the multi-task cascaded convolutional network or the machine learning tool Dlib, facial feature points are extracted from a pre-constructed average face image, and the point locations of the facial feature points on the average face are determined as the initial positions of the facial feature points in the second frame image.
Here, the average face image is constructed in the same way: acquire the facial feature points of each face training sample in a face training sample set, where the facial feature points of every sample are pre-calibrated; construct a mapping matrix from each face training sample to an average face model according to its facial feature points; superpose the facial feature points of all the training samples onto the average face model according to the mapping matrices to obtain the average face image, taking the superposed feature points as the facial feature points of the average face image; and determine the point locations of the facial feature points on the average face as the initial positions of the facial feature points in the second frame image.
The invention also provides a camera.
Fig. 2 is a block diagram of a camera according to an embodiment of the present invention. As shown in Fig. 2, the camera of this embodiment includes: a camera and a processor, wherein:
the camera is used for collecting an image frame sequence of the face of the user and sending the image frame sequence to the processor;
the processor is used for acquiring the initial position of the facial feature point of the first frame image in the image frame sequence; taking the initial positions of the facial feature points as the input of a convolutional neural network, outputting a position probability heat map of the facial feature points by the convolutional neural network, and performing iterative regression processing on the position probability heat map of the facial feature points by adopting a point distribution model to obtain position information of the facial feature points in the first frame image; wherein the location probability heat map represents a probability that the facial feature point is at each pixel location in the first frame image; and acquiring the initial position of the facial feature point of the second frame image in the image frame sequence by utilizing the position information of the facial feature point in the first frame image, thereby realizing the face tracking.
In this embodiment, the initial positions of the facial feature points of the second frame image are predicted from the facial feature points of the first frame image, and once the initial positions in each frame image are obtained, the specific pixel positions of the facial feature points are identified by combining a convolutional neural network with a PDM (point distribution model). This realizes inter-frame face alignment tracking, avoids instability and jitter of the facial feature points between frames, and improves tracking accuracy and speed.
In some embodiments, the processor further performs face detection on the second frame image by using a multi-task cascaded convolutional network or the machine learning tool Dlib; when a face is detected in the second frame image, the initial position of the facial feature point in the second frame image is obtained according to the position information of the facial feature point in the first frame image and a preset relaxation amount, where the preset relaxation amount represents the position change of the same feature point between adjacent frame images.
In one example of the embodiment, the processor acquires the initial positions of the facial feature points in the second frame image according to

x_i^(2) = x_i^(1) + α · dx_mon

where x_i^(2) is the initial position of facial feature point i in the second frame image, x_i^(1) is the position information of facial feature point i in the first frame image, i is a natural number indexing the feature points, α is a preset adjustment factor with 0 < α < 1, and dx_mon is the preset relaxation amount.
In some embodiments, when no face is detected in the second frame image, the processor extracts facial feature points from a pre-constructed average face image and determines the point locations of the facial feature points on the average face as the initial positions of the facial feature points in the second frame image.
In some embodiments, the processor extracts facial feature points from a pre-constructed average face image, and determines point locations of the facial feature points on the average face as initial positions of the facial feature points in the first frame image.
Specifically, the processor acquires the facial feature points of each face training sample in a face training sample set, where the facial feature points of every sample are pre-calibrated; constructs a mapping matrix from each face training sample to an average face model according to its facial feature points; and superposes the facial feature points of all the training samples onto the average face model according to the mapping matrices to obtain the average face image, taking the superposed feature points as the facial feature points of the average face image.
As the camera embodiment substantially corresponds to the method embodiment, reference may be made to the description of the method embodiment for the relevant details. The camera embodiment described above is merely illustrative: units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of this embodiment's solution. Those of ordinary skill in the art can understand and implement the embodiment without inventive effort.
For the convenience of clearly describing the technical solutions of the embodiments of the present invention, in the embodiments of the present invention, the words "first", "second", and the like are used to distinguish the same items or similar items with basically the same functions and actions, and those skilled in the art can understand that the words "first", "second", and the like do not limit the quantity and execution order.
While the foregoing is directed to embodiments of the present invention, other modifications and variations of the present invention may be devised by those skilled in the art in light of the above teachings. It should be understood by those skilled in the art that the foregoing detailed description is for the purpose of better explaining the present invention, and the scope of the present invention should be determined by the scope of the appended claims.

Claims (10)

1. A face tracking method, comprising:
acquiring an initial position of a facial feature point of a first frame image in an acquired image frame sequence;
taking the initial positions of the facial feature points as the input of a convolutional neural network, outputting a position probability heat map of the facial feature points by the convolutional neural network, and performing iterative regression processing on the position probability heat map of the facial feature points by adopting a point distribution model to obtain the pixel positions of the facial feature points in the first frame image; wherein the location probability heatmap represents a probability that the facial feature point is at a pixel location in the first frame image;
acquiring the initial position of the facial feature point of the second frame image in the image frame sequence by using the pixel position of the facial feature point in the first frame image to realize face tracking, specifically: and obtaining the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and a preset relaxation amount, wherein the preset relaxation amount represents the position change of the same feature point in the adjacent frame images.
2. The method of claim 1, further comprising, before obtaining an initial position of a facial feature point of a second frame image in the sequence of image frames using a pixel position of the facial feature point in a first frame image:
performing face detection on the second frame image by using a multi-task cascaded convolutional network or by using the machine learning tool Dlib;
and when a human face is detected in the second frame image, acquiring the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and a preset relaxation amount.
3. The method according to claim 2, wherein the obtaining of the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and the preset relaxation amount comprises:
acquiring the initial position of the facial feature point in the second frame image according to

x_i^(2) = x_i^(1) + α · dx_mon

wherein x_i^(2) is the initial position of facial feature point i in the second frame image, x_i^(1) is the position information of facial feature point i in the first frame image, i is a natural number indexing the feature points, α is a preset adjustment factor with 0 < α < 1, and dx_mon is the preset relaxation amount.
4. The method according to claim 2, wherein the performing face detection on the second frame image by using a multi-task cascaded convolutional network or by using the machine learning tool Dlib further comprises:
when no face is detected in the second frame image, extracting facial feature points from a pre-constructed average face image;
determining a point location of a facial feature point on the average face as an initial location of the facial feature point in the second frame image.
5. The method of claim 1, wherein obtaining the initial position of the facial feature point of the first frame image in the sequence of image frames comprises:
extracting facial feature points from a pre-constructed average face image;
determining a point location of a facial feature point on the average face as an initial location of the facial feature point in the first frame image.
6. The method according to claim 4 or 5, wherein the extracting facial feature points from the pre-constructed average face image comprises:
acquiring facial feature points of each face training sample in a face training sample set, wherein the facial feature points of each face training sample in the face training sample set are calibrated;
constructing a mapping matrix from the face training samples to an average face model according to the face feature points of each face training sample;
and respectively superposing the facial feature points of all the face training samples to the average face model according to the mapping matrix to obtain the average face image, and determining the superposed facial feature points as the facial feature points of the average face image.
7. A camera, comprising: a camera and a processor;
the camera collects an image frame sequence of the face of the user and sends the image frame sequence to the processor;
the processor is used for acquiring the initial position of the facial feature point of the first frame image in the image frame sequence; taking the initial positions of the facial feature points as the input of a convolutional neural network, outputting a position probability heat map of the facial feature points by the convolutional neural network, and performing iterative regression processing on the position probability heat map of the facial feature points by adopting a point distribution model to obtain position information of the facial feature points in the first frame image; wherein the location probability heat map represents a probability that the facial feature point is at each pixel location in the first frame image; acquiring the initial position of the facial feature point of the second frame image in the image frame sequence by using the position information of the facial feature point in the first frame image, specifically: and obtaining the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and a preset relaxation amount to realize face tracking, wherein the preset relaxation amount represents the position change of the same feature point in the adjacent frame images.
8. The camera according to claim 7, wherein the processor further performs face detection on the second frame image by using a multi-task cascaded convolutional network or by using the machine learning tool Dlib; and when a human face is detected in the second frame image, acquires the initial position of the facial feature point in the second frame image according to the position information of the facial feature point in the first frame image and a preset relaxation amount.
9. The camera according to claim 8, wherein the processor extracts facial feature points from a pre-constructed average face image when a human face is not detected in the second frame image; determining a point location of a facial feature point on the average face as an initial location of the facial feature point in the second frame image.
10. The camera according to claim 7, wherein the processor obtains facial feature points of each face training sample in a face training sample set, and the facial feature points of each face training sample in the face training sample set are calibrated; constructing a mapping matrix from the face training samples to an average face model according to the face feature points of each face training sample; and respectively superposing the facial feature points of all the face training samples to the average face model according to the mapping matrix to obtain the average face image, and determining the superposed facial feature points as the facial feature points of the average face image.
CN201910361317.0A 2019-04-30 2019-04-30 Face tracking method and camera Active CN110210306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910361317.0A CN110210306B (en) 2019-04-30 2019-04-30 Face tracking method and camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910361317.0A CN110210306B (en) 2019-04-30 2019-04-30 Face tracking method and camera

Publications (2)

Publication Number Publication Date
CN110210306A CN110210306A (en) 2019-09-06
CN110210306B (en) 2021-09-14

Family

ID=67786832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910361317.0A Active CN110210306B (en) 2019-04-30 2019-04-30 Face tracking method and camera

Country Status (1)

Country Link
CN (1) CN110210306B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2672425A1 (en) * 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus with deformable model fitting using high-precision approximation
CN103714331A (en) * 2014-01-10 2014-04-09 南通大学 Facial expression feature extraction method based on point distribution model
CN105512627A (en) * 2015-12-03 2016-04-20 腾讯科技(深圳)有限公司 Key point positioning method and terminal
CN109241910A (en) * 2018-09-07 2019-01-18 高新兴科技集团股份有限公司 A kind of face key independent positioning method returned based on the cascade of depth multiple features fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10896318B2 (en) * 2017-09-09 2021-01-19 Apple Inc. Occlusion detection for facial recognition processes

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2672425A1 (en) * 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus with deformable model fitting using high-precision approximation
CN103714331A (en) * 2014-01-10 2014-04-09 南通大学 Facial expression feature extraction method based on point distribution model
CN105512627A (en) * 2015-12-03 2016-04-20 腾讯科技(深圳)有限公司 Key point positioning method and terminal
CN109241910A (en) * 2018-09-07 2019-01-18 高新兴科技集团股份有限公司 A kind of face key independent positioning method returned based on the cascade of depth multiple features fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Constrained Local Neural Fields for robust facial landmark detection in the wild; Tadas Baltrusaitis, et al.; ICCV 2013; Dec. 31, 2013; pp. 354-361 *
Deep Alignment Network: A convolutional neural network for robust face alignment; Marek Kowalski, et al.; arXiv:1706.01789v2; Aug. 10, 2017; pp. 1-10 *
引入全局约束的精简人脸关键点检测网络 [A compact facial landmark detection network with global constraints]; 张伟 et al.; 信号处理 (Journal of Signal Processing); Mar. 2019; Vol. 35, No. 3; pp. 507-515 *

Also Published As

Publication number Publication date
CN110210306A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
CN109214343B (en) Method and device for generating face key point detection model
KR102150776B1 (en) Face location tracking method, apparatus and electronic device
CN107274433B (en) Target tracking method and device based on deep learning and storage medium
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
US11238272B2 (en) Method and apparatus for detecting face image
JP6694829B2 (en) Rule-based video importance analysis
US20230030267A1 (en) Method and apparatus for selecting face image, device, and storage medium
WO2020024484A1 (en) Method and device for outputting data
CN109308469B (en) Method and apparatus for generating information
US10620826B2 (en) Object selection based on region of interest fusion
US8903130B1 (en) Virtual camera operator
US20150269739A1 (en) Apparatus and method for foreground object segmentation
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN109271929B (en) Detection method and device
CN112132847A (en) Model training method, image segmentation method, device, electronic device and medium
CN113887547B (en) Key point detection method and device and electronic equipment
CN109767453A (en) Information processing unit, background image update method and non-transient computer readable storage medium
US20200401811A1 (en) Systems and methods for target identification in video
CN112149615A (en) Face living body detection method, device, medium and electronic equipment
CN112101109B (en) Training method and device for face key point detection model, electronic equipment and medium
CN110856014B (en) Moving image generation method, moving image generation device, electronic device, and storage medium
CN110633630B (en) Behavior identification method and device and terminal equipment
CN112732553A (en) Image testing method and device, electronic equipment and storage medium
CN110210306B (en) Face tracking method and camera

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant