CN107808147B - Face confidence discrimination method based on real-time face point tracking - Google Patents


Info

Publication number
CN107808147B
CN107808147B (grant of application CN201711144871.0A)
Authority
CN
China
Prior art keywords
face
image
confidence
point
training
Prior art date
Legal status
Active
Application number
CN201711144871.0A
Other languages
Chinese (zh)
Other versions
CN107808147A (en)
Inventor
关明鑫
王喆
许清泉
洪炜冬
张伟
Current Assignee
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd
Priority to CN201711144871.0A
Publication of CN107808147A
Application granted
Publication of CN107808147B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/164 Detection; Localisation; Normalisation using holistic features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face confidence discrimination method based on real-time face point tracking, adapted to be executed in a computing device and comprising the following steps: performing face detection on an image to be tracked and generating a corresponding face image; generating an image of each facial feature point according to the coordinates of that feature point in the face image; inputting the image of each facial feature point into a preset convolutional network for that feature point and calculating the confidence of each facial feature point; composing a new feature of the face image from the confidences of all the facial feature points; inputting the new feature of the face image into a preset global linear regressor to obtain the confidence of the face image; and judging whether the face point tracking result is accurate according to the confidence of the face image. The invention also discloses a computing device for executing the method.

Description

Face confidence discrimination method based on real-time face point tracking
Technical Field
The invention relates to the technical field of image processing, and in particular to a face confidence discrimination method based on real-time face point tracking and a computing device.
Background
With the wide application of face tracking algorithms, some means is often needed to judge the accuracy of a face point tracking algorithm. For example, in a real-time face point tracking system based on a video stream, mis-tracking and mis-recognition often occur because of complex environments. In such circumstances, the face point tracking system is expected to judge quickly and accurately whether the tracked or recognized face is correct: if the tracked face is judged to be accurate, tracking continues; otherwise, the tracking system is initialized and face detection restarts in the next frame. A common way to judge face tracking accuracy is to calculate the confidence of the tracked face; a very low confidence indicates low tracking accuracy. In the above scenario, if the face confidence discrimination algorithm cannot rapidly and accurately decide whether the tracking result is correct, the face point tracking effect may be poor, and system tracking may even fail.
The existing face confidence discrimination algorithms generally take the face point result of a face point tracking system, input an image region containing the face points into a convolutional neural network model to obtain features of that region, and judge from these features whether the tracking result of the current frame is accurate. However, the input image is generally large, so the convolutional neural network requires a large amount of computation; the algorithm is therefore slow, and when multiple faces appear in the tracking system, the system stutters noticeably and cannot meet real-time requirements. On the other hand, when a facial organ is occluded in the face image, the above method cannot accurately determine the confidence of face tracking.
In view of the above drawbacks, a confidence discrimination method is needed that improves computational efficiency and is robust, so as to optimize face point tracking based on confidence discrimination.
Disclosure of Invention
Therefore, the invention provides a face confidence discrimination method based on real-time face point tracking, adapted to be executed in a computing device and comprising the following steps: performing face detection on an image to be tracked and generating a corresponding face image; generating an image of each facial feature point according to the coordinates of that feature point in the face image; inputting the image of each facial feature point into a preset convolutional network for that feature point and calculating the confidence of each facial feature point; composing a new feature of the face image from the confidences of all the facial feature points; inputting the new feature of the face image into a preset global linear regressor to obtain the confidence of the face image; and judging whether the face point tracking result is accurate according to the confidence of the face image.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the step of judging whether the face point tracking result is accurate according to the confidence of the face image includes: if the confidence of the face image is greater than 0, judging that the face point tracking result is accurate; otherwise, judging that the face point tracking result is inaccurate.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the step of performing face detection on the image to be tracked and generating a corresponding face image includes: detecting a face region in the image to be tracked through a face detection algorithm; cutting out an image containing the face region from the image to be tracked as an initial face image; and scaling the initial face image to generate a face image of a first predetermined size.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the step of generating an image of each facial feature point from the coordinates of each facial feature point in the face image includes: selecting an image of a second predetermined size, centered on the coordinates of each facial feature point in the face image, as the image of that feature point.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the method further includes a step of training a preset convolutional network for each facial feature point, where the preset convolutional network includes four convolutional layers and a fully-connected layer formed by a support vector machine, the sizes of the convolution kernels of the four convolutional layers are 3 × 3, 3 × 3, 2 × 2 and 2 × 2 in sequence, and the numbers of convolution kernels of the layers are 8, 16, 32 and 64.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the step of training the preset convolutional network for each facial feature point includes: preprocessing the positive and negative training samples respectively to generate a first training image containing a face image and a second training image not containing a face image; generating a first input image for each facial feature point, centered on the coordinates of that feature point in the first training image; generating a second input image for each facial feature point, centered on the coordinates of the second training image that correspond to the coordinates of that feature point in the first training image; and, for each facial feature point, inputting the corresponding first input image and second input image into the preset convolutional network so as to train the support vector machine and obtain the final preset convolutional network.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the positive training samples are images containing a face, and the negative training samples are images not containing a face.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the preprocessing step includes: cutting out a first training image of a first predetermined size according to the face region marked in the positive training sample; and cutting out a second training image of the corresponding first predetermined size from the negative training sample, taking the face region in the positive training sample as the reference.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the sizes of the first input image and the second input image are both the second predetermined size.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the kernel function of the support vector machine is a linear kernel function.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the method further includes a step of training a preset global linear regressor, where the preset global linear regressor is composed of a fully-connected neural network, and the fully-connected neural network includes three layers.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the step of training the preset global linear regressor includes: composing a global vector of the face features from the confidences of all facial feature points in the face image output by the preset convolutional networks; processing the confidence of each facial feature point according to a predetermined rule to obtain a plurality of intervention vectors of the face features; and inputting the global vector and the intervention vectors into the preset global linear regressor and training through a gradient descent algorithm to generate the final preset global linear regressor.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the step of processing the confidence of each facial feature point according to a predetermined rule to obtain a plurality of intervention vectors about the face features includes: obtaining an intervention value for each facial feature point by randomly reducing that point's confidence; and combining the intervention value of each facial feature point with the confidences of the other facial feature points into an intervention vector about that feature point.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the first predetermined size is 128 × 128.
Optionally, in the face confidence discrimination method based on real-time face point tracking according to the present invention, the second predetermined size is 11 × 11.
According to yet another aspect of the present invention, there is provided a computing device comprising: one or more processors; and a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
According to a further aspect of the invention there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods described above.
According to the scheme of the invention, small patches are cropped around each facial feature point and input into a preset convolutional network to extract abstract, translation-invariant features, which greatly reduces the amount of computation during feature extraction and speeds up the algorithm; meanwhile, a support vector machine is used as the regressor, overcoming the poor generalization of a convolutional neural network alone.
Furthermore, the output results of the preset convolutional networks of the facial feature points are manually intervened to produce artificial samples close to real occlusion conditions, and a global linear regressor is trained on them. This improves robustness to partial occlusion of facial organs during face point tracking; especially in real-time face point tracking systems, the scheme can judge the face tracking effect well.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a configuration of a computing device 100 according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of a face confidence discrimination method 200 based on real-time face point tracking according to an embodiment of the present invention;
FIG. 3 illustrates a face image 310 and an image 320 of one of the face feature points according to one embodiment of the invention;
FIG. 4 is a diagram illustrating a preset convolutional network 400 according to one embodiment of the present invention; and
fig. 5 shows a schematic structural diagram of a fully-connected neural network 500 according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μ P), a microcontroller (μ C), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. In some embodiments, computing device 100 is configured to perform a face confidence discrimination method based on real-time face point tracking, and program data 124 includes instructions for performing the method.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, image input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164. In this embodiment, the image to be tracked may be obtained in real time (or a video image to be tracked may be obtained in real time) through an image input device such as a camera, and of course, the image to be tracked may also be obtained through the communication device 146.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media, such as carrier waves or other transport mechanisms, in a modulated data signal. A "modulated data signal" may be a signal that has one or more of its data set or its changes made in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in the computer readable medium, the one or more programs including instructions for performing certain methods, such as a face confidence discrimination method based on real-time face point tracking performed by the computing device 100 according to embodiments of the present invention.
Computing device 100 may be implemented as part of a small-form factor portable (or mobile) electronic device such as a cellular telephone, a digital camera, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that include any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations.
The following describes in detail the flow of the face confidence discrimination method 200 based on real-time face point tracking according to an embodiment of the present invention with reference to fig. 2. In summary, the method 200 calculates the confidence of each facial feature point in the detected face image, calculates the confidence of the whole face image according to the confidence of each facial feature point, and determines whether the result of face point tracking is accurate according to the confidence of the face image.
As shown in fig. 2, the method 200 starts with step S210, and performs face detection processing on an image to be tracked and generates a corresponding face image. The specific execution steps can be divided into two steps:
firstly, a face region in an image to be tracked is detected through a face detection algorithm.
It should be noted that there are many face detection algorithms, and this scheme does not limit the specific algorithm used. For example, simple face detection determines whether a frame contains a face region and, if so, returns information such as the size and position of the face, from which the face region in the image to be tracked is determined. The specific algorithm may be based on geometric features, or on templates or models, such as a template matching model, a skin color model, an ANN model, an SVM model, an Adaboost model, and the like. Any face detection method can be combined with the embodiments of this scheme to realize the face confidence discrimination method.
Secondly, an image containing the face region is cut out of the image to be tracked as the initial face image, and the initial face image is scaled to generate a face image of a first predetermined size.
The initial face image is cut out according to the detected face region (position and size). Because the cropped region differs in size from image to image, it is scaled further to unify the size of the face images. Optionally, the first predetermined size is 128 × 128. A minimal sketch of these two steps follows.
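The following Python sketch illustrates step S210 under stated assumptions: the patent does not fix a particular face detector, so an OpenCV Haar cascade stands in for it, and taking the largest detection as the tracked face is likewise an assumption.

```python
import cv2

def detect_and_crop_face(frame, size=128):
    """Detect the largest face in a frame, crop it, and scale it to size x size.

    The detector is an assumption (the patent leaves the face detection
    algorithm open); 128 is the first predetermined size described above.
    Returns the scaled face image, or None if no face is found.
    """
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda r: r[2] * r[3])  # keep the largest face
    return cv2.resize(frame[y:y + h, x:x + w], (size, size))
```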
Subsequently, in step S220, an image of each of the face feature points is generated from the coordinates of each of the face feature points in the face image.
On the cropped and scaled face image, all facial feature points (eyes, eyebrows, nose, mouth, face contour and the like) are further determined, and an image of a second predetermined size, centered on the coordinates of each facial feature point, is selected as the image of that feature point. Optionally, the second predetermined size is 11 × 11. As shown in fig. 3, the facial feature points are marked in the face image 310 (numbered for convenience of subsequent processing); taking the center of the nose bridge as an example, an 11 × 11 image 320 centered on it is cropped (the part outlined in fig. 3).
According to one embodiment of the invention, the facial feature points comprise 106 feature points, of which 73 points represent facial organ feature points and 33 points represent face contour feature points. That is, in the embodiment of the present invention, for one face image a total of 106 images of the second predetermined size are generated, each representing one facial feature point. A sketch of this patch extraction follows.
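A minimal sketch of step S220, assuming the tracker supplies the 106 landmark coordinates in the 128 × 128 face image; clamping patches at the image border is an assumption the patent does not spell out.

```python
import numpy as np

def crop_point_patches(face_img, landmarks, patch=11):
    """Cut one patch x patch image around each facial feature point.

    face_img: 128x128x3 face image; landmarks: (106, 2) array of (x, y)
    tracker coordinates. Clamping near the border is an assumption.
    """
    half = patch // 2
    h, w = face_img.shape[:2]
    patches = np.empty((len(landmarks), patch, patch, 3), face_img.dtype)
    for i, (x, y) in enumerate(np.rint(landmarks).astype(int)):
        x = np.clip(x, half, w - half - 1)  # keep the window inside the image
        y = np.clip(y, half, h - half - 1)
        patches[i] = face_img[y - half:y + half + 1, x - half:x + half + 1]
    return patches
```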
Subsequently, in step S230, the image of each facial feature point is input into the preset convolutional network for that feature point, and the confidence of each facial feature point is calculated. As described above, in one embodiment according to the present invention there are 106 preset convolutional networks, one per facial feature point, each outputting that point's confidence. The confidence represents the accuracy of tracking that facial feature point: the higher the confidence, the more likely the tracking of that point is accurate.
According to an implementation of the invention, the preset convolutional network comprises four convolutional layers of a convolutional neural network and a fully-connected layer formed by a support vector machine; the sizes of the convolution kernels of the four layers are 3 × 3, 3 × 3, 2 × 2 and 2 × 2 in sequence, the numbers of kernels per layer are 8, 16, 32 and 64, and the input of the support vector machine (SVM) is 1024-dimensional. Fig. 4 shows a schematic structural diagram of the preset convolutional network; its input is the image of one facial feature point, a three-channel color image.
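A sketch of this network under stated assumptions: the patent gives kernel sizes and counts but not strides or padding, so the values below are one configuration that turns an 11 × 11 × 3 patch into the 1024-dimensional (4 × 4 × 64) feature the SVM consumes; the ReLU activations are likewise assumed.

```python
import torch.nn as nn

class PointConvNet(nn.Module):
    """Four conv layers producing a 1024-dim feature for one feature point.

    Kernel sizes (3,3,2,2) and counts (8,16,32,64) follow the patent;
    padding, strides and ReLU activations are assumptions chosen so that
    an 11x11x3 input flattens to 4x4x64 = 1024 dimensions.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),   # 11x11 -> 11x11
            nn.Conv2d(8, 16, 3), nn.ReLU(),             # 11x11 -> 9x9
            nn.Conv2d(16, 32, 2), nn.ReLU(),            # 9x9  -> 8x8
            nn.Conv2d(32, 64, 2, stride=2), nn.ReLU(),  # 8x8  -> 4x4
        )

    def forward(self, x):                   # x: (N, 3, 11, 11)
        return self.features(x).flatten(1)  # (N, 1024), fed to the SVR head
```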
Specifically, the method 200 further includes a step of training the preset convolutional network for each facial feature point, comprising the following steps (1) to (4):
(1) Preprocess the positive and negative training samples respectively to generate a first training image containing a face image and a second training image not containing a face image. According to the embodiment of the invention, a positive training sample is an image that contains a face with clear facial organs, a negative training sample is an image of a real-life scene that contains no face, and the face region in each positive training sample is marked manually or computationally.
Here, the preprocessing step comprises: cutting out a first training image of the first predetermined size (for example, 128 × 128) according to the marked face region in the positive training sample; and cutting out a second training image of the corresponding first predetermined size from the negative training sample, taking the face region in the positive training sample as the reference.
(2) Generate a first input image for each facial feature point, centered on the coordinates of that feature point in the first training image. Optionally, the size of the first input image is the second predetermined size (e.g., 11 × 11).
(3) Generate a second input image for each facial feature point, centered on the coordinates of the second training image that correspond to the coordinates of that feature point in the first training image. Optionally, the size of the second input image is the second predetermined size (e.g., 11 × 11).
(4) For each facial feature point, input the corresponding first and second input images into the preset convolutional network for that point, replace the fully-connected layer of the trained convolutional neural network with a support vector machine serving as the regressor of that single point's confidence, and train the support vector machine to obtain the final preset convolutional network.
In the embodiment according to the present invention, a support vector machine is used for the regression problem, which in some descriptions is referred to as Support Vector Regression (SVR). It can be stated briefly as follows: given a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, with x_i \in \mathbb{R}^d and y_i \in \mathbb{R}, we wish to learn a regression model of the form
f(x) = w^T x + b
such that f(x) is as close as possible to y, where w and b are the model parameters to be determined. For a sample (x, y), a conventional regression model calculates the loss directly from the difference between the model output f(x) and the true output y, and the loss is zero if and only if f(x) equals y exactly. In contrast, SVR tolerates a deviation of at most \epsilon between f(x) and y. In general, the SVR solution can be expressed as
f(x) = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i) k(x, x_i) + b
where \hat{\alpha}_i \ge 0 and \alpha_i \ge 0 are the introduced Lagrange multipliers, m is the number of samples, {x_i} are the input feature vectors (i.e., the trained feature vectors), k(x, x_i) is the kernel function, and b is a real number. Common kernel functions include the linear kernel, polynomial kernel, Gaussian kernel, Laplacian kernel, sigmoid kernel, and the like. Given the many applications of support vector regression in the field, there are many related descriptions, e.g. "Machine Learning", Zhou Zhihua, Tsinghua University Press, 2016, and it is not described further here.
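A sketch of this SVR head, assuming scikit-learn and the linear kernel named later in the claims; the regression targets (+1 for a patch cut from a positive training image, -1 for the corresponding negative patch) and the hyperparameters C and epsilon are assumptions the patent does not fix.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical training data for one feature point: 1024-dim features from
# the conv layers, targets +1 (positive patches) / -1 (negative patches).
feats = np.random.randn(200, 1024)            # stand-in for PointConvNet output
targets = np.r_[np.ones(100), -np.ones(100)]  # assumed labeling scheme

svr = SVR(kernel="linear")  # linear kernel per the patent; C, epsilon defaults
svr.fit(feats, targets)
confidence = svr.predict(feats[:1])  # per-point confidence for a new patch
```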
Through the above steps, the tracking confidence of each face point in the face image can be obtained, and under ordinary conditions the accuracy of face point tracking can be judged from these confidences. In practical applications, however, some facial organs may be occluded during face tracking; in such cases the confidences of the occluded feature points are low, yet the face confidence discrimination algorithm should still consider the tracking accurate. For this reason, the scheme further processes the calculated confidence of each facial feature point.
In the next step S240, a new feature of the face image is composed from the confidences of the facial feature points. According to one embodiment of the invention, the output results of the preset convolutional networks of all facial feature points in the face image are directly concatenated into the new feature of the face image; that is, the outputs of the 106 preset convolutional networks are concatenated into a 106-dimensional vector serving as the new feature of the face image.
Subsequently, in step S250, the new feature of the face image is input into a preset global linear regressor to obtain the confidence of the face image. According to an implementation of the present invention, the preset global linear regressor is composed of a fully-connected neural network with three layers. As shown in fig. 5, the column at 510 represents the input layer (layer L1) of the fully-connected neural network 500, 520 represents the middle or hidden layer (layer L2), and 530 represents the output layer (layer L3). The input layer 510 consists of 107 neurons, representing the 106-dimensional new feature obtained in step S240 plus a 1-dimensional bias: in fig. 5, neuron 512 represents one component of the new feature and neuron 514 represents the bias. The middle (hidden) layer 520 also has 107 neurons, and the final output layer 530 has 1 neuron.
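A minimal sketch of this regressor: using nn.Linear's built-in bias term, FIG. 5's 107-neuron layers correspond to 106 learned inputs plus bias. The patent names no activation function; keeping the layers activation-free preserves the "linear regressor" reading, and that choice is an assumption.

```python
import torch
import torch.nn as nn

# 106 confidences -> 107-neuron hidden layer -> 1 output (face confidence).
# The bias neurons of FIG. 5 are folded into nn.Linear's bias parameters;
# the absence of a nonlinearity between layers is an assumption.
global_regressor = nn.Sequential(
    nn.Linear(106, 106),  # input layer L1 -> hidden layer L2
    nn.Linear(106, 1),    # hidden layer L2 -> output layer L3
)

point_confidences = torch.rand(1, 106)  # stand-in for step S240's vector
face_confidence = global_regressor(point_confidences)  # sign decides accuracy
```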
According to another embodiment of the present invention, the method 200 further includes a step of training the preset global linear regressor, comprising the following steps (a) to (c):
(a) Compose a global vector of the face features from the confidences of all facial feature points in the face image output by the preset convolutional networks. This step can directly concatenate them into a 106-dimensional vector, as in the description of step S240.
(b) Process the confidence of each facial feature point according to a predetermined rule to obtain a plurality of intervention vectors corresponding to the face features in the face image. According to one implementation of the present invention, the lower the confidence of a facial feature point, the more likely that feature point is occluded; therefore, artificially reducing the confidence of a certain facial feature point in the training image (specifically, the first training image) produces the effect that this feature point is occluded. (A code sketch of this intervention-vector construction follows step (c) below.)
Specifically, for one face image, the confidence of each facial feature point is randomly reduced to obtain an intervention value for that point, and the intervention value of each facial feature point is combined with the confidences of the other facial feature points to form an intervention vector about that point. In other words, the confidences of the 106 facial feature points are reduced to obtain 106 intervention values; each intervention value is taken in turn and concatenated with the confidences of the remaining 105 facial feature points to form one intervention vector for the face image, and different combinations yield a plurality of intervention vectors.
Depending on the desired effect, the 106 facial feature points may also be grouped into classes, such as a left-eye class, right-eye class, nose class, lips class and so on, with random reduction applied per class to generate intervention values; the same facial feature point may also be reduced to different degrees to obtain several intervention values for that point. The invention is not limited in this regard: the aim of the embodiments is to produce artificial samples that approximate true occlusion.
(c) Input the global vector and the intervention vectors into the preset global linear regressor, and adjust the connection weights between neurons in the fully-connected neural network through a gradient descent algorithm so as to train and generate the final preset global linear regressor. Gradient descent is a common method for training a fully-connected neural network, given by the update rule
\theta := \theta - \alpha \nabla_\theta J(\theta)
where \theta is the parameter vector, \nabla_\theta J(\theta) is the gradient of the objective function J(\theta) with respect to \theta, and \alpha (the learning rate) is a real number. In implementation, gradient descent proceeds as follows: 1) first assign a value to \theta, which may be random or an all-zero vector; 2) change the value of \theta so that the objective function J(\theta) decreases along the direction of the negative gradient. Since gradient descent is an existing algorithm, it is not described further here.
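A sketch of the intervention-vector construction referenced in step (b), assuming numpy; the reduction range (a uniform fraction of the original confidence) is an assumption, since the patent only says the confidence is "randomly reduced".

```python
import numpy as np

def make_intervention_vectors(conf, rng=None):
    """Build one intervention vector per feature point from a confidence vector.

    Vector i keeps every confidence except that of point i, which is
    randomly reduced to mimic point i being occluded. conf: shape (106,).
    """
    rng = rng or np.random.default_rng()
    vectors = np.tile(conf, (len(conf), 1))
    for i in range(len(conf)):
        vectors[i, i] = conf[i] * rng.uniform(0.0, 0.5)  # assumed range
    return vectors
```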
Finally, in step S260, whether the result of face point tracking is accurate is determined according to the confidence of the face image.
According to one embodiment of the invention, if the confidence of the face image is greater than 0, the face point tracking result is judged to be accurate; otherwise, the face point tracking result is judged to be inaccurate.
According to the face confidence discrimination scheme based on real-time face point tracking of the invention, small patches are cropped around each facial feature point and input into a preset convolutional network to extract abstract, translation-invariant features, which greatly reduces the amount of computation during feature extraction and speeds up the algorithm: taking a high-definition 1920 × 1080 frame as an example, the patch for each facial feature point is only 11 × 11. Meanwhile, a support vector machine is used as the regressor, overcoming the poor generalization of a convolutional neural network alone.
Furthermore, the output results of the preset convolutional networks of the facial feature points are manually intervened to produce artificial samples close to real occlusion conditions, and a global linear regressor is trained on them. This improves robustness to partial occlusion of facial organs during face tracking; especially in real-time face point tracking systems, the scheme can judge the face tracking effect well.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The invention also discloses:
a9, the method of any one of A6-8, wherein the size of the first input image and the second input image are both a second predetermined size.
A10, the method as in any one of A5-9, wherein the kernel function of the support vector machine is a linear kernel function.
A11, the method according to any one of A1-10, further comprising the step of training a pre-set global linear regressor, wherein the pre-set global linear regressor is composed of a fully-connected neural network, and the fully-connected neural network comprises three layers.
A12, the method of a11, wherein the step of training the pre-set global linear regressor comprises: forming a global vector of the face features according to the confidence degrees of all face feature points in the face image output by the preset convolutional network; processing the confidence coefficient of each human face characteristic point according to a preset rule to obtain a plurality of intervention vectors of human face characteristics; and inputting the global vector and the intervention vectors into a preset global linear regressor, and training through a gradient descent algorithm to generate a final preset global linear regressor.
A13, the method as in a12, wherein the step of processing the confidence of each face feature point according to a predetermined rule to obtain a plurality of intervention vectors about the face features comprises: obtaining an intervention value corresponding to each face characteristic point by randomly reducing the confidence coefficient of each face characteristic point; and respectively combining the intervention value of each human face characteristic point and the confidence degrees of other human face characteristic points into an intervention vector related to the human face characteristic point.
A14, the method of any one of A3-13, wherein the first predetermined size is 128 x 128.
A15, the method of any one of A4-14, wherein the second predetermined size is 11 x 11.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (15)

1. A face confidence discrimination method based on real-time face point tracking, the method being adapted to be executed in a computing device, the method comprising the steps of:
carrying out face detection processing on an image to be tracked and generating a corresponding face image;
generating an image of each facial feature point according to the coordinates of each facial feature point in the facial image;
inputting the image of each face feature point into a preset convolutional network for that feature point, and respectively calculating the confidence of each face feature point;
composing a new feature of the face image according to the confidence of each face feature point;
inputting the new feature of the face image into a preset global linear regressor to obtain the confidence of the face image, wherein the method further comprises the step of training the preset global linear regressor: composing a global vector of the face features according to the confidences of all face feature points in the face image output by the preset convolutional networks; processing the confidence of each face feature point according to a predetermined rule to obtain a plurality of intervention vectors of the face features, so as to produce the effect that each face feature point is occluded; inputting the global vector and the intervention vectors into the preset global linear regressor, and training through a gradient descent algorithm to generate the final preset global linear regressor; and
judging whether the tracking result of the face points is accurate or not according to the confidence coefficient of the face image;
wherein the step of generating an image of each facial feature point from the coordinates of each facial feature point in the facial image comprises:
selecting an image of a second predetermined size, centered on the coordinates of each face feature point in the face image, as the image of that face feature point.
2. The method of claim 1, wherein the step of judging whether the result of the face point tracking is accurate according to the confidence of the face image comprises the following steps:
if the confidence of the face image is greater than 0, judging that the face point tracking result is accurate; otherwise, judging that the face point tracking result is inaccurate.
3. The method of claim 1, wherein the step of performing face detection processing on the image to be tracked and generating a corresponding face image comprises:
detecting a face area in an image to be tracked through a face detection algorithm;
cutting out an image containing a face area from an image to be tracked as an initial face image; and
and carrying out scaling processing on the initial face image to generate a face image with a first preset size.
4. The method of any one of claims 1-3, further comprising the step of training a preset convolutional network for each face feature point, wherein the preset convolutional network comprises four convolutional layers and a fully-connected layer formed by a support vector machine, the convolution kernels of the four convolutional layers have sizes of 3 x 3, 3 x 3, 2 x 2 and 2 x 2 in sequence, and the numbers of convolution kernels per layer are 8, 16, 32 and 64.
5. The method of claim 4, wherein the step of training a predetermined convolutional network for each human face feature point comprises:
respectively preprocessing the positive training sample and the negative training sample to generate a first training image containing a face image and a second training image not containing the face image;
generating a first input image of each face characteristic point by taking the coordinates of each face characteristic point in the first training image as a center;
generating a second input image corresponding to each human face characteristic point by taking the coordinates of each human face characteristic point in the first training image as a reference and taking the corresponding coordinates of the second training image as a center; and
and inputting the corresponding first input image and second input image into a preset convolution network aiming at each human face characteristic point so as to train the support vector machine to obtain a final preset convolution network.
6. The method of claim 5, wherein the positive training samples are images containing faces and the negative training samples are images not containing faces.
7. The method of claim 6, wherein the preprocessing step comprises:
cutting out a first training image with a first preset size according to a face area marked in a positive training sample; and
and cutting out a second training image with a corresponding first preset size in the negative training sample by taking the face area in the positive training sample as a reference.
8. The method of any of claims 5-7, wherein the first input image and the second input image are each a second predetermined size.
9. The method of claim 4, wherein the kernel function of the support vector machine is a linear kernel function.
10. The method of any one of claims 1-3, further comprising the step of training a pre-set global linear regressor, wherein the pre-set global linear regressor is comprised of a fully-connected neural network, the fully-connected neural network comprising three layers.
11. The method of claim 1, wherein the step of processing the confidence of each facial feature point according to a predetermined rule to obtain a plurality of intervention vectors related to the facial features comprises:
obtaining an intervention value corresponding to each face characteristic point by randomly reducing the confidence coefficient of each face characteristic point; and
and respectively combining the intervention value of each human face characteristic point and the confidence degrees of other human face characteristic points into an intervention vector about the human face characteristic point.
12. The method of claim 3, wherein the first predetermined size is 128 x 128.
13. The method of any of claims 1-3, wherein the second predetermined size is 11 x 11.
14. A computing device, comprising:
one or more processors; and
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-13.
15. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-13.
CN201711144871.0A 2017-11-17 2017-11-17 Face confidence discrimination method based on real-time face point tracking Active CN107808147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711144871.0A CN107808147B (en) 2017-11-17 2017-11-17 Face confidence discrimination method based on real-time face point tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711144871.0A CN107808147B (en) 2017-11-17 2017-11-17 Face confidence discrimination method based on real-time face point tracking

Publications (2)

Publication Number Publication Date
CN107808147A CN107808147A (en) 2018-03-16
CN107808147B true CN107808147B (en) 2020-11-27

Family

ID=61590440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711144871.0A Active CN107808147B (en) 2017-11-17 2017-11-17 Face confidence discrimination method based on real-time face point tracking

Country Status (1)

Country Link
CN (1) CN107808147B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117736B (en) * 2018-07-19 2020-11-06 厦门美图之家科技有限公司 Method and computing device for judging visibility of face points
CN110795975B (en) * 2018-08-03 2023-07-21 浙江宇视科技有限公司 Face false detection optimization method and device
CN113302619B (en) 2018-12-27 2023-11-14 浙江大华技术股份有限公司 System and method for evaluating target area and characteristic points
CN109801335A (en) * 2019-01-08 2019-05-24 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium
CN111950571A (en) * 2019-05-14 2020-11-17 北京京东尚科信息技术有限公司 Method and device for processing image of specific position of target object
CN110276289B (en) * 2019-06-17 2021-09-07 厦门美图之家科技有限公司 Method for generating matching model and face characteristic point tracking method
CN110942033B (en) * 2019-11-28 2023-05-26 重庆中星微人工智能芯片技术有限公司 Method, device, electronic equipment and computer medium for pushing information
US11487968B2 (en) 2019-12-16 2022-11-01 Nvidia Corporation Neural network based facial analysis using facial landmarks and associated confidence values
CN111839519B (en) * 2020-05-26 2021-05-18 合肥工业大学 Non-contact respiratory frequency monitoring method and system
CN111814749A (en) * 2020-08-12 2020-10-23 Oppo广东移动通信有限公司 Human body feature point screening method and device, electronic equipment and storage medium
CN112560745B (en) * 2020-12-23 2022-04-05 南方电网电力科技股份有限公司 Method for discriminating personnel on electric power operation site and related device
CN112418195B (en) * 2021-01-22 2021-04-09 电子科技大学中山学院 Face key point detection method and device, electronic equipment and storage medium
CN113095284A (en) * 2021-04-30 2021-07-09 平安国际智慧城市科技股份有限公司 Face selection method, device, equipment and computer readable storage medium
CN113888583A (en) * 2021-09-28 2022-01-04 河北汉光重工有限责任公司 Real-time judgment method and device for visual tracking accuracy

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778585A (en) * 2016-12-08 2017-05-31 腾讯科技(上海)有限公司 A kind of face key point-tracking method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100561503C (en) * 2007-12-28 2009-11-18 北京中星微电子有限公司 A kind of people's face canthus and corners of the mouth location and method and the device followed the tracks of
CN105046231A (en) * 2015-07-27 2015-11-11 小米科技有限责任公司 Face detection method and device
CN106295567B (en) * 2016-08-10 2019-04-12 腾讯科技(深圳)有限公司 A kind of localization method and terminal of key point

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778585A (en) * 2016-12-08 2017-05-31 腾讯科技(上海)有限公司 A kind of face key point-tracking method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Repost: Making OpenCV output the face detection score (confidence)" (《转:让opencv输出人脸检测的得分(置信率)》); 他来自江湖; https://blog.csdn.net/hua_007/article/details/45368607; 2015-04-29; pages 1-9 *

Also Published As

Publication number Publication date
CN107808147A (en) 2018-03-16

Similar Documents

Publication Publication Date Title
CN107808147B (en) Face confidence discrimination method based on real-time face point tracking
CN110176027B (en) Video target tracking method, device, equipment and storage medium
US10936911B2 (en) Logo detection
WO2021164228A1 (en) Method and system for selecting augmentation strategy for image data
CN109978063B (en) Method for generating alignment model of target object
CN109829448B (en) Face recognition method, face recognition device and storage medium
CN108038823B (en) Training method of image morphing network model, image morphing method and computing device
WO2022078041A1 (en) Occlusion detection model training method and facial image beautification method
CN110096964B (en) Method for generating image recognition model
EP3839807A1 (en) Facial landmark detection method and apparatus, computer device and storage medium
CN109886341B (en) Method for training and generating human face detection model
CN112069896A (en) Video target tracking method based on twin network fusion multi-template features
CN112418195B (en) Face key point detection method and device, electronic equipment and storage medium
CN108428214A (en) A kind of image processing method and device
CN111626295B (en) Training method and device for license plate detection model
US11138464B2 (en) Image processing device, image processing method, and image processing program
CN110276289B (en) Method for generating matching model and face characteristic point tracking method
CN113469092B (en) Character recognition model generation method, device, computer equipment and storage medium
CN109413510A (en) Video abstraction generating method and device, electronic equipment, computer storage medium
EP4435660A1 (en) Target detection method and apparatus
CN115631112B (en) Building contour correction method and device based on deep learning
CN112836653A (en) Face privacy method, device and apparatus and computer storage medium
CN110321778B (en) Face image processing method and device and storage medium
CN116994319A (en) Model training method, face recognition equipment and medium
Mamalet et al. Embedded facial image processing with convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant