CN110276277A

CN110276277A - Method and apparatus for detecting facial image

Info

Publication number: CN110276277A
Application number: CN201910475881.5A
Authority: CN
Inventors: 连桄雷; 张龙
Original assignee: Ropter Technology Group Co Ltd
Current assignee: Ropter Technology Group Co Ltd
Priority date: 2019-06-03
Filing date: 2019-06-03
Publication date: 2019-09-24
Also published as: WO2020244032A1

Abstract

The embodiment of the present application discloses the method and apparatus for detecting facial image.One specific embodiment of this method includes: to obtain target image frame sequence；Face location information is obtained by picture frame input Face datection model trained in advance for each picture frame that target image frame sequence includes；Based on obtained face location information, from the picture frame that target image frame sequence includes, at least one human face image sequence is determined, wherein the facial image that each human face image sequence includes is used to indicate the same face；For each of at least one human face image sequence face image sequence, the quality score for each facial image that the human face image sequence includes is determined；Based on obtained quality score, facial image and output are extracted from the human face image sequence.The embodiment realizes the facial image that high quality is extracted from target image sequence, is conducive to improve the accuracy for carrying out the operations such as recognition of face using the facial image extracted.

Description

Method and apparatus for detecting facial image

Technical field

The invention relates to field of computer technology, and in particular to the method and apparatus for detecting facial image.

Background technique

Video surveillance network has covered the major small and medium-sized cities of China at present, and face recognition technology can be applied in safety monitoring Field.In general, must just be disposed in front end enough to establish the soft or hard integrated novel intelligent security protection system from cloud to front end Face snap equipment abundant.During monitoring point transformation and social resources access, purely captured by rear end, analysis Mode not only bring challenges to the data transmission capabilities of network, but also bring to the data-handling capacity of back-end platform very big Pressure, there is a problem of operational efficiency reduction, operation cost it is big.In general, can face snap function sharing to front end, but It is the abruptly increase that large batch of replacement candid camera will cause project construction cost.Now with the arriving in 5G epoch, edge meter It can be regarded as the supplement for cloud computing, substitution solution can be served as, in this way, it is necessary to which a kind of gateway is able to achieve head end video Face snap, analyzed for rear end.

Summary of the invention

The purpose of the embodiment of the present application is to propose a kind of improved method and apparatus for detecting facial image, come Solve the technical issues of background section above is mentioned.

In a first aspect, the embodiment of the present application provides a kind of method for detecting facial image, this method comprises: obtaining Target image frame sequence；For each picture frame that target image frame sequence includes, by picture frame input people trained in advance Face detection model obtains face location information；Based on obtained face location information, the figure that includes from target image frame sequence As determining at least one human face image sequence, wherein the facial image that each human face image sequence includes is used to indicate together in frame One face；For each of at least one human face image sequence face image sequence, determine that the human face image sequence includes Each facial image quality score；Based on obtained quality score, facial image is extracted from the human face image sequence And output.

In some embodiments, obtained face location information, the picture frame for including from target image frame sequence are based on In, determine at least one human face image sequence, comprising: for the every two adjacent picture frame in target image frame sequence, really Characteristic point in each of first picture frame of fixed two adjacent picture frames face image, and determine the first picture frame Each of face image is corresponding, predicted characteristics point in the second picture frame；From the facial image in the second picture frame, The quantity for the predicted characteristics point that determination includes be more than or equal to default value facial image as with pair in the first picture frame The identical facial image of face for answering facial image to indicate.

In some embodiments, obtained face location information, the picture frame for including from target image frame sequence are based on In, determine at least one human face image sequence, comprising:, will for the every two adjacent picture frame in target image frame sequence In the facial image in the facial image and the second picture frame in the first picture frame in two adjacent picture frames, area The facial image that registration is more than or equal to preset registration threshold value is determined as indicating the facial image of identical face.

In some embodiments, Face datection model is also used to generate the key point information set of picture frame, wherein crucial Point information is for characterizing position of the face key point in facial image；And determine the human face image sequence include everyone The quality score of face image, comprising: the key point information set based on each facial image that the human face image sequence includes, really The human face posture angle information of fixed each facial image；Based on human face posture angle information, the quality score of each facial image is determined.

In some embodiments, the key point information set for each facial image for including based on the human face image sequence, Determine the human face posture angle information of each facial image, comprising: each facial image for including based on the human face image sequence Key point information set generates the corresponding key point feature vector of each facial image；By key point feature vector generated Multiplied by the eigenmatrix being fitted in advance, human face posture corner characteristics vector is obtained as human face posture angle information.

In some embodiments, it is based on human face posture angle information, determines the quality score of each facial image, comprising: base In the key point information set of each facial image, the clarity of each facial image is determined；Using human face posture angle information and Clarity determines the quality score of each facial image.

In some embodiments, based on the key point information set of each facial image, the clear of each facial image is determined Clear degree, comprising: target critical point information is extracted from the key point information set of each facial image；Believed based on target critical point Breath determines target area from each facial image, and determines the mean pixel gradient for the pixel that target area includes；Base In mean pixel gradient, the clarity of each facial image is determined.

In some embodiments, it is the convolutional layer that depth separates convolution that Face datection model, which includes structure,.

In some embodiments, Face datection model advances with batch standardized way training and obtains.

Second aspect, the embodiment of the present application provide it is a kind of for detecting the device of facial image, the device include: obtain Module, for obtaining target image frame sequence；Generation module, each picture frame for including for target image frame sequence, By picture frame input Face datection model trained in advance, face location information is obtained, wherein face location information is used for table Position of traveller on a long journey's face image in the picture frame；Determining module, for being based on obtained face location information, from target image In the picture frame that frame sequence includes, at least one human face image sequence is determined, wherein the face that each human face image sequence includes Image is used to indicate the same face；Output module, for for each of at least one human face image sequence face image Sequence determines the quality score for each facial image that the human face image sequence includes；Based on obtained quality score, from this Facial image and output are extracted in human face image sequence.

In some embodiments, determining module is further configured to: for the every two phase in target image frame sequence Adjacent picture frame, determines the characteristic point in each of the first picture frame of two adjacent picture frames face image, and Determine that each of the first picture frame face image is corresponding, predicted characteristics point in the second picture frame；From the second picture frame In facial image in, the quantity of the predicted characteristics point that determination includes be more than or equal to the facial image of default value as with the The identical facial image of face of correspondence facial image instruction in one picture frame.

In some embodiments, determining module is further configured to: for the every two phase in target image frame sequence Adjacent picture frame, by the face in the facial image and the second picture frame in the first picture frame in two adjacent picture frames In image, the facial image that area registration is more than or equal to preset registration threshold value is determined as indicating the face of identical face Image.

In some embodiments, Face datection model is also used to generate the key point information set of picture frame, wherein crucial Point information is for characterizing position of the face key point in facial image；And output module includes: the first determination unit, is used for Key point information set based on each facial image that the human face image sequence includes, determines the face appearance of each facial image State angle information；Second determination unit determines the quality score of each facial image for being based on human face posture angle information.

In some embodiments, the first determination unit includes: the first generation subelement, for being based on the human face image sequence Including each facial image key point information set, generate the corresponding key point feature vector of each facial image；Second Subelement is generated, for key point feature vector generated multiplied by the eigenmatrix being fitted in advance, to be obtained human face posture angle Feature vector is as human face posture angle information.

In some embodiments, the second determination unit includes: the first determining subelement, for based on each facial image Key point information set determines the clarity of each facial image；Second determines subelement, for utilizing human face posture angle information And clarity, determine the quality score of each facial image.

In some embodiments, first determine that subelement includes: extracting sub-module, for the key from each facial image Target critical point information is extracted in point information aggregate；First determines submodule, for being based on target critical point information, from everyone Target area is determined in face image, and determines the mean pixel gradient for the pixel that target area includes；Second determines submodule Block determines the clarity of each facial image for being based on mean pixel gradient.

In some embodiments, Face datection model advances with what batch standardized way training obtained.

The third aspect, the embodiment of the present application provide a kind of electronic equipment, including one or more processors；Storage dress It sets, for storing one or more programs, when one or more programs are executed by one or more processors, so that one or more A processor realizes the method as described in implementation any in first aspect.

Fourth aspect, the embodiment of the present application provide a kind of computer readable storage medium, are stored thereon with computer journey Sequence realizes the method as described in implementation any in first aspect when the computer program is executed by processor.

Method and apparatus provided by the embodiments of the present application for detecting facial image, by from target image frame sequence Determine at least one human face image sequence, wherein each human face image sequence is used to indicate the same face, then from everyone The quality score that each facial image is determined in face image sequence extracts facial image and output according to quality score, thus real The facial image for extracting high quality from target image sequence is showed, has been conducive to improve the facial image progress people using extracting The accuracy of the operations such as face identification.

Detailed description of the invention

By reading a detailed description of non-restrictive embodiments in the light of the attached drawings below, the application's is other Feature, objects and advantages will become more apparent upon:

Fig. 1 is that this application can be applied to exemplary system architecture figures therein；

Fig. 2 is the flow chart according to one embodiment of the method for detecting facial image of the application；

Fig. 3 is the flow chart according to another embodiment of the method for detecting facial image of the application；

Fig. 4 is the illustrative diagram according to the human face posture angle of the method for detecting facial image of the application；

Fig. 5 is the structural schematic diagram according to one embodiment of the device for detecting facial image of the application；

Fig. 6 is adapted for the structural schematic diagram for the computer system for realizing the electronic equipment of the embodiment of the present application.

Specific embodiment

The application is described in further detail with reference to the accompanying drawings and examples.It is understood that this place is retouched The specific embodiment stated is used only for explaining related invention, rather than the restriction to the invention.It also should be noted that in order to Convenient for description, part relevant to related invention is illustrated only in attached drawing.

It should be noted that in the absence of conflict, the features in the embodiments and the embodiments of the present application can phase Mutually combination.The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 is shown can be using the exemplary system architecture of the method for detecting facial image of the embodiment of the present application 100。

As shown in Figure 1, system architecture 100 may include terminal device 101, network 102, intermediate equipment 103 and server 104.Network 102 between terminal device 101, intermediate equipment 103 and server 104 to provide the medium of communication link.Net Network 102 may include various connection types, such as wired, wireless communication link or fiber optic cables etc..

Server 104 can be to provide the server of various services, such as to the image frame sequence that terminal device 101 uploads The image processing server handled.Image processing server can be handled received image frame sequence, and be obtained Processing result (such as facial image of high quality).

Intermediate equipment 103 can be the various equipment for being used for data transmit-receive and processing, including but not limited to following at least one Kind: interchanger, gateway etc..

It should be noted that for detecting the method for facial image generally by intermediate equipment provided by the embodiment of the present application 103 execute, and correspondingly, the device for detecting facial image is generally positioned in intermediate equipment 103.It should also be noted that, Method provided by the embodiment of the present application for detecting facial image can also be executed by terminal device 101 or server 104, Correspondingly, it can be set in terminal device 101 or server 104 for detecting the device of facial image.

It should be understood that the number of data server, network and primary server in Fig. 1 is only schematical.According to reality It now needs, can have any number of terminal device, network, intermediate equipment and server.

With continued reference to Fig. 2, it illustrates one embodiment for being applied to detect the method for facial image according to the application Process 200.Method includes the following steps:

Step 201, target image frame sequence is obtained.

In the present embodiment, for detect the method for facial image executing subject (such as intermediate equipment shown in FIG. 1 or Terminal device or server) available target image frame sequence.Wherein, target image frame sequence can be camera (such as Camera that above-mentioned executing subject includes or the camera for including with the electronic equipment of above-mentioned executing subject communication connection) to target The image frame sequence that the video of face (such as face of the personage in the coverage of i.e. above-mentioned camera) shooting includes.It is logical Often, target image frame can be the figure of the picture frame of camera current shooting and the preset time period shooting before current time The image frame sequence formed as frame.

Step 202, each picture frame for including for target image frame sequence, by picture frame input people trained in advance Face detection model obtains face location information.

In the present embodiment, each picture frame for including for target image frame sequence, above-mentioned executing subject can should Picture frame input Face datection model trained in advance, obtains face location information.Wherein, Face datection model is used for phenogram As the corresponding relationship of sequence and face location information.

As an example, Face datection model can be above-mentioned executing subject or other electronic equipments, machine learning side is utilized Method, the sample image frame sequence for including using the training sample in preset training sample set is as input, by the sample with input The corresponding sample position information of this image frame sequence is as desired output, to initial model (such as convolutional neural networks, circulation mind Through network etc.) it is trained, for the sample image frame sequence of each training input, available reality output.Wherein, practical Output is the data of initial model reality output, for characterizing the position of facial image.Then, above-mentioned executing subject can use Gradient descent method and back propagation are based on reality output and desired output, adjust the parameter of initial model, will adjust ginseng every time Initial model of the model obtained after number as training next time, and in the case where meeting preset trained termination condition, terminate Training, so that training obtains speech recognition modeling.The training termination condition here preset at can include but is not limited to it is following at least One: the training time is more than preset duration；Frequency of training is more than preset times；Utilize preset loss function (such as cross entropy Loss function) resulting penalty values are calculated less than default penalty values threshold value.

Above-mentioned initial model can be the various models for target detection, such as MTCNN (Multi-task Convolutional neural network, multitask convolutional neural networks), RetinaFace etc..

In some optional implementations of the present embodiment, Face datection model may include that structure is separable for depth The convolutional layer of convolution.Wherein, convolution (depthwise separable convolutions) structure is separated using depth Convolutional neural networks can reduce the occupied memory space of convolutional neural networks, and can reduce the meter of convolutional neural networks Calculation amount, to help to improve the efficiency for extracting facial image.It is using the convolutional neural networks that depth separates convolutional coding structure The well-known technique studied and applied extensively at present, details are not described herein.

In some optional implementations of the present embodiment, Face datection model, which can be, advances with batch standardization side Formula training obtains model.Wherein, standardization (Batch Normalization, BN) is criticized, batch is called and normalizes, be a kind of use In the technology for the performance and stability for improving artificial neural network.It, can be with training for promotion using batch standardized way training pattern Speed, convergence process are greatly speeded up, and in addition be can simplify and are adjusted ginseng process, and the precision of training effectiveness and model treatment data is improved.

Step 203, it is based on obtained face location information, from the picture frame that target image frame sequence includes, is determined At least one human face image sequence.

In the present embodiment, above-mentioned executing subject can be based on obtained face location information, from target image frame sequence In the picture frame that column include, at least one human face image sequence is determined.Wherein, the facial image that each human face image sequence includes It is used to indicate the same face.

In some optional implementations of the present embodiment, above-mentioned executing subject can be determined in accordance with the following steps at least One human face image sequence:

For the every two adjacent picture frame in target image frame sequence, following steps are executed:

Firstly, determine the characteristic point in each of the first picture frame of two adjacent picture frames face image, with And each of determining first picture frame face image is corresponding, predicted characteristics point in the second picture frame.Wherein, the first figure As frame is the picture frame before the second picture frame.Specifically, above-mentioned executing subject can be according to obtained face location Information determines then facial image determines the characteristic point of facial image using various methods from each picture frame.Such as it adopts Each face figure is extracted with SIFT (Scale-invariant feature transform, scale invariant feature conversion) algorithm The characteristic point of picture.Subsequently, above-mentioned executing subject can use various characteristic point prediction algorithms (such as train neural network, item Part random field etc.), determine that each facial image is corresponding, predicted characteristics point in the second picture frame.

In practice, the characteristic point and predicted characteristics point of facial image can be determined using optical flow method.Wherein, optical flow method is benefit It is deposited between two adjacent frames in the variation in time-domain and the correlation between consecutive frame to find with pixel in image sequence Corresponding relationship, to calculate a kind of method of the motion information of object between consecutive frame.The advantages of optical flow method, is it Without it is understood that scene information, so that it may accurately detection identification movement day cursor position.And light stream not only carries moving object The motion information of body, but also the abundant information in relation to scenery three-dimensional structure is carried, it can not know any of scene In the case where information, Moving Objects are detected.

Then, from the facial image in the second picture frame, the quantity for the predicted characteristics point that determination includes is more than or equal to pre- If the facial image of numerical value is as the identical facial image of face indicated with the corresponding facial image in the first picture frame.Tool Body, above-mentioned executing subject can determine facial image according to the corresponding face location information of the second picture frame, and determine every The quantity for the predicted characteristics point that a facial image includes.For a facial image in the second picture frame, if the face figure The quantity of predicted characteristics point as in is more than or equal to default value, and the predicted characteristics point in the facial image is based on the first figure As some Face image synthesis in frame, then the two facial images are determined as being used to indicate the face figure of the same face Picture.In general, predicted characteristics point has corresponding facial image mark (being used to indicate the facial image in the first image), when second When the quantity for the predicted characteristics point that some facial image in image includes is more than or equal to default value, by the people of the facial image Face image mark is set as facial image mark corresponding with predicted characteristics point.When some facial image in the second image includes The quantity of predicted characteristics point when being less than default value, for the facial image, new facial image mark is set.

For the every two adjacent picture frame in target image frame sequence, by first in two adjacent picture frames In the facial image in facial image and the second picture frame in picture frame, area registration (or friendship and ratio for rectangle IOU) it is determined as indicating the facial image of identical face more than or equal to the facial image of preset registration threshold value.

Step 204, for each of at least one human face image sequence face image sequence, the facial image sequence is determined The quality score for each facial image that column include；Based on obtained quality score, people is extracted from the human face image sequence Face image and output.

In the present embodiment, above-mentioned to hold for each of at least one above-mentioned human face image sequence face image sequence Row main body can determine the quality score for each facial image that the human face image sequence includes first.Then, based on acquired Quality score, facial image and output are extracted from the human face image sequence.Wherein, the quality score of facial image can be used In the quality of characterization facial image, that is, quality score is higher, indicates that the quality of facial image is higher.As an example, can be by matter The maximum facial image of amount scoring is exported as optimal facial image.

Above-mentioned executing subject can determine the quality score of facial image according to various methods.As an example, above-mentioned execution Main body can determine the clarity of facial image, and clarity is determined as quality score.Wherein, clarity can use existing Determine that the algorithm of image definition obtains.For example, determining that the algorithm of image definition can include but is not limited to following at least one Kind: pixel gradient function, gray variance function, gray variance multiplicative function etc..

Above-mentioned executing subject can extract the maximum facial image of clarity and export from human face image sequence.Alternatively, According to the sequence that clarity is descending, extracts preset quantity facial image and export.

In the present embodiment, above-mentioned executing subject can export the facial image of extraction in various manners, for example, can be with The facial image of extraction and the mark of the facial image are shown on the display that above-mentioned executing subject includes.Alternatively, will extract Facial image be sent to above-mentioned executing subject communication connection other electronic equipments.

The method provided by the above embodiment of the application, by determining at least one face figure from target image frame sequence As sequence, wherein each human face image sequence is used to indicate the same face, then determines from each human face image sequence every The quality score of a facial image extracts facial image and output according to quality score, to realize from target image sequence The middle facial image for extracting high quality is conducive to raising and the facial image extracted is utilized to carry out the accurate of the operations such as recognition of face Property.

With further reference to Fig. 3, it illustrates another implementations according to the method for detecting facial image of the application The process 300 of example.Method includes the following steps:

Step 301, target image frame sequence is obtained.

In the present embodiment, step 301 and the step 201 in Fig. 2 corresponding embodiment are almost the same, and which is not described herein again.

Step 302, each picture frame for including for target image frame sequence, by picture frame input people trained in advance Face detection model obtains face location information.

In the present embodiment, Face datection model can determine the face location letter of the facial image in the picture frame of input Breath can be also used for the key point information set for generating the picture frame of input.In practice, Face datection model, which can be, to be based on The model of MTCNN (Multi-task convolutional neural network, multitask convolutional neural networks) training. The model includes multiple cascade submodels, and submodel may be respectively used for detection face location and determine key point information collection It closes.Wherein, the key point information in key point information set is for characterizing position of the face key point in facial image.It is logical Often, key point information may include coordinate of the face key point in picture frame.Face key point is to be used for table in facial image Levy the point of specific position (such as eyes, nose, mouth etc.).

Step 303, it is based on obtained face location information, from the picture frame that target image frame sequence includes, is determined At least one human face image sequence.

In the present embodiment, step 303 and the step 203 in Fig. 2 corresponding embodiment are almost the same, and which is not described herein again.

Step 304, for each of at least one human face image sequence face image sequence, it is based on the facial image sequence The key point information set for each facial image that column include determines the human face posture angle information of each facial image；Based on people Face attitude angle information determines the quality score of each facial image.

In the present embodiment, for each of at least one above-mentioned human face image sequence face image sequence, for examining The executing subject (such as intermediate equipment shown in FIG. 1 or terminal device or server) for surveying the method for facial image can execute such as Lower step:

Step 1, the key point information set based on each facial image that the human face image sequence includes determine each The human face posture angle information of facial image.

Wherein, human face posture angle information can be used for characterizing face just facing towards the camera relative to shooting face Degree of deflection.Human face posture angle information may include three kinds of pitch angle (pitch), yaw angle (yaw), roll angle (roll) angles Degree, respectively represents and spins upside down, left and right overturning, the angle of plane internal rotation.As shown in figure 4, x-axis, y-axis, z-axis are rectangular co-ordinates Three axis of system.Wherein, z-axis can be the optical axis of target camera 401, and y-axis can be not occur what side turned on the head of people Under state, by the central point of the crown profile of people and with the straight line of horizontal plane.Pitch angle can rotate around x axis for face Angle, the angle that yaw angle can rotate around y-axis for face, roll angle can be the angle that face is rotated around z-axis.In Fig. 4 In rectangular coordinate system in, when the head rotation of people, determine using the origin of the rectangular coordinate system as endpoint and pass through people two The ray at the midpoint of the line of a eyeball central point, the ray can be determined as positive appearance with x-axis, y-axis, the angle of z-axis respectively State angle.

Above-mentioned executing subject can determine human face posture angle information according to various methods.For example, can use existing people Face pose estimation method is based on key point information set, determines human face posture angle information.Wherein, human face posture angular estimation side Method can include but is not limited to following at least one: the method based on model, based on apparent method, the method based on classification Deng.

In some optional implementations of the present embodiment, above-mentioned executing subject can determine each in accordance with the following steps The human face posture angle information of facial image:

Firstly, the key point information set for each facial image for including based on the human face image sequence, generates everyone The corresponding key point feature vector of face image.Wherein, the element in key point feature vector includes the seat of M face key point Mark.

As an example it is supposed that M is 5, for a facial image, the corresponding crucial point feature of the facial image can be generated Vector A is [x1, x2, x3, x4, x5, y1, y2, y3, y4, y5, b], wherein x1-x5 is the x-axis coordinate of 5 face key points, Y1-y5 be 5 face key points y-axis coordinate, b be preset bias term, for example, 0.Key point feature vector A is 1 × 11 Vector.

By key point feature vector generated multiplied by the eigenmatrix being fitted in advance, human face posture corner characteristics vector is obtained As human face posture angle information.

It continues the example presented above, it is assumed that the matrix that features described above matrix X is 11 × 3, by feature vector A multiplied by eigenmatrix X, The vector for obtaining 1 × 3 is that human face posture is angularly measured, and it includes pitch angle, yaw angle, roll angle that human face posture, which is angularly measured,.

Features described above matrix can be fitted to obtain as follows in advance:

Assuming that there is N number of sample key point feature vector, each sample key point feature vector be expressed as V=[x1, x2, x3, X4, x5, y1, y2, y3, y4, y5,1], wherein the numerical value 1 in vector is default bias term.By N number of sample key point feature to Amount group is combined into an eigenmatrix B, and wherein B is the matrix of N × 11.The corresponding sample of each sample key point feature vector N number of sample key point combination of eigenvectors is a N by this face attitude angle vector (including pitch angle, yaw angle, roll angle) × 3 Matrix C.Opening relationships formula B × X=C.Wherein B, C are known conditions, so solving above-mentioned relation using least square method Formula, available X.

Step 2 is based on human face posture angle information, determines the quality score of each facial image.

Specifically, it is right respectively to can use three angles that preset, human face posture angle information includes for above-mentioned executing subject The weight answered determines the quality score of each facial image.As an example, can determine that human face posture is believed according to following formula Breath:

Score1=0.2 × (15-abs (roll))+0.5 × (15-abs (yaw))+0.3 × (15-abs (pitch)) Formula (1)

Wherein, Score1 is the quality score of facial image, and pitch, yaw, roll are respectively pitch angle, yaw angle, cross Roll angle, 0.2,0.5,0.3 is respectively the corresponding weight of three angles, and abs () is the absolute value for taking the angle in bracket, and 15 are The angle threshold of setting, that is, when attitude angle is more than 15 degree, take negative value.

In some optional implementations of the present embodiment, above-mentioned executing subject can determine face in accordance with the following steps The quality score of image:

Firstly, the key point information set based on each facial image, determines the clarity of each facial image.Wherein, The algorithm that clarity can use existing determining image definition obtains.For example, determining that the algorithm of image definition can wrap Include but be not limited to following at least one: pixel gradient function, gray variance function, gray variance multiplicative function etc..In general, can be with Clarity is arranged in [0,1] section.

Then, using human face posture angle information and clarity, the quality score of each facial image is determined.Specifically, on Stating executing subject can use human face posture angle information, determine the first scoring score1 according to above-mentioned formula (1).Above-mentioned clarity For the second scoring score2, it is based on preset weight, determines the quality score of facial image.As an example, the matter of facial image Amount scoring can be determined according to following formula (2):

Score=0.6 × score1+0.4 × score2 formula (2)

Wherein, score is quality score, and 0.6 and 0.4 is preset weight.

It should be noted that it is above-mentioned first scoring and second score numerical intervals it is identical, such as be in together [0,1] or [0,100]。

In some optional implementations of the present embodiment, above-mentioned executing subject can determine each in accordance with the following steps The clarity of facial image:

Firstly, extracting target critical point information from the key point information set of each facial image.Wherein, target critical Point information can be the key point information of pre-set specific position for characterizing face.In general, target face key point Information can be the key point information of the eyes and mouth that are used to indicate people.

Then, it is based on target critical point information, target area is determined from each facial image, and determine target area Including pixel mean pixel gradient.Wherein, target area can be the face including the instruction of target critical point information and close The region of key point.For example, target area can be the minimum rectangle of the face key point including the instruction of target critical point information.

The method that above-mentioned executing subject can use the pixel gradient of existing determining pixel determines every in target area Identified pixel gradient is averaged by the pixel gradient of a pixel, obtains mean pixel gradient.

Finally, being based on mean pixel gradient, the clarity of each facial image is determined.Specifically, target area can be calculated Then the summation S of the average value of the horizontal gradient and vertical gradient of each pixel in domain calculates average gradient avg_g=S/ (w*h*255.0), wherein w and h is the width and height of target area.

In the prior art, in order to improve facial image detection accuracy, it usually needs use the biggish nerve net of depth Network detection image.If network is too deep, the time for extracting characteristics of image will be elongated, and entire inference speed is slack-off.It is above-mentioned optional Implementation the quality score of facial image is determined due to using the method that human face posture angle and image definition combine, Compared to the biggish neural network of depth, faster, the hardware resource of occupancy is less for image processing speed, therefore, the reality of the application Apply example each step and optional implementation, can with combined application facial image detection system front end (such as Fig. 1 institute The terminal device or intermediate equipment shown), alleviate the pressure of back-end server.

Step 3 is based on obtained quality score, and facial image and output are extracted from the human face image sequence.

Step 3 and the method for extracting facial image and output in the step 204 in Fig. 2 corresponding embodiment are essentially identical, Which is not described herein again.

From figure 3, it can be seen that the method for sending information compared with the corresponding embodiment of Fig. 2, in the present embodiment Process 300 highlight the step of quality score of each facial image is determined based on human face posture angle information.It is possible thereby to sharp With face attitude angle information, the accuracy of the quality score of determining facial image is further increased, helps to further increase and mentions The quality of the facial image of taking-up.

With further reference to Fig. 5, as the realization to method shown in above-mentioned each figure, this application provides one kind for detecting people One embodiment of the device of face image, the Installation practice is corresponding with embodiment of the method shown in Fig. 2, which specifically may be used To be applied in various electronic equipments.

As shown in figure 5, the device 500 for detecting facial image of the present embodiment includes: to obtain module 501, for obtaining Take target image frame sequence；Generation module 502, each picture frame for including for target image frame sequence, by the image Frame input Face datection model trained in advance, obtains face location information, wherein face location information is for characterizing face figure As the position in the picture frame；Determining module 503, for being based on obtained face location information, from target image frame sequence In the picture frame that column include, at least one human face image sequence is determined, wherein the facial image that each human face image sequence includes It is used to indicate the same face；Output module 504, for for each of at least one human face image sequence face image sequence Column, determine the quality score for each facial image that the human face image sequence includes；Based on obtained quality score, from the people Facial image and output are extracted in face image sequence.

In the present embodiment, for detecting the available target image frame sequence of acquisition module 501 of the method for facial image Column.Wherein, target image frame sequence can be camera to target face (such as the people in the coverage of i.e. above-mentioned camera The face of object) shooting the video image frame sequence that includes.In general, target image frame can be the image of camera current shooting The image frame sequence of the picture frame composition of preset time period shooting before frame and current time.

In the present embodiment, each picture frame for including for target image frame sequence, above-mentioned generation module 502 can incite somebody to action Picture frame input Face datection model trained in advance, obtains face location information.Wherein, Face datection model is for characterizing The corresponding relationship of image sequence and face location information.

As an example, Face datection model can be above-mentioned apparatus 500 or other electronic equipments, machine learning side is utilized Method, the sample image frame sequence for including using the training sample in preset training sample set is as input, by the sample with input The corresponding sample position information of this image frame sequence is as desired output, to initial model (such as convolutional neural networks, circulation mind Through network etc.) it is trained, for the sample image frame sequence of each training input, available reality output.Wherein, practical Output is the data of initial model reality output, for characterizing the position of facial image.Then, for training face detection model Executing subject can use gradient descent method and back propagation, be based on reality output and desired output, adjust initial model Parameter, using the model obtained after each adjusting parameter as next time training initial model, and meeting it is preset training knot In the case where beam condition, terminate training, so that training obtains speech recognition modeling.The training termination condition here preset at can wrap Include but be not limited at least one of following: the training time is more than preset duration；Frequency of training is more than preset times；Utilize preset damage It loses function (such as cross entropy loss function) and calculates resulting penalty values less than default penalty values threshold value.

In the present embodiment, determining module 503 can be based on obtained face location information, in various manners from mesh In the picture frame that logo image frame sequence includes, at least one human face image sequence is determined.Wherein, each human face image sequence includes Facial image be used to indicate the same face.

In the present embodiment, above-mentioned defeated for each of at least one above-mentioned human face image sequence face image sequence Module 504 can determine the quality score for each facial image that the human face image sequence includes first out.Then, it is based on gained The quality score arrived extracts facial image and output from the human face image sequence.Wherein, the quality score of facial image can be with For characterizing the quality of facial image, that is, quality score is higher, indicates that the quality of facial image is higher.In general, can be by quality The maximum facial image that scores is exported as optimal facial image.

Above-mentioned output module 504 can determine the quality score of facial image according to various methods.As an example, above-mentioned defeated Module 504 can determine the clarity of facial image out, and clarity is determined as quality score.Wherein, clarity can use The algorithm of existing determining image definition obtains.For example, it is following to determine that the algorithm of image definition can include but is not limited to It is at least one: pixel gradient function, gray variance function, gray variance multiplicative function etc..

Above-mentioned output module 504 can extract the maximum facial image of clarity and export from human face image sequence.Or Person extracts preset quantity facial image and exports according to the sequence that clarity is descending.

In some optional implementations of the present embodiment, determining module 503 is further configured to: for target figure As the every two adjacent picture frame in frame sequence, each of the first picture frame of two adjacent picture frames face is determined Characteristic point in image, and determine that each of the first picture frame face image is corresponding, prediction in the second picture frame Characteristic point；From the facial image in the second picture frame, the quantity for the predicted characteristics point that determination includes is more than or equal to default value Facial image as in the first picture frame corresponding facial image instruction the identical facial image of face.

In some optional implementations of the present embodiment, determining module 503 is further configured to: for target figure As the every two adjacent picture frame in frame sequence, by the facial image in the first picture frame in two adjacent picture frames In the facial image in the second picture frame, the facial image that area registration is more than or equal to preset registration threshold value is determined For the facial image for indicating identical face.

In some optional implementations of the present embodiment, Face datection model is also used to generate the key point of picture frame Information aggregate, wherein key point information is for characterizing position of the face key point in facial image；And output module 504 It include: the first determination unit (not shown), the key of each facial image for including based on the human face image sequence Point information aggregate, determines the human face posture angle information of each facial image；Second determination unit (not shown), for being based on Human face posture angle information determines the quality score of each facial image.

In some optional implementations of the present embodiment, the first determination unit (not shown) includes: first raw At subelement (not shown), the key point information collection of each facial image for including based on the human face image sequence It closes, generates the corresponding key point feature vector of each facial image；Second generates subelement (not shown), for will give birth to At key point feature vector multiplied by the eigenmatrix being fitted in advance, obtain human face posture corner characteristics vector as human face posture angle Information.

In some optional implementations of the present embodiment, the second determination unit includes: the first determining subelement (in figure It is not shown), for the key point information set based on each facial image, determine the clarity of each facial image；Second really Stator unit (not shown) determines that the quality of each facial image is commented for utilizing human face posture angle information and clarity Point.

In some optional implementations of the present embodiment, first determine subelement include: extracting sub-module (in figure not Show), for extracting target critical point information from the key point information set of each facial image；First determines submodule (not shown) determines target area, and determine target for being based on target critical point information from each facial image The mean pixel gradient for the pixel that region includes；Second determines submodule (not shown), for based on mean pixel ladder Degree, determines the clarity of each facial image.

In some optional implementations of the present embodiment, Face datection model includes that structure is that depth separates convolution Convolutional layer.

In some optional implementations of the present embodiment, Face datection model advances with batch standardized way training It obtains.

The device provided by the above embodiment of the application, by determining at least one face figure from target image frame sequence As sequence, wherein each human face image sequence is used to indicate the same face, then determines from each human face image sequence every The quality score of a facial image extracts facial image and output according to quality score, to realize from target image sequence The middle facial image for extracting high quality is conducive to raising and the facial image extracted is utilized to carry out the accurate of the operations such as recognition of face Property.

Below with reference to Fig. 6, it illustrates the computer systems 600 for the electronic equipment for being suitable for being used to realize the embodiment of the present application Structural schematic diagram.Electronic equipment shown in Fig. 6 is only an example, function to the embodiment of the present application and should not use model Shroud carrys out any restrictions.

As shown in fig. 6, computer system 600 includes central processing unit (CPU) 601, it can be read-only according to being stored in Program in memory (ROM) 602 or be loaded into the program in random access storage device (RAM) 603 from storage section 608 and Execute various movements appropriate and processing.In RAM 603, also it is stored with system 600 and operates required various programs and data. CPU 601, ROM 602 and RAM 603 are connected with each other by bus 604.Input/output (I/O) interface 605 is also connected to always Line 604.

I/O interface 605 is connected to lower component: the importation 606 including keyboard, mouse etc.；Including such as liquid crystal Show the output par, c 607 of device (LCD) etc. and loudspeaker etc.；Storage section 608 including hard disk etc.；And including such as LAN The communications portion 609 of the network interface card of card, modem etc..Communications portion 609 is executed via the network of such as internet Communication process.Driver 610 is also connected to I/O interface 605 as needed.Detachable media 611, such as disk, CD, magneto-optic Disk, semiconductor memory etc. are mounted on as needed on driver 610, in order to from the computer program root read thereon According to needing to be mounted into storage section 608.

Particularly, according to an embodiment of the present application, it may be implemented as computer above with reference to the process of flow chart description Software program.For example, embodiments herein includes a kind of computer program product comprising be carried on computer-readable medium On computer program, which includes the program code for method shown in execution flow chart.In such reality It applies in example, which can be downloaded and installed from network by communications portion 609, and/or from detachable media 611 are mounted.When the computer program is executed by central processing unit (CPU) 601, limited in execution the present processes Above-mentioned function.

It should be noted that computer readable storage medium described herein can be computer-readable signal media or Person's computer readable storage medium either the two any combination.Computer readable storage medium for example can be --- But be not limited to --- electricity, magnetic, optical, electromagnetic, infrared ray or semiconductor system, device or device, or any above group It closes.The more specific example of computer readable storage medium can include but is not limited to: have being electrically connected for one or more conducting wires Connect, portable computer diskette, hard disk, random access storage device (RAM), read-only memory (ROM), erasable type may be programmed it is read-only Memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), light storage device, magnetic memory Part or above-mentioned any appropriate combination.In this application, computer readable storage medium, which can be, any include or stores The tangible medium of program, the program can be commanded execution system, device or device use or in connection.And In the application, computer-readable signal media may include in a base band or the data as the propagation of carrier wave a part are believed Number, wherein carrying computer-readable program code.The data-signal of this propagation can take various forms, including but not It is limited to electromagnetic signal, optical signal or above-mentioned any appropriate combination.Computer-readable signal media can also be computer Any computer readable storage medium other than readable storage medium storing program for executing, which can send, propagate or Person's transmission is for by the use of instruction execution system, device or device or program in connection.It is computer-readable to deposit The program code for including on storage media can transmit with any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF Etc. or above-mentioned any appropriate combination.

The calculating of the operation for executing the application can be write with one or more programming languages or combinations thereof Machine program code, described program design language include object oriented program language-such as Java, Smalltalk, C+ +, it further include conventional procedural programming language-such as " C " language or similar programming language.Program code can Fully to execute, partly execute on the user computer on the user computer, be executed as an independent software package, Part executes on the remote computer or executes on a remote computer or server completely on the user computer for part. In situations involving remote computers, remote computer can pass through the network of any kind --- including local area network (LAN) Or wide area network (WAN)-is connected to subscriber computer, or, it may be connected to outer computer (such as utilize Internet service Provider is connected by internet).

Flow chart and block diagram in attached drawing are illustrated according to the system of the various embodiments of the application, method and computer journey The architecture, function and operation in the cards of sequence product.In this regard, each box in flowchart or block diagram can generation A part of one module, program segment or code of table, a part of the module, program segment or code include one or more use The executable instruction of the logic function as defined in realizing.It should also be noted that in some implementations as replacements, being marked in box The function of note can also occur in a different order than that indicated in the drawings.For example, two boxes succeedingly indicated are actually It can be basically executed in parallel, they can also be executed in the opposite order sometimes, and this depends on the function involved.Also it to infuse Meaning, the combination of each box in block diagram and or flow chart and the box in block diagram and or flow chart can be with holding The dedicated hardware based system of functions or operations as defined in row is realized, or can use specialized hardware and computer instruction Combination realize.

Being described in module involved in the embodiment of the present application can be realized by way of software, can also be by hard The mode of part is realized.Described module also can be set in the processor, for example, can be described as: a kind of processor packet It includes and obtains module, generation module, determining module and output module.Wherein, the title of these modules not structure under certain conditions The restriction of the pairs of unit itself is also described as example, obtaining module " for obtaining the mould of target image frame sequence Block ".

As on the other hand, present invention also provides a kind of computer readable storage medium, the computer-readable storage mediums Matter can be included in electronic equipment described in above-described embodiment；It is also possible to individualism, and without the supplying electricity In sub- equipment.Above-mentioned computer readable storage medium carries one or more program, when said one or multiple programs When being executed by the electronic equipment, so that the electronic equipment: obtaining target image frame sequence；Include for target image frame sequence Picture frame input Face datection model trained in advance is obtained face location information by each picture frame；Based on obtained Face location information determines at least one human face image sequence from the picture frame that target image frame sequence includes, wherein every The facial image that a human face image sequence includes is used to indicate the same face；For every at least one human face image sequence A human face image sequence determines the quality score for each facial image that the human face image sequence includes；Based on obtained matter Amount scoring, extracts facial image and output from the human face image sequence.

Above description is only the preferred embodiment of the application and the explanation to institute's application technology principle.Those skilled in the art Member is it should be appreciated that invention scope involved in the application, however it is not limited to technology made of the specific combination of above-mentioned technical characteristic Scheme, while should also cover in the case where not departing from foregoing invention design, it is carried out by above-mentioned technical characteristic or its equivalent feature Any combination and the other technical solutions formed.Such as (but being not limited to) applied in features described above and the application has similar function Can technical characteristic replaced mutually and the technical solution that is formed.

Claims

1. a kind of method for detecting facial image, which is characterized in that the described method includes:

Obtain target image frame sequence；

For each picture frame that the target image frame sequence includes, by picture frame input Face datection mould trained in advance Type obtains face location information；

At least one is determined from the picture frame that the target image frame sequence includes based on obtained face location information Human face image sequence, wherein the facial image that each human face image sequence includes is used to indicate the same face；

For each of at least one human face image sequence face image sequence, determine that the human face image sequence includes The quality score of each facial image；Based on obtained quality score, extracted from the human face image sequence facial image and Output.

2. the method according to claim 1, wherein described be based on obtained face location information, from described In the picture frame that target image frame sequence includes, at least one human face image sequence is determined, comprising:

For the every two adjacent picture frame in the target image frame sequence, the first of two adjacent picture frames is determined Characteristic point in each of picture frame face image, and determine each of the first picture frame face image it is corresponding, Predicted characteristics point in second picture frame；From the facial image in the second picture frame, the number for the predicted characteristics point that determination includes Amount is more than or equal to the facial image of default value as identical as the face of correspondence facial image instruction in the first picture frame Facial image.

3. the method according to claim 1, wherein described be based on obtained face location information, from described In the picture frame that target image frame sequence includes, at least one human face image sequence is determined, comprising:

For the every two adjacent picture frame in the target image frame sequence, by first in two adjacent picture frames In the facial image in facial image and the second picture frame in picture frame, area registration is more than or equal to preset registration The facial image of threshold value is determined as indicating the facial image of identical face.

4. the method according to claim 1, wherein the Face datection model is also used to generate the pass of picture frame Key point information aggregate, wherein key point information is for characterizing position of the face key point in facial image；And

The quality score for each facial image that the determination human face image sequence includes, comprising:

Key point information set based on each facial image that the human face image sequence includes, determines the people of each facial image Face attitude angle information；

Based on human face posture angle information, the quality score of each facial image is determined.

5. according to the method described in claim 4, it is characterized in that, each face for including based on the human face image sequence The key point information set of image determines the human face posture angle information of each facial image, comprising:

It is corresponding to generate each facial image for key point information set based on each facial image that the human face image sequence includes Key point feature vector；

By key point feature vector generated multiplied by the eigenmatrix being fitted in advance, human face posture corner characteristics vector conduct is obtained Human face posture angle information.

6. according to the method described in claim 4, it is characterized in that, it is described be based on human face posture angle information, determine each face The quality score of image, comprising:

Key point information set based on each facial image, determines the clarity of each facial image；

Using human face posture angle information and clarity, the quality score of each facial image is determined.

7. according to the method described in claim 6, it is characterized in that, the key point information collection based on each facial image It closes, determines the clarity of each facial image, comprising:

Target critical point information is extracted from the key point information set of each facial image；

Based on target critical point information, target area is determined from each facial image, and determines the picture that target area includes The mean pixel gradient of vegetarian refreshments；

Based on mean pixel gradient, the clarity of each facial image is determined.

8. method described in one of -7 according to claim 1, which is characterized in that the Face datection model includes that structure is depth The convolutional layer of separable convolution.

9. method described in one of -7 according to claim 1, which is characterized in that the Face datection model advances with batch standard The training of change mode obtains.

10. a kind of for detecting the device of facial image, which is characterized in that described device includes:

Module is obtained, for obtaining target image frame sequence；

The picture frame is inputted instruction in advance by generation module, each picture frame for including for the target image frame sequence Experienced Face datection model, obtains face location information, wherein face location information is for characterizing facial image in the picture frame In position；

Determining module, for being based on obtained face location information, from the picture frame that the target image frame sequence includes, Determine at least one human face image sequence, wherein the facial image that each human face image sequence includes is used to indicate the same person Face；

Output module, for determining the face for each of at least one human face image sequence face image sequence The quality score for each facial image that image sequence includes；Based on obtained quality score, from the human face image sequence Extract facial image and output.

11. a kind of electronic equipment, comprising:

One or more processors；

Storage device, for storing one or more programs,

When one or more of programs are executed by one or more of processors, so that one or more of processors are real The now method as described in any in claim 1-9.

12. a kind of computer readable storage medium, is stored thereon with computer program, which is characterized in that the program is by processor The method as described in any in claim 1-9 is realized when execution.