WO2022247118A1

WO2022247118A1 - Pushing method, pushing apparatus and electronic device

Info

Publication number: WO2022247118A1
Application number: PCT/CN2021/125407
Authority: WO
Inventors: 曾钰胜; 程骏; 庞建新
Original assignee: 深圳市优必选科技股份有限公司
Priority date: 2021-05-24
Filing date: 2021-10-21
Publication date: 2022-12-01
Also published as: CN113297423A

Abstract

Provided are a pushing method, a pushing apparatus, an electronic device, and a computer-readable storage medium. The method comprises: for each face ID, if face images belonging to the face ID are detected in a current video frame of a scene video, acquiring a quality score and a posture score of the face images (101); according to the quality score and the posture score of the face images, determining whether to update a candidate face image set of the face ID (102); determining that a face image from the candidate face image set is a target face image of the face ID (103); and pushing the target face image (104). By means of the solution, high-quality face images may be effectively found and pushed, and the subsequent processing efficiency and processing accuracy of the face images may be improved.

Description

A push method, push device and electronic equipment

This application claims priority to a Chinese patent application with application number 202110563530.7 filed at the China Patent Office on May 24, 2021, the entire contents of which are incorporated herein by reference.

technical field

The present application belongs to the technical field of image processing, and in particular relates to a pushing method, a pushing device, electronic equipment, and a computer-readable storage medium.

Background technique

At present, face features, as a common biological feature, have been applied in many scenarios, which requires electronic devices to collect video streams first, and then further process the face images contained in them based on the video streams, for example Perform accurate face recognition or face verification. However, in practical application scenarios, affected by interference factors such as the environment, video streams usually contain a certain amount of poor-quality face images, which affects subsequent efficient and accurate processing of face images.

technical problem

The present application provides a push method, a push device, an electronic device, and a computer-readable storage medium, which can effectively search and push high-quality face images, and improve the subsequent processing efficiency and accuracy of face images.

technical solution

In the first aspect, the present application provides a push method, including:

For each face ID, if the face image belonging to the above-mentioned face ID is detected in the current video frame of the scene video, the quality score and the attitude score of the above-mentioned face image are obtained;

According to the quality score and the attitude score of the above-mentioned face image, determine whether to update the candidate face image collection of the above-mentioned face ID;

Determine a face image as the target face image of the above-mentioned face ID from the above-mentioned candidate face image collection;

Push the target face image above.

In a second aspect, the present application provides a push device, including:

The score acquisition unit is used for each face ID, if the face image belonging to the above-mentioned face ID is detected in the current video frame of the scene video, then the quality score and the attitude score of the above-mentioned face image are obtained;

A set update unit is used to determine whether to update the candidate face image set of the above-mentioned face ID according to the quality score and the attitude score of the above-mentioned face image;

An image determination unit, configured to determine a face image from the set of candidate face images as the target face image of the face ID;

An image pushing unit, configured to push the aforementioned target face image.

In a third aspect, the present application provides an electronic device. The electronic device includes a memory, a processor, and a computer program stored in the memory and that can run on the processor. When the processor executes the computer program, the above-mentioned The steps of the method of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method in the first aspect above are implemented.

In a fifth aspect, the present application provides a computer program product, the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method in the first aspect above are implemented.

Beneficial effect

The beneficial effect of the present application compared with the prior art is: for each face ID, if a face image belonging to the above-mentioned face ID is detected in the current video frame of the scene video, the quality score of the above-mentioned face image is obtained and pose score, and according to the quality score and pose score of the above-mentioned face image, determine whether to update the candidate face image set of the above-mentioned face ID, and then determine a face image from the above-mentioned candidate face image set as the above-mentioned face The target face image of the ID, and finally push the above target face image. This application scheme will evaluate the face images belonging to the same face ID (that is, belonging to the same user) in the scene video. The evaluation process is related to the quality score and pose score of each face image, and judges whether to The candidate face image set is updated so that the face images in the candidate face image set are all face images with better quality and better pose. Finally, the electronic device will determine the target face image from the set of candidate face images to push, so that subsequent face image processing modules, such as face recognition modules or face verification modules, can be based on the pushed target face image Carry out further face image processing operations. It can be understood that, for the beneficial effects of the above-mentioned second aspect to the fifth aspect, reference can be made to the relevant description in the above-mentioned first aspect, and details will not be repeated here.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the accompanying drawings that need to be used in the descriptions of the embodiments or the prior art will be briefly introduced below. Obviously, the accompanying drawings in the following description are only for the present application For some embodiments, those of ordinary skill in the art can also obtain other drawings based on these drawings without any creative effort.

FIG. 1 is a schematic diagram of the implementation flow of the push method provided by the embodiment of the present application;

Fig. 2 is a schematic diagram of a coarse-to-fine network architecture for pose scores provided by the embodiment of the present application;

Fig. 3 is a structural block diagram of a push device provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Embodiments of the present invention

In the following description, specific details such as specific system structures and technologies are presented for the purpose of illustration rather than limitation, so as to thoroughly understand the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

In order to illustrate the technical solution proposed by the present application, specific examples will be used below to illustrate.

The push method proposed in the embodiment of the present application is described below. Please refer to Figure 1, the push method includes:

Step 101, for each face ID, if a face image belonging to the above-mentioned face ID is detected in the current video frame of the scene video, the quality score and pose score of the above-mentioned face image are obtained.

In the embodiment of the present application, the electronic device can be integrated with a camera, and the scene video can be obtained by shooting a designated area through the camera; or, the electronic device can also be connected with other electronic devices with a camera, and the other electronic device can The device shoots the specified area through its camera, and transmits the captured scene video to the electronic device, which is not limited here.

After the electronic device obtains the scene video, it can start to perform face detection on video frames in the scene video. This face detection is different from face recognition or face verification, and only detects the face images that may be included in the video frame. Through face detection, a face frame containing face information and five key points of the face information can be obtained. Considering the size of the face frame and the user's possible side faces and other postures, after obtaining the face frame and key points, it is necessary to perform preprocessing operations to achieve the alignment of the face frame in order to obtain the final face image. The embodiment of the present application uses the method of similar transformation (SimilarTransform) to map these five key points to specified coordinate points respectively, for example:

(38.2946*0.5714, 51.6963*0.5714), corresponding to the key points of the left eye;

(73.5318*0.5714, 51.5014*0.5714), corresponding to the key points of the right eye;

(56.0252*0.5714,71.7366*0.5714), corresponding to the key points of the nose;

(41.5493*0.5714, 92.3655*0.5714), corresponding to the key point of the left mouth corner;

(70.7299*0.5714, 92.2041*0.5714), corresponding to the key point of the right mouth corner.

Thus, a face image can be obtained through face detection. Obviously, the detected face image must be the face image of a certain user (that is, a certain person); based on this, when a new face image is detected, a face ID can be assigned to the face image, It is used to indicate that the face image belongs to the user represented by the face ID. It can be considered that the face ID in the embodiment of the present application is used to distinguish different users who are in the same picture (that is, the same video frame), and to realize the tracking of each user's face image in the scene video. It can be understood that for any face ID, steps 101-104 can be used to push the target face image of the face ID. Therefore, the face IDs mentioned later in the embodiments of the present application that are not specifically described are all the same face ID, so as to facilitate the explanation and description of each step.

Considering that people cannot teleport, that is, people cannot suddenly appear and disappear suddenly in the scene video, so in the collected scene video, the face images that usually belong to the face ID will appear in consecutive multiple appear in the video frame, and these face images must be able to form a continuous trajectory. For electronic devices with high computing power, these electronic devices can perform real-time face detection on the current video frame at each moment; while for electronic devices with low computing power, these electronic devices can only periodically Perform face detection on the video frame, that is, perform face detection on the current video frame at intervals. As long as a face image belonging to the face ID is detected in the current video frame of the scene video, the electronic device can obtain the quality score and pose score of the face image. As an example only, the quality score and the pose score can be respectively obtained through a pre-trained neural network model. Of course, other methods may also be used to obtain the quality score and pose score of the face image, which are not limited here.

Step 102, determine whether to update the candidate face image set of the above-mentioned face ID according to the quality score and the pose score of the above-mentioned face image;

In the embodiment of the present application, each time the electronic device detects a new face image to create a face ID, it also simultaneously creates a set of candidate face images for the face ID. Obviously, the set of candidate face images is empty when it is first created. Follow-up can judge whether to update the candidate face image set according to the quality score and pose score of the face image belonging to the face ID detected each time, that is, whether the face image needs to be stored in the A collection of candidate face images. Specifically, the electronic device may pre-set a quality score condition for the quality score, and a gesture score condition for the gesture score, and use this to evaluate the face image. Only when the quality score of the face image satisfies the quality score condition and the pose score satisfies the pose score condition, the candidate face image set is updated, specifically, the face image is stored in the candidate face image set .

Step 103, determining a face image from the set of candidate face images as the target face image of the face ID.

In the embodiment of the present application, the

above steps

101 and 102 can be executed repeatedly; that is, as long as the specified push timing is not reached, step 101 will be executed again after each execution of step 102; once the specified push timing is reached , just enter step 103 immediately, determine a face image as the target face image of this face ID from the current candidate face image collection of this face ID. It should be noted that there may be more than one designated push timing. For example, a passive push timing can be set, and the passive push timing is similar to an interrupt operation, which is generally unpredictable; an active push timing can also be set, and the active push timing can be predicted generally, here The specified push timing is not limited.

In an application scenario, the set of candidate face images may be empty. That is to say, within a period of time before the specified push timing is reached, all detected face images of the face ID cannot be stored in the set of candidate face images. At this point, it can be considered that there is no high-quality face image of the user represented by the face ID in the scene video. In order to improve the face image of the user represented by the face ID, the electronic device may output a reminder message to remind the subject (that is, the user) to adjust his angle and/or position.

In another application scenario, the set of candidate face images may contain only one face image. That is to say, within this period of time before the specified push timing is reached, among all the detected face images of the face ID, only one face image is stored in the set of candidate face images. At this point, the electronic device has no other choice, and may directly determine the only face image in the set of candidate face images as the target face image.

In yet another application scenario, the set of candidate face images may contain more than two face images. At this time, there is a possibility of selection by the electronic device, which can further screen the candidate face images to find the best face image in the set of candidate face images as the target face image.

Step 104, pushing the aforementioned target face image.

In the embodiment of the present application, after the target face image is determined, the target face image can be pushed to other modules in the electronic device, such as a face verification module or a face recognition module, so that the other modules can The target face image, which is a high-quality face image, can be processed to a certain extent to avoid the occurrence of processing failures. Certainly, the target face image may also be pushed to other electronic devices for further processing, which is not limited here. After pushing the target face image, the candidate face image set of the face ID can be cleared, that is, all face images in the candidate face image set can be deleted, and return to step 101 and subsequent steps.

In some embodiments, in the above step 101, the quality score and the pose score of the face image can be obtained through a pre-trained neural network model, then this step 101 can be specifically expressed as:

A1. Input the above human face image into a preset first classification network to obtain the quality score of the above human face image, wherein the above first classification network is used to classify the image quality of the above human face image.

The following is a brief introduction to the first classification network:

In the embodiment of the present application, a lightweight convolutional neural network (Convolutional Neural Networks, CNN), such as ShuffleNetV2, is used to construct the first classification network. Considering that for electronic devices with low computing power, the processing speed of ShuffleNetV2 cannot meet the requirement of less than 50ms, so in the embodiment of this application, the ShuffleNetV2 can also be modified, and the number of channels of ShuffleNetV2 can be reduced to the original through channel cutting operation A quarter of the ShuffleNetV2×0.25 network is obtained as the first classification network used, and the ShuffleNetV2×0.25 is trained for three classifications. After the three-classification training is completed, the ShuffleNetV2×0.25 network can be put into application, and the result of three-classification of face images can be obtained. That is, the first classification network is essentially a three-classification network.

Wherein, the embodiment of the present application sets three categories for the first classification network, which are: 0, which means fuzzy; 1, which means relatively clear; 2, which means clear. By defining the three classifications in this way, it can help to classify the intermediate state more clearly, and further improve the accuracy of fuzzy and clear classification. Based on this, after obtaining the results of the three classifications of face images, the results of the three classifications can be further processed:

Assume that the given quality score y={0,60,100}, that is, 0 points for fuzzy, 60 points for clearness, and 100 points for clearness; and the results of the three classifications output by the first classification network are that the face images belong to these three categories respectively The probability of is recorded as o={o1,o2,o3}, then the final quality score Eq of the face image can be calculated based on the following formula:

For example, if a certain human face image passes through the first classification model, the probability that it is clear is 0.9, the probability that it is relatively clear is 0.05, and the probability that it is fuzzy is 0.05, then the quality score Eq= of this human face image can be calculated. 0.9*100+0.05*60+0.05*0=93. In fact, in the process of verifying the first classification network, it can be found that: for clear input images, the obtained quality scores are usually distributed around 100; for blurred input images, the obtained quality scores are usually will be distributed around 0; for a clearer input image, the resulting quality score will usually be distributed around 60, if the blur of the input image is clearer and more blurred, the score of the input image will be less than 60 , if the blur degree of the input image is clearer and tends to be clearer, the score of the input image will be greater than 60. Based on this, in the embodiment of the present application, 60 points can be used as the blur threshold to eliminate unclear face images, and facilitate further operations such as subsequent face recognition.

A2. Input the above-mentioned human face image into the preset second classification network to obtain the pose score of the above-mentioned human face image, wherein the above-mentioned second classification network includes three sub-classification networks, and the above-mentioned three sub-classification networks are respectively used for classification The pitch angle, yaw angle and roll angle of the face represented in the above face image are classified.

The following is a brief introduction to the second classification network:

In the embodiment of the present application, the involved attitude angles include three, which are pitch angle (pitch), yaw angle (yaw) and roll angle (roll). Considering these three attitude angles separately, each angle has an independent multi-classification task; that is, the three attitude angles are considered as three separate multi-classification tasks. Considering that the calculation process of the attitude score under each attitude angle is the same, therefore, the following uses an attitude angle as an example to illustrate: the angle range predicted by the embodiment of the present application is [-99°, 99°], then it can be Every 3° is divided into one category to distinguish, then each attitude angle contains 66 categories. For example, under each attitude angle, [-99°, -96°) is regarded as a class, [-96°, -93°) is regarded as a class, and so on, each attitude angle can be divided into 66 categories.

For the classification result output by the second classification network, it can be understood that the pose angle of a certain face image roughly enters a certain angle range below it, and the resulting error is about 3°. In order to refine the classification results, the embodiment of the present application adopts the idea of Deep Expectation (DEX). DEX originated from age estimation, and the embodiment of this application migrates it to pose estimation. Assuming that the defined angle is divided into y, y is essentially a 66-dimensional label vector, y={-99,-96,...,-3,3,...,96,99}. The output of the second classification network is the probability that the attitude angles of the face image belong to these 66 categories, which are recorded as o={o ₁ ,o ₂ ,o ₃ ......,o ₆₄ ,o ₆₅ ,o ₆₆ }, Then the attitude score Ep of the face image at each attitude angle can be calculated based on the following formula:

The attitude score Ep actually represents the angle of the attitude angle, which is still a coarse-grained quantization result and reflects relatively coarse-grained attitude information. In this regard, the embodiment of the present application can introduce another regression task to refine the coarse-grained task, that is, the angle of the coarse-grained estimated attitude angle (that is, the predicted attitude score) and the angle of the label's attitude angle (that is, tag pose score) for regressive refinement learning. Specifically, the network architecture design from coarse to fine is shown in FIG. 2 . As can be seen from Figure 2, the loss function of the coarse-to-fine network can be denoted as L, and its calculation formula is:

Among them, cls is a rough estimate of Cross Entropy (Cross Entropy) in Figure 2, and MSE is a fine estimate of Mean Square Error (MSE) in Figure 2.

In order to make it possible to deploy electronic devices with low computing power (such as robots), the embodiment of the present application adopts a lightweight backbone network (Backbone) for network design. For example, the backbone network may be MobileNetV3_small. Considering that for electronic devices with low computing power, the processing speed of MobileNetV3_small cannot meet the requirement of less than 50ms, so in the embodiment of this application, the MobileNetV3_small can also be modified, and the number of channels of MobileNetV3_small can be reduced to the original by channel cutting operation A quarter of the MobileNetV3_small×0.25 network is obtained as the backbone network used.

In some embodiments, it can be judged whether the face image can be stored in the candidate face image set through the preset quality score condition and the preset pose score condition, then the above step 102 can specifically include:

B1. Detecting whether the quality score of the face image satisfies the quality score condition, and detecting whether the pose score of the face image satisfies the pose score condition.

Among them, the quality score condition is used to detect whether the image quality of the face image meets the requirements, that is, whether the face image is clear enough; the attitude score condition is used to detect whether the face pose of the face image meets the requirements, that is, the person Is the facial posture upright enough? It should be noted that the two tests in step B1 can be performed sequentially or simultaneously, which is not limited here. When running concurrently, if either of these two tests fails, the other test can be terminated directly.

As an example only, considering that there are actually three attitude scores of a face image (that is, pitch angle, yaw angle, and roll angle each have a corresponding attitude score), therefore, the detection process of the attitude score is relative to the detection process of the quality score. more complicated. Based on this, for electronic devices with low computing power, it is possible to first detect whether the quality score of the face image satisfies the quality score condition, and then detect whether the pose score of the face image satisfies the pose score condition.

Specifically, the quality score condition may be: the quality score is not lower than a preset quality score threshold. Based on the description of step A1, the quality score threshold may be 60; that is, the quality score condition may be expressed as: Eq≥60.

Specifically, the attitude score condition may be: the absolute value of the attitude score of the pitch angle is less than the preset first attitude score threshold, and the absolute value of the attitude score of the yaw angle is less than the preset second attitude score threshold, and the pitch angle , the sum of the absolute values of the attitude scores of the yaw angle and the roll angle is less than the preset third attitude score threshold, wherein the first attitude score threshold is less than the second attitude score threshold, and the third attitude score threshold is the first attitude score The sum of the threshold and the second pose score threshold. For example, the first attitude score threshold can be 25, the second attitude score threshold can be 40, and the third attitude score threshold can be 65; the attitude score of the pitch angle is Ep(pitch), and the attitude score of the yaw angle is Ep( yaw), the attitude score of the roll angle is Ep(roll), then the attitude score condition can be expressed as:

|Ep(pitch)|<25&&|Ep(yaw)|<40&&|Ep(pitch)|+|Ep(yaw)|+|Ep(roll)|<65

B2. If the quality score satisfies the quality score condition and the pose score satisfies the pose score condition, store the face image into the candidate face image set.

If the quality score of a certain face image satisfies the quality score condition, and its pose score satisfies the pose score condition, then the candidate face image set of the face ID to which the face image belongs can be updated at present, specifically the The face images are stored in the set of candidate face images.

In some embodiments, if there are more than two human face images in the candidate face image collection, further screening is required to obtain the target human face image, then the above step 103 may specifically include:

C1. Calculate the matching score of each face image according to the quality score and pose score of each face image in the candidate face image set.

Wherein, the calculation formula of the matching score match_score is:

The design idea of this calculation formula is: pay attention to the quality of the face and the pose of the face at the same time. Since the face pose is the smaller the better, it is designed as a negative correlation; in step B1, the number of face images stored in the candidate face image set has been limited. The sum of the absolute values of the attitude scores of the three attitude angles of the face image does not exceed 65°, so the design of the above formula is made. In addition, for the three attitude angles, the embodiment of the present application hopes to pay more attention to the pitch angle and the yaw angle, so the absolute value of the attitude score of the roll angle is divided to reduce the weight of the roll angle.

C2. In the set of candidate face images, determine the face image with the highest matching score as the target face image of the face ID.

After the matching score is calculated for each face image in the candidate face image set by the formula shown in C1, the face image with the highest matching score in the candidate face image set can be obtained by screening. The face image can be determined as the target face image of the face ID.

In some embodiments, the active push timing can be: when the preset time interval has passed since the last push; the active push timing can also be understood as when the scene video has been preset since the last push times of face detection. That is, the target face image is periodically found out from the set of candidate face images of the face ID and pushed. For electronic devices with low computing power, it is assumed that the time it takes to execute step 101 and step 102 once is fixed, for example, 200 milliseconds (ms); it is assumed that the initial time starts from 0; it is assumed that the preset time interval is 2s, Then you can imagine the following scenario:

At the initial moment, the human face image 1 belonging to the human face ID1 is detected; by

steps

101 and 102, it is determined that the human face image 1 cannot be stored in the candidate face image collection of the human face ID1, and the 200ms moment has been reached at this moment;

At 200ms, the face image 2 belonging to the face ID1 is detected; through

steps

101 and 102, it is determined that the face image 2 can be stored in the candidate face image collection of the face ID1, and the 400ms moment has been reached at this moment;

By analogy, until the moment of 2s, it can be seen that the current push timing is satisfactory. Apparently, step 101 and step 102 have been executed 10 times in total. Assuming that the candidate face image set stores three face images 2, 5 and 9, the target face image is selected from the three face images and pushed.

At the same time, the electronic device will also use the 2s moment as the new initial moment, and start a new round of update of the candidate face image set after clearing the candidate face image set. The process is similar to the previous one, and will not be repeated here. .

In some embodiments, for electronic devices with low computing power, a trace parameter trace_num can be introduced to judge whether the active push opportunity is reached, wherein the trace_num is initialized to 0. Its process is specifically: for a people's face ID, when detecting the people's face image that belongs to this people's face ID in current video frame at every turn, just add 1 to the numerical value of this trace_num, and carry out

steps

101 and 102; In step 102 Finally, use the formula whether trace_num%update_num is 0 to judge whether the timing of the active push is satisfied: if trace_num%update_num is 0, the timing of the active push has been reached; otherwise, if trace_num%update_num is not 0, the timing of the active push has not been reached Opportunity, return to the current video frame in the detection scene video, and when it is detected that there is a face image belonging to the face ID, update trace_num, and perform

subsequent steps

101 and 102, which will not be repeated here. Wherein, the value of update_num is determined according to the total time spent in step 101 and step 102 (that is, the total time required to execute step 101 and step 102 ) and a preset time interval. Denote the preset time interval as T, denote the total time spent in step 101 and step 102 as t, and the update_num is the ratio of T to t. For example, if T is 2s and t is 200ms, update_num is 10. for example:

At t0 moment, the face image 1 belonging to the face ID1 is detected for the first time, and the trace_num is updated to 1; through

steps

101 and 102, it is determined that the face image 1 cannot be stored in the candidate face image set of the face ID1, and trace_num%update_num is not 0, and the timing of active push has not been reached. This is time t1, and detects whether there is a face image belonging to face ID1 at time t1;

At t1 moment, detect the face image 2 belonging to face ID1 for the second time, and trace_num is updated to 2; through

steps

101 and 102, it is determined that this face image 2 can be stored in the candidate face image set of face ID1, and trace_num%update_num is not 0, and the active push timing has not been reached, and it is time t2, and detects whether there is a face image belonging to face ID1 at time t2;

By analogy, until trace_num%update_num is 0, it is confirmed that the current active push timing has been met. At this point, the target face image can be found out from the candidate face image collection and pushed. It should be noted that after each push, there is no need to clear trace_num; that is, trace_num can be continuously accumulated. Moreover, each face ID corresponds to its own trace_num. If the face image is lost for a certain face ID, the trace_num corresponding to the face ID can be deleted to release the memory.

In some embodiments, the timing of passive push can be: when the face image belonging to the face ID is lost in the scene video; that is, when the face image belonging to the face ID can no longer be detected in the scene video When there is no face image, it is considered lost. This is generally caused by the fact that the user represented by the face ID has walked out of the camera's shooting field of view, causing the camera to no longer be able to capture the user. The passive push timing is actually to make up for the deficiency of the active push timing. The following scenario can be imagined: assuming that the time interval set in the active push timing is 2s, that is, the target face image is searched and pushed every 2s. Assume that a user walks through the shooting area of the camera very quickly, causing the camera to capture the user in less than 2 seconds, so that there are very few video frames of the user's face image in the scene video. Due to the insufficient duration of the user's face image in the scene video, the user's face image will not be pushed based on the timing of active push. This process is obviously unreasonable. Based on this, a passive push timing is proposed, so that as long as the face image of a certain face ID (representing a certain user) is lost in the scene video, the target face image of the face ID is immediately searched and pushed to avoid Occurrence of missed detection. For example, if a user quickly walks through the shooting area of the camera, and the user corresponds to face ID1, then:

steps

101 and 102, it is determined that the face image 1 cannot be stored in the candidate face image set of the face ID1, and trace_num%update_num is not 0, and the timing of active push has not been reached. At this time, the time reaches time t1, and detects whether there is a face image belonging to face ID1 at time t1;

steps

101 and 102, it is determined that this face image 1 can be stored in the candidate face image set of face ID1, and trace_num%update_num is not 0, and the timing of active push has not been reached. At this time, the time reaches time t2, and detects whether there is a face image belonging to face ID1 at time t2;

At time t2, the face image belonging to face ID1 cannot be detected; that is, the face image belonging to this face ID is lost. At this time, trace_num is still 2 and will not increase any more, which will cause trace_num% update_num can no longer be 0; in the scenario where only the active push timing is set, the target face image of the face ID cannot be pushed; and in the scenario where the passive push timing is added, it can be based on the Passive push timing, search and push the target face image in the candidate face image collection of the face ID.

It should be noted that after the passive push timing is met and

steps

103 and 104 are executed, the electronic device can clear the data related to the face ID, such as deleting the face ID candidate face image collection, etc., to release resource. In some embodiments, the concept of tracker can also be introduced. That is, the face image belonging to the new face ID appears in the detected video frame (the face image of the new face ID refers to the face image that has not appeared in several consecutive frames of video frames before the video frame. Face ID face image), create a corresponding tracker for the face image. The trace_num and candidate face image sets updated based on the face ID are bound to the tracker. When the face ID is lost, search and push the target face image of the face ID based on the passive push timing, and then delete the tracker corresponding to the face ID, including deleting the trace_num and candidate bound to the tracker A collection of face images.

As can be seen from the above, through the embodiment of the present application, the face images belonging to the same face ID (that is, belonging to the same user) in the scene video can be evaluated, and the evaluation process is related to the quality score and the pose score of each face image. And judge whether to update the set of candidate face images based on the evaluation result, so that the face images in the set of candidate face images are all face images with better quality and better pose. Finally, the electronic device will determine the target face image from the set of candidate face images to push, so that subsequent face image processing modules, such as face recognition modules or face verification modules, can be based on the pushed target face image Carry out further face image processing operations.

Corresponding to the push method provided above, an embodiment of the present application further provides a push device. As shown in Figure 3, the pushing device 300 includes:

Score acquisition unit 301, for each face ID, if the face image belonging to the above-mentioned face ID is detected in the current video frame of the scene video, then obtain the quality score and the attitude score of the above-mentioned face image;

Set update unit 302, for determining whether to update the candidate face image set of the above-mentioned face ID according to the quality score and the attitude score of the above-mentioned face image;

An image determining unit 303, configured to determine a face image from the set of candidate face images as the target face image of the face ID;

The image pushing unit 304 is configured to push the aforementioned target face image.

Optionally, the score acquisition unit 301 includes:

The quality score acquisition subunit is used to input the above-mentioned human face image into the preset first classification network to obtain the quality score of the above-mentioned human face image, wherein the above-mentioned first classification network is used for image classification of the above-mentioned human face image Classify by quality;

The posture score acquisition subunit is used to input the above-mentioned human face image into the preset second classification network to obtain the posture score of the above-mentioned human face image, wherein the above-mentioned second classification network includes three sub-classification networks, the three sub-classification networks The classification network is respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the above face image.

Optionally, the above set updating unit 302 includes:

The quality detection subunit is used to detect whether the quality score of the above-mentioned face image satisfies the preset quality score condition;

A pose detection subunit, used to detect whether the pose score of the above-mentioned face image meets the preset pose score condition;

The set update subunit is configured to store the face image into the set of candidate face images if the quality score satisfies the quality score condition and the pose score satisfies the pose score condition.

Optionally, the above-mentioned image determining unit 303 includes:

The matching score calculation subunit is used to calculate the matching score of each face image according to the quality score and the attitude score of each face image in the above-mentioned candidate face image set;

The target face image determination subunit is configured to determine the face image with the highest matching score as the target face image of the face ID in the set of candidate face images.

Optionally, the image determining unit 303 is specifically configured to determine a face image from the set of candidate face images as a target face image of the face ID according to a preset interval.

Optionally, the above-mentioned image determining unit 303 is specifically configured to determine a face image from the above-mentioned candidate face image set as the above-mentioned face when the face image belonging to the above-mentioned face ID is lost in the above-mentioned scene video ID of the target face image.

Optionally, the above-mentioned pushing device 300 also includes:

A set clearing unit, configured to clear the set of candidate face images after the target face image is pushed by the image pushing unit.

Corresponding to the push method provided above, an embodiment of the present application further provides an electronic device. Referring to Fig. 4, the electronic device 4 in the embodiment of the present application includes: a memory 401, one or more processors 402 (only one is shown in Fig. 4 ) and a computer stored on the memory 401 and operable on the processor program. Wherein: the memory 401 is used to store software programs and units, and the processor 402 executes various functional applications and diagnoses by running the software programs and units stored in the memory 401 to obtain resources corresponding to the above preset events. Specifically, the processor 402 implements the following steps by running the above-mentioned computer program stored in the memory 401:

Push the target face image above.

Assuming that the above is the first possible implementation manner, then in the second possible implementation manner provided on the basis of the first possible implementation manner, the acquisition of the quality score and pose score of the above-mentioned face image includes:

The above-mentioned human face image is input into a preset first classification network to obtain the quality score of the above-mentioned human face image, wherein the above-mentioned first classification network is used to classify the image quality of the above-mentioned human face image;

Input the above-mentioned human face image into the preset second classification network to obtain the pose score of the above-mentioned human face image, wherein the above-mentioned second classification network includes three sub-classification networks, and the above-mentioned three sub-classification networks are respectively used to classify the above-mentioned human Classify the pitch angle, yaw angle and roll angle of the face represented in the face image.

In the third possible implementation manner provided on the basis of the above first possible implementation manner, it is determined whether to update the candidate face image set of the above-mentioned face ID according to the quality score and the pose score of the above-mentioned face image, include:

Detecting whether the quality score of the above-mentioned face image satisfies a preset quality score condition, and detecting whether the pose score of the above-mentioned face image satisfies the preset pose score condition;

If the above quality score satisfies the above quality score condition, and the above pose score satisfies the above pose score condition, then store the above human face image into the above candidate face image set.

In the fourth possible implementation manner provided on the basis of the above first possible implementation manner, the determination of a face image from the above-mentioned candidate face image set as the target face image of the above-mentioned face ID includes: :

According to the quality score and the attitude score of each face image in the above-mentioned candidate face image collection, calculate the matching score of each face image;

In the set of candidate face images, the face image with the highest matching score is determined as the target face image of the face ID.

In the fifth possible implementation manner provided on the basis of the above first possible implementation manner, the determination of a face image from the above-mentioned candidate face image set as the target face image of the above-mentioned face ID includes: :

According to a preset interval, a face image is determined from the set of candidate face images as a target face image of the face ID.

In the sixth possible implementation manner provided on the basis of the first possible implementation manner above, the determination of a face image from the set of candidate face images as the target face image of the face ID includes: :

When the face image belonging to the above-mentioned face ID is lost in the above-mentioned scene video, a face image is determined from the above-mentioned candidate face image set as the target face image of the above-mentioned face ID.

Based on the above-mentioned first possible implementation manner, or on the basis of the above-mentioned second possible implementation manner, or on the basis of the above-mentioned third possible implementation manner, or on the basis of the above-mentioned fourth possible implementation manner, or on the basis of the above-mentioned In the fifth possible implementation manner as a basis, or in the seventh possible implementation manner provided on the basis of the above sixth possible implementation manner, after the above-mentioned target face image is pushed, the processor 402 runs the The above-mentioned computer program of memory 401 also realizes the following steps:

Empty the above collection of candidate face images.

It should be understood that in the embodiment of the present application, the so-called processor 402 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP) , Application Specific Integrated Circuit (ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.

The memory 401 may include read-only memory and random-access memory, and provides instructions and data to the processor 402 . Part or all of the memory 401 may also include non-volatile random access memory. For example, the memory 401 may also store information of device categories.

Those skilled in the art can clearly understand that for the convenience and brevity of description, only the division of the above-mentioned functional units and modules is used for illustration. In practical applications, the above-mentioned functions can be assigned to different functional units, Module completion means that the internal structure of the above-mentioned device is divided into different functional units or modules to complete all or part of the functions described above. Each functional unit and module in the embodiment can be integrated into one processing unit, or each unit can exist separately physically, or two or more units can be integrated into one unit, and the above-mentioned integrated units can either adopt hardware It can also be implemented in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, and details will not be repeated here.

In the above-mentioned embodiments, the descriptions of each embodiment have their own emphases, and for parts that are not detailed or recorded in a certain embodiment, refer to the relevant descriptions of other embodiments.

Those skilled in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of external device software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

In the embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the above-described system embodiments are only illustrative. For example, the division of the above-mentioned modules or units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined Or it can be integrated into another system, or some features can be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

If the above integrated units are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the present application realizes all or part of the processes in the methods of the above-mentioned embodiments, and can also be completed by instructing associated hardware through computer programs. The above-mentioned computer programs can be stored in a computer-readable storage medium, and the computer When the program is executed by the processor, the steps in the above-mentioned various method embodiments can be realized. Wherein, the above-mentioned computer program includes computer program code, and the above-mentioned computer program code may be in the form of source code, object code, executable file or some intermediate form. The above-mentioned computer-readable storage medium may include: any entity or device capable of carrying the above-mentioned computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer-readable memory, read-only memory (ROM, Read-Only Memory ), Random Access Memory (RAM, Random Access Memory), electrical carrier signal, telecommunication signal, and software distribution medium, etc. It should be noted that the content contained in the above-mentioned computer-readable storage media can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, computer-readable storage media The medium does not include electrical carrier signals and telecommunication signals.

The above embodiments are only used to illustrate the technical solutions of the present application, rather than to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: it can still apply to the foregoing embodiments Modifications to the technical solutions recorded, or equivalent replacements for some of the technical features; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of each embodiment of the application, and should be included in this application. within the scope of protection.

Claims

A push method, characterized in that, comprising:

For each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, the quality score and the attitude score of the face image are obtained;

According to the quality score and the posture score of the human face image, determine whether to update the candidate face image set of the human face ID;

Determine a face image as the target face image of the face ID from the set of candidate face images;

Push the target face image.
The pushing method according to claim 1, wherein said obtaining the quality score and the pose score of said face image comprises:

The human face image is input into a preset first classification network to obtain the quality score of the human face image, wherein the first classification network is used to classify the image quality of the human face image;

The human face image is input into a preset second classification network to obtain the pose score of the human face image, wherein the second classification network includes three sub-classification networks, and the three sub-classification networks are respectively used Classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
The pushing method according to claim 1, wherein, determining whether to update the candidate face image set of the face ID according to the quality score and the pose score of the face image comprises:

Detecting whether the quality score of the face image satisfies a preset quality score condition, and detecting whether the pose score of the face image satisfies a preset pose score condition;

If the quality score satisfies the quality score condition and the pose score satisfies the pose score condition, then store the face image into the set of candidate face images.
The push method according to claim 1, wherein said determining a face image as the target face image of said face ID from said candidate face image set comprises:

Calculate the matching score of each face image according to the quality score and the attitude score of each face image in the set of candidate face images;

In the set of candidate face images, the face image with the highest matching score is determined as the target face image of the face ID.
The push method according to claim 1, wherein said determining a face image as the target face image of said face ID from said candidate face image set comprises:

According to a preset interval, a face image is determined from the set of candidate face images as a target face image of the face ID.
The push method according to claim 1, wherein said determining a face image as the target face image of said face ID from said candidate face image set comprises:

When a face image belonging to the face ID is lost in the scene video, a face image is determined from the set of candidate face images as a target face image of the face ID.
The push method according to any one of claims 1 to 6, wherein, after the pushing of the target face image, the push method further comprises:

Empty the set of candidate face images.
A push device is characterized in that it comprises:

A score acquisition unit, for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, then obtain the quality score and the pose score of the face image;

A set update unit, used to determine whether to update the set of candidate face images of the face ID according to the quality score and the pose score of the face image;

An image determination unit, configured to determine a face image from the set of candidate face images as the target face image of the face ID;

An image pushing unit, configured to push the target face image.
An electronic device, comprising a memory, a processor, and a computer program stored in the memory and operable on the processor, characterized in that, when the processor executes the computer program, the computer program according to claims 1 to 1 is implemented. 7. The method described in any one.
A computer-readable storage medium storing a computer program, wherein the computer program implements the method according to any one of claims 1 to 7 when executed by a processor.