WO2022247118A1 - Pushing method, pushing apparatus and electronic device - Google Patents


Info

Publication number
WO2022247118A1
WO2022247118A1 (PCT/CN2021/125407)
Authority
WO
WIPO (PCT)
Prior art keywords
face image
face
score
candidate
image
Prior art date
Application number
PCT/CN2021/125407
Other languages
French (fr)
Chinese (zh)
Inventor
曾钰胜
程骏
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Publication of WO2022247118A1 publication Critical patent/WO2022247118A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people

Definitions

  • the present application belongs to the technical field of image processing, and in particular relates to a pushing method, a pushing device, electronic equipment, and a computer-readable storage medium.
  • Face features, as a common biometric feature, have been applied in many scenarios. This requires an electronic device to first collect a video stream and then further process the face images contained in it, for example to perform accurate face recognition or face verification.
  • Affected by interference factors such as the environment, video streams usually contain a certain number of poor-quality face images, which hinders subsequent efficient and accurate processing of the face images.
  • the present application provides a push method, a push device, an electronic device, and a computer-readable storage medium, which can effectively search and push high-quality face images, and improve the subsequent processing efficiency and accuracy of face images.
  • the present application provides a push method, including:
  • for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, obtaining the quality score and the pose score of the face image;
  • determining, according to the quality score and the pose score of the face image, whether to update the candidate face image set of the face ID;
  • determining a face image from the candidate face image set as the target face image of the face ID; and
  • pushing the target face image.
  • a push device including:
  • a score acquisition unit, configured to, for each face ID, obtain the quality score and the pose score of a face image belonging to the face ID when the face image is detected in the current video frame of the scene video;
  • a set update unit, configured to determine, according to the quality score and the pose score of the face image, whether to update the candidate face image set of the face ID;
  • an image determination unit, configured to determine a face image from the candidate face image set as the target face image of the face ID; and
  • an image pushing unit, configured to push the target face image.
  • the present application provides an electronic device.
  • the electronic device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor;
  • when the processor executes the computer program, the steps of the method of the first aspect are implemented.
  • the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method in the first aspect above are implemented.
  • the present application provides a computer program product, the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method in the first aspect above are implemented.
  • the beneficial effect of the present application compared with the prior art is: for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, the quality score and the pose score of the face image are obtained; according to these scores, it is determined whether to update the candidate face image set of the face ID; a face image is then determined from the candidate face image set as the target face image of the face ID; and finally the target face image is pushed.
  • This application scheme evaluates the face images belonging to the same face ID (that is, to the same user) in the scene video.
  • the evaluation process is based on the quality score and the pose score of each face image, and judges whether to update the candidate face image set, so that the face images in the candidate face image set are all face images with better quality and better pose.
  • the electronic device will determine the target face image from the candidate face image set and push it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can carry out further face image processing operations based on the pushed target face image.
  • FIG. 1 is a schematic diagram of the implementation flow of the push method provided by the embodiment of the present application.
  • Fig. 2 is a schematic diagram of a coarse-to-fine network architecture for pose scores provided by an embodiment of the present application.
  • Fig. 3 is a structural block diagram of a push device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the push method includes:
  • Step 101: for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, obtain the quality score and the pose score of the face image.
  • the electronic device can be integrated with a camera, and the scene video can be obtained by shooting a designated area through the camera; alternatively, the electronic device can be connected to another electronic device that has a camera, and that device shoots the designated area through its camera and transmits the captured scene video to the electronic device. This is not limited here.
  • After obtaining the scene video, the electronic device can start to perform face detection on its video frames. This face detection is different from face recognition or face verification: it only detects the face images that may be contained in a video frame. Through face detection, a face frame containing face information and five key points of the face can be obtained. Considering the size of the face frame and poses such as a turned (side) face, after obtaining the face frame and key points it is necessary to perform preprocessing operations to align the face frame and obtain the final face image.
  • the embodiment of the present application uses a similarity transformation (SimilarTransform) to map these five key points to specified coordinate points, for example:
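As a minimal sketch of this alignment step: the least-squares similarity transform between the five detected landmarks and a fixed template can be estimated with the Umeyama method. The template coordinates below (a common layout for 112×112 aligned face crops: two eyes, nose tip, two mouth corners) are an assumption for illustration; the patent does not specify them.

```python
import numpy as np

# Assumed reference landmark layout for a 112x112 aligned face crop
# (left eye, right eye, nose tip, left mouth corner, right mouth corner).
TEMPLATE = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
], dtype=np.float64)

def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src points to dst.
    Returns a 2x3 matrix usable with an affine warp such as cv2.warpAffine."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # avoid reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    var_s = (src_c ** 2).sum() / len(src)
    scale = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - scale * R @ mu_s
    return np.hstack([scale * R, t[:, None]])

def align_face(landmarks):
    """Return the 2x3 warp mapping detected 5-point landmarks onto TEMPLATE."""
    return estimate_similarity(landmarks, TEMPLATE)
```

Applying the returned matrix to the video frame (e.g. with an affine warp) yields the aligned face image described above.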
  • a face image can be obtained through face detection.
  • a detected face image must be the face image of some user (that is, some person); based on this, when a new face image is detected, a face ID can be assigned to it, indicating that the face image belongs to the user represented by that face ID.
  • the face ID in the embodiment of the present application is used to distinguish different users who are in the same picture (that is, the same video frame), and to realize the tracking of each user's face image in the scene video.
  • steps 101-104 can be used to push the target face image of one face ID. Therefore, the face IDs mentioned later in the embodiments of the present application, unless otherwise specified, all refer to the same face ID, to facilitate the explanation of each step.
  • face images belonging to the same face ID will usually appear in multiple consecutive video frames, and these face images form a continuous trajectory.
  • electronic devices with high computing power can perform real-time face detection on the current video frame at each moment, while electronic devices with low computing power can only periodically perform face detection on the video frames, that is, perform face detection on the current video frame at intervals.
  • the electronic device can obtain the quality score and pose score of the face image.
  • the quality score and the pose score can be respectively obtained through a pre-trained neural network model. Of course, other methods may also be used to obtain the quality score and pose score of the face image, which are not limited here.
  • Step 102: determine whether to update the candidate face image set of the face ID according to the quality score and the pose score of the face image.
  • each time the electronic device detects a new face image and creates a face ID, it also creates a candidate face image set for that face ID.
  • the set of candidate face images is empty when it is first created.
  • the electronic device may pre-set a quality score condition for the quality score and a pose score condition for the pose score, and use them to evaluate the face image. Only when the quality score of the face image satisfies the quality score condition and the pose score satisfies the pose score condition is the candidate face image set updated; specifically, the face image is stored in the candidate face image set.
  • Step 103: determine a face image from the candidate face image set as the target face image of the face ID.
  • steps 101 and 102 above can be executed repeatedly; that is, as long as the specified push timing has not been reached, step 101 is executed again after each execution of step 102. Once the specified push timing is reached, step 103 is entered immediately, and a face image is determined from the current candidate face image set of the face ID as the target face image of the face ID.
  • a passive push timing can be set, which is similar to an interrupt operation and is generally unpredictable; an active push timing can also be set, which is generally predictable. The specified push timing is not limited here.
  • the set of candidate face images may be empty. That is to say, within a period of time before the specified push timing is reached, all detected face images of the face ID cannot be stored in the set of candidate face images. At this point, it can be considered that there is no high-quality face image of the user represented by the face ID in the scene video.
  • the electronic device may output a reminder message to remind the subject (that is, the user) to adjust their angle and/or position.
  • the set of candidate face images may contain only one face image. That is to say, within this period of time before the specified push timing is reached, among all the detected face images of the face ID, only one face image is stored in the set of candidate face images.
  • the electronic device then has no other choice, and may directly determine the only face image in the candidate face image set as the target face image.
  • the set of candidate face images may contain more than two face images.
  • the electronic device can further screen the candidate face images to find the best face image in the set as the target face image.
  • Step 104: push the target face image.
  • after the target face image is determined, it can be pushed to other modules in the electronic device, such as a face verification module or a face recognition module, so that those modules can perform further processing.
  • because the target face image is a high-quality face image, it can be processed with a certain degree of reliability, avoiding processing failures.
  • the target face image may also be pushed to other electronic devices for further processing, which is not limited here.
  • after the target face image is pushed, the candidate face image set of the face ID can be cleared, that is, all face images in the candidate face image set are deleted, and the flow returns to step 101 and subsequent steps.
  • the quality score and the pose score of the face image can be obtained through a pre-trained neural network model, then this step 101 can be specifically expressed as:
  • A1: Input the face image into a preset first classification network to obtain the quality score of the face image, wherein the first classification network is used to classify the image quality of the face image.
  • a lightweight convolutional neural network (Convolutional Neural Networks, CNN), such as ShuffleNetV2, is used to construct the first classification network.
  • the ShuffleNetV2 can also be modified: through a channel-pruning operation, the number of channels of ShuffleNetV2 is reduced to a quarter of the original, and the resulting ShuffleNetV2×0.25 network is used as the first classification network. The ShuffleNetV2×0.25 is trained for three-class classification; after this training is completed, the network can be put into application to obtain a three-class classification result for a face image. That is, the first classification network is essentially a three-class classification network.
  • the embodiment of the present application sets three categories for the first classification network: 0, meaning blurred; 1, meaning relatively clear; and 2, meaning clear.
  • the results of the three classifications can be further processed:
  • the final quality score Eq of the face image can be calculated as the expectation of the class scores over the predicted class probabilities, for example Eq = 0·p0 + 60·p1 + 100·p2, where p0, p1 and p2 are the probabilities of the three categories (a reconstruction consistent with the score distribution described below):
  • for a clear input image, the obtained quality score is usually distributed around 100; for a blurred input image, around 0; and for a relatively clear input image, around 60. If a relatively clear input image tends toward blurred, its score will be less than 60; if it tends toward clear, its score will be greater than 60.
  • a score of 60 can therefore be used as the blur threshold to eliminate unclear face images and facilitate further operations such as subsequent face recognition.
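The expectation described above can be sketched as follows. The class anchors 0/60/100 are inferred from the score distribution in the text; the exact formula in the original is not reproduced here.

```python
import numpy as np

# Class anchors inferred from the description: 0 = blurred,
# 60 = relatively clear, 100 = clear (an assumption for illustration).
CLASS_SCORES = np.array([0.0, 60.0, 100.0])

def quality_score(logits):
    """Expected quality score Eq from the 3-class logits of the first network."""
    logits = np.asarray(logits, dtype=np.float64)
    p = np.exp(logits - np.max(logits))  # numerically stable softmax
    p /= p.sum()
    return float(CLASS_SCORES @ p)
```

A confidently "clear" prediction thus lands near 100, a confidently "blurred" one near 0, and mixed predictions spread around the 60 threshold, matching the distribution the text describes.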
  • A2: Input the face image into a preset second classification network to obtain the pose score of the face image, wherein the second classification network includes three sub-classification networks, which are respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
  • three pose angles are involved: the pitch angle (pitch), the yaw angle (yaw) and the roll angle (roll).
  • each angle has an independent multi-classification task; that is, the three attitude angles are considered as three separate multi-classification tasks.
  • the calculation process of the pose score is the same for each pose angle; therefore, one pose angle is used below as an example. The angle range predicted by the embodiment of the present application is [-99°, 99°], divided into one category per 3°, so each pose angle contains 66 categories. For example, [-99°, -96°) is one category, [-96°, -93°) is the next, and so on.
  • the embodiment of the present application adopts the idea of Deep Expectation (DEX).
  • DEX originated from age estimation, and the embodiment of this application migrates it to pose estimation.
  • the pose score Ep actually represents the angle of the pose angle; it is still a coarse-grained quantization result and reflects relatively coarse-grained pose information.
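The DEX-style expectation over the 66 bins can be sketched as below. The patent only states 3° bins over [-99°, 99°]; taking the bin centers as representatives is an assumption for illustration.

```python
import numpy as np

# 66 bins of 3 degrees covering [-99, 99); bin centers as representatives
# (-97.5, -94.5, ..., 97.5) are an assumption for illustration.
BIN_CENTERS = np.arange(66) * 3.0 - 97.5

def pose_score(logits):
    """DEX-style expected angle: softmax over 66 bins, then expectation."""
    logits = np.asarray(logits, dtype=np.float64)
    p = np.exp(logits - np.max(logits))
    p /= p.sum()
    return float(BIN_CENTERS @ p)
```

The same computation is run once per sub-classification network, yielding Ep(pitch), Ep(yaw) and Ep(roll).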
  • the embodiment of the present application can introduce an additional regression task to refine the coarse-grained task, that is, to perform regression refinement learning between the coarse-grained estimated angle (the predicted pose score) and the labeled angle (the label pose score).
  • the network architecture design from coarse to fine is shown in FIG. 2 .
  • the loss function of the coarse-to-fine network can be denoted as L; it combines the two estimates in Fig. 2, for example L = Lcls + α·Lmse, where Lcls is the cross-entropy (Cross Entropy) loss of the coarse estimate in Fig. 2, Lmse is the mean square error (MSE) of the fine estimate in Fig. 2, and α is a weighting coefficient.
  • the embodiment of the present application adopts a lightweight backbone network (Backbone) for network design.
  • the backbone network may be MobileNetV3_small.
  • the processing speed of MobileNetV3_small cannot meet the requirement of less than 50 ms, so in the embodiment of this application the MobileNetV3_small can also be modified: through a channel-pruning operation, the number of channels of MobileNetV3_small is reduced to a quarter of the original, and the resulting MobileNetV3_small×0.25 network is used as the backbone.
  • the above step 102 can specifically include:
  • the quality score condition is used to detect whether the image quality of the face image meets the requirements, that is, whether the face image is clear enough;
  • the pose score condition is used to detect whether the face pose in the face image meets the requirements, that is, whether the facial pose is upright enough.
  • the two tests in step B1 can be performed sequentially or simultaneously, which is not limited here. When they run concurrently, if either test fails, the other can be terminated directly.
  • the detection process of the pose score is more complicated than that of the quality score. Based on this, for electronic devices with low computing power, it is possible to first detect whether the quality score of the face image satisfies the quality score condition, and only then detect whether the pose score satisfies the pose score condition.
  • the quality score condition may be: the quality score is not lower than a preset quality score threshold.
  • the quality score threshold may be 60; that is, the quality score condition may be expressed as: Eq ⁇ 60.
  • the pose score condition may be: the absolute value of the pose score of the pitch angle is less than a preset first pose score threshold, the absolute value of the pose score of the yaw angle is less than a preset second pose score threshold, and the sum of the absolute values of the pose scores of the pitch, yaw and roll angles is less than a preset third pose score threshold, wherein the first pose score threshold is less than the second pose score threshold, and the third pose score threshold is the sum of the first and second pose score thresholds.
  • the first pose score threshold can be 25, the second 40, and the third 65; denoting the pose scores of the pitch, yaw and roll angles as Ep(pitch), Ep(yaw) and Ep(roll), the pose score condition can be expressed as: |Ep(pitch)| < 25, |Ep(yaw)| < 40, and |Ep(pitch)| + |Ep(yaw)| + |Ep(roll)| < 65.
  • when both conditions are satisfied, the candidate face image set of the face ID to which the face image belongs can be updated; specifically, the face image is stored in the candidate face image set.
  • the above step 103 may specifically include:
  • C1: Calculate the matching score of each face image according to the quality score and the pose score of each face image in the candidate face image set.
  • through the screening in step B1, the number of face images stored in the candidate face image set has already been limited.
  • the sum of the absolute values of the pose scores of the three pose angles of any face image in the set does not exceed 65°, which motivates the design of the matching-score formula.
  • the embodiment of the present application pays more attention to the pitch angle and the yaw angle, so the absolute value of the pose score of the roll angle is divided down to reduce the weight of the roll angle.
  • the face image with the highest matching score in the candidate face image set can then be obtained by screening.
  • the face image can be determined as the target face image of the face ID.
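The exact matching-score formula is not reproduced in this text; the sketch below follows its description only: combine the quality score with a pose term bounded by the 65° budget, down-weighting roll by dividing its absolute value. The divisor of 2 is an assumed value for illustration.

```python
# Hypothetical matching score following the description: quality score plus a
# pose term; roll is divided to reduce its weight (divisor of 2 is assumed).
POSE_BUDGET = 65.0
ROLL_DIVISOR = 2.0

def matching_score(eq, pitch, yaw, roll):
    pose_penalty = abs(pitch) + abs(yaw) + abs(roll) / ROLL_DIVISOR
    return eq + (POSE_BUDGET - pose_penalty)

def best_candidate(candidates):
    """candidates: iterable of (eq, pitch, yaw, roll, image_ref) tuples;
    returns the tuple with the highest matching score (steps C1-C3)."""
    return max(candidates, key=lambda c: matching_score(*c[:4]))
```

With equal quality scores, a frontal face (small pitch/yaw/roll) outranks a tilted one, which is the behavior steps C1-C3 describe.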
  • the active push timing can be: when a preset time interval has passed since the last push; it can also be understood as: when the scene video has undergone a preset number of face detections since the last push. That is, the target face image is periodically selected from the candidate face image set of the face ID and pushed.
  • assume the time it takes to execute steps 101 and 102 once is fixed, for example 200 milliseconds (ms); assume the initial time starts from 0; and assume the preset time interval is 2 s. Then the following scenario can be imagined:
  • face image 1 belonging to face ID1 is detected; through steps 101 and 102, it is determined that face image 1 cannot be stored in the candidate face image set of face ID1, and the 200 ms moment has been reached;
  • face image 2 belonging to face ID1 is detected; through steps 101 and 102, it is determined that face image 2 can be stored in the candidate face image set of face ID1, and the 400 ms moment has been reached;
  • by the 2 s moment, steps 101 and 102 have been executed 10 times in total. Assuming that the candidate face image set stores face images 2, 5 and 9, the target face image is selected from these three face images and pushed.
  • the electronic device also takes the 2 s moment as the new initial moment, clears the candidate face image set, and starts a new round of updates to it.
  • the process is similar to the previous one, and will not be repeated here.
  • a trace parameter trace_num can be introduced to judge whether the active push timing is reached, where trace_num is initialized to 0. Specifically: for a face ID, each time a face image belonging to this face ID is detected in the current video frame, the value of trace_num is incremented by 1, and steps 101 and 102 are performed. After step 102, whether trace_num % update_num equals 0 is used to judge whether the active push timing is satisfied: if trace_num % update_num is 0, the active push timing has been reached; otherwise, the active push timing has not been reached, and the flow returns to detecting the current video frame of the scene video. When a face image belonging to the face ID is detected, trace_num is updated and steps 101 and 102 are performed again, which will not be repeated here.
  • the value of update_num is determined according to the total time spent on steps 101 and 102 (that is, the total time required to execute steps 101 and 102 once) and the preset time interval.
  • denoting the preset time interval as T and the total time spent on steps 101 and 102 as t, update_num is the ratio of T to t. For example, if T is 2 s and t is 200 ms, update_num is 10. For example:
  • face image 1 belonging to face ID1 is detected for the first time, and trace_num is updated to 1; through steps 101 and 102, it is determined that face image 1 cannot be stored in the candidate face image set of face ID1, and trace_num % update_num is not 0, so the active push timing has not been reached. This is time t1; at time t1, detect whether there is a face image belonging to face ID1.
  • when the active push timing is reached, the target face image can be selected from the candidate face image set and pushed. It should be noted that there is no need to clear trace_num after each push; that is, trace_num can be accumulated continuously.
  • each face ID corresponds to its own trace_num. If the face image of a certain face ID is lost, the trace_num corresponding to that face ID can be deleted to release memory.
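The per-face-ID bookkeeping above can be sketched as a small tracker class. The names (trace_num, update_num) follow the text; for brevity this sketch pushes the most recent candidate, whereas the patent selects the candidate with the highest matching score.

```python
# Sketch of the active-push bookkeeping: one counter per face ID,
# with a push attempted every update_num detections.
class FaceTracker:
    def __init__(self, update_num):
        self.update_num = update_num   # T / t, e.g. 2000 ms / 200 ms = 10
        self.trace_num = 0             # detections seen for this face ID
        self.candidates = []           # candidate face image set

    def on_detection(self, face_image, passes_filter):
        """Call once per detected face image of this face ID.
        Returns an image to push, or None if it is not push time yet."""
        self.trace_num += 1
        if passes_filter:                       # step 102 outcome
            self.candidates.append(face_image)
        if self.trace_num % self.update_num == 0:   # active push timing
            pushed = self.candidates[-1] if self.candidates else None
            self.candidates.clear()             # set cleared after each push
            return pushed
        return None
```

A passive push (face ID lost) would simply flush `candidates` the same way and then delete the tracker, matching the tracker-deletion behavior described below.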
  • the passive push timing can be: when the face image belonging to the face ID is lost in the scene video; that is, when no face image belonging to the face ID can be detected in the scene video anymore, it is considered lost. This is generally caused by the user represented by the face ID walking out of the camera's field of view, so that the camera can no longer capture the user.
  • the passive push timing actually makes up for a deficiency of the active push timing. The following scenario can be imagined: assume the time interval set for the active push timing is 2 s, that is, the target face image is searched for and pushed every 2 s.
  • face image 1 belonging to face ID1 is detected for the first time, and trace_num is updated to 1; through steps 101 and 102, it is determined that face image 1 cannot be stored in the candidate face image set of face ID1, and trace_num % update_num is not 0, so the active push timing has not been reached.
  • the time reaches time t1; at time t1, detect whether there is a face image belonging to face ID1;
  • trace_num is still 2 and will not increase any more, so trace_num % update_num can never be 0 again. In a scenario where only the active push timing is set, the target face image of this face ID could never be pushed; with the passive push timing added, the target face image can be searched for in the candidate face image set of the face ID and pushed based on the passive push timing.
  • afterwards, the electronic device can clear the data related to the face ID, such as deleting the candidate face image set of the face ID, to release resources.
  • the concept of a tracker can also be introduced. That is, when a face image belonging to a new face ID appears in a detected video frame (a face image of a new face ID is a face image whose face ID has not appeared in the several consecutive video frames before this video frame), a corresponding tracker is created for the face image. The trace_num and the candidate face image set updated for that face ID are bound to the tracker. When the face ID is lost, the target face image of the face ID is searched for and pushed based on the passive push timing, and the tracker corresponding to the face ID is then deleted, including the trace_num and candidate face image set bound to it.
  • the face images belonging to the same face ID (that is, to the same user) in the scene video can be evaluated; the evaluation process is based on the quality score and the pose score of each face image, and whether to update the candidate face image set is judged based on the evaluation result, so that the face images in the candidate face image set are all face images with better quality and better pose.
  • the electronic device will determine the target face image from the candidate face image set and push it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can carry out further face image processing operations based on the pushed target face image.
  • an embodiment of the present application further provides a push device.
  • the pushing device 300 includes:
  • a score acquisition unit 301, configured to, for each face ID, obtain the quality score and the pose score of a face image belonging to the face ID when the face image is detected in the current video frame of the scene video;
  • a set update unit 302, configured to determine, according to the quality score and the pose score of the face image, whether to update the candidate face image set of the face ID;
  • an image determining unit 303, configured to determine a face image from the candidate face image set as the target face image of the face ID; and
  • an image pushing unit 304, configured to push the target face image.
  • the score acquisition unit 301 includes:
  • a quality score acquisition subunit, configured to input the face image into the preset first classification network to obtain the quality score of the face image, wherein the first classification network is used to classify the image quality of the face image;
  • a pose score acquisition subunit, configured to input the face image into the preset second classification network to obtain the pose score of the face image, wherein the second classification network includes three sub-classification networks, which are respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
  • the above set updating unit 302 includes:
  • a quality detection subunit, configured to detect whether the quality score of the face image satisfies the preset quality score condition;
  • a pose detection subunit, configured to detect whether the pose score of the face image satisfies the preset pose score condition;
  • a set update subunit, configured to store the face image into the candidate face image set if the quality score satisfies the quality score condition and the pose score satisfies the pose score condition.
  • The image determining unit 303 includes:
  • a matching score calculation subunit, configured to calculate a matching score for each face image according to the quality score and the pose score of each face image in the candidate face image set;
  • a target face image determination subunit, configured to determine, among the candidate face image set, the face image with the highest matching score as the target face image of the face ID.
  • Alternatively, the image determining unit 303 is specifically configured to determine, at preset intervals, a face image from the candidate face image set as the target face image of the face ID.
  • Alternatively, the image determining unit 303 is specifically configured to determine a face image from the candidate face image set as the target face image of the face ID when the face image belonging to the face ID is lost from the scene video.
  • The pushing device 300 further includes:
  • a set clearing unit, configured to clear the candidate face image set after the target face image is pushed by the image pushing unit.
  • In this way, the face images belonging to the same face ID (that is, to the same user) in the scene video are evaluated, the evaluation depending on the quality score and the pose score of each face image, and the evaluation result is used to judge whether to update the candidate face image set, so that the face images in the candidate face image set are all of good quality and favorable pose.
  • The electronic device then determines a target face image from the candidate face image set and pushes it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can carry out further face image processing based on the pushed target face image.
  • An embodiment of the present application further provides an electronic device.
  • The electronic device 4 in the embodiment of the present application includes: a memory 401, one or more processors 402 (only one is shown in Fig. 4), and a computer program stored on the memory 401 and operable on the processor.
  • The memory 401 is used to store software programs and units.
  • The processor 402 executes various functional applications and data processing by running the software programs and units stored in the memory 401, so as to obtain the resources corresponding to the preset events.
  • Specifically, the processor 402 implements the following steps by running the computer program stored in the memory 401:
  • for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, obtaining a quality score and a pose score of the face image; determining, according to the quality score and the pose score of the face image, whether to update the candidate face image set of the face ID; determining a face image from the candidate face image set as the target face image of the face ID; and pushing the target face image.
  • The acquisition of the quality score and the pose score of the face image includes:
  • inputting the face image into a preset first classification network to obtain the quality score of the face image, where the first classification network is used to classify the image quality of the face image;
  • inputting the face image into a preset second classification network to obtain the pose score of the face image,
  • where the second classification network includes three sub-classification networks, which are respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
  • Determining whether to update the candidate face image set of the face ID according to the quality score and the pose score of the face image includes:
  • if the quality score satisfies the quality score condition and the pose score satisfies the pose score condition, storing the face image into the candidate face image set.
  • Determining a face image from the candidate face image set as the target face image of the face ID includes:
  • determining, among the candidate face image set, the face image with the highest matching score as the target face image of the face ID.
  • Alternatively, determining a face image from the candidate face image set as the target face image of the face ID includes:
  • determining, at preset intervals, a face image from the candidate face image set as the target face image of the face ID.
  • Alternatively, determining a face image from the candidate face image set as the target face image of the face ID includes:
  • determining a face image from the candidate face image set as the target face image of the face ID when the face image belonging to the face ID is lost from the scene video.
  • The processor 402 may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The memory 401 may include read-only memory and random-access memory, and provides instructions and data to the processor 402. Part or all of the memory 401 may also include non-volatile random-access memory. For example, the memory 401 may also store information on device categories.
  • In this way, the face images belonging to the same face ID (that is, to the same user) in the scene video are evaluated, the evaluation depending on the quality score and the pose score of each face image, and the evaluation result is used to judge whether to update the candidate face image set, so that the face images in the candidate face image set are all of good quality and favorable pose.
  • The electronic device then determines a target face image from the candidate face image set and pushes it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can carry out further face image processing based on the pushed target face image.
  • The disclosed devices and methods may be implemented in other ways.
  • The system embodiments described above are only illustrative.
  • The division into the above modules or units is only a logical function division.
  • In actual implementation, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
  • The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • If the above integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing the associated hardware through computer programs.
  • The computer program can be stored in a computer-readable storage medium, and when the program is executed by the processor, the steps in the various method embodiments above can be realized.
  • The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form.
  • The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer-readable memory, a read-only memory (ROM, Read-Only Memory), a random-access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
  • The content contained in the computer-readable storage medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction.
  • For example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunication signals.

Abstract

Provided are a pushing method, a pushing apparatus, an electronic device, and a computer-readable storage medium. The method comprises: for each face ID, if a face image belonging to the face ID is detected in a current video frame of a scene video, acquiring a quality score and a pose score of the face image (101); determining, according to the quality score and the pose score of the face image, whether to update a candidate face image set of the face ID (102); determining a face image from the candidate face image set as a target face image of the face ID (103); and pushing the target face image (104). By means of this solution, high-quality face images can be effectively found and pushed, and the efficiency and accuracy of subsequent face image processing can be improved.

Description

Pushing method, pushing apparatus and electronic device
This application claims priority to the Chinese patent application with application number 202110563530.7, filed with the China Patent Office on May 24, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the technical field of image processing, and in particular relates to a pushing method, a pushing apparatus, an electronic device, and a computer-readable storage medium.
Background
At present, facial features, as a common type of biometric feature, have been applied in many scenarios. This requires an electronic device to first capture a video stream, and then further process the face images contained in it based on the video stream, for example to perform accurate face recognition or face verification. However, in practical application scenarios, affected by interference factors such as the environment, the video stream usually contains a certain number of poor-quality face images, which hinders efficient and accurate subsequent processing of the face images.
Technical Problem
The present application provides a pushing method, a pushing apparatus, an electronic device, and a computer-readable storage medium, which can effectively find and push high-quality face images and improve the efficiency and accuracy of subsequent face image processing.
Technical Solution
In a first aspect, the present application provides a pushing method, including:
for each face ID, if a face image belonging to the face ID is detected in the current video frame of a scene video, obtaining a quality score and a pose score of the face image;
determining, according to the quality score and the pose score of the face image, whether to update a candidate face image set of the face ID;
determining a face image from the candidate face image set as a target face image of the face ID;
pushing the target face image.
In a second aspect, the present application provides a pushing apparatus, including:
a score acquisition unit, configured to, for each face ID, obtain a quality score and a pose score of a face image belonging to the face ID if such an image is detected in the current video frame of a scene video;
a set update unit, configured to determine, according to the quality score and the pose score of the face image, whether to update a candidate face image set of the face ID;
an image determining unit, configured to determine a face image from the candidate face image set as a target face image of the face ID;
an image pushing unit, configured to push the target face image.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In a fifth aspect, the present application provides a computer program product including a computer program which, when executed by one or more processors, implements the steps of the method of the first aspect.
Beneficial Effects
The beneficial effect of the present application compared with the prior art is as follows: for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, a quality score and a pose score of the face image are obtained; according to these scores it is determined whether to update the candidate face image set of the face ID; a face image is then determined from the candidate face image set as the target face image of the face ID; and finally the target face image is pushed. The solution of the present application evaluates the face images belonging to the same face ID (that is, to the same user) in the scene video, where the evaluation depends on the quality score and the pose score of each face image, and the evaluation result determines whether the candidate face image set is updated, so that the face images in the set are all of good quality and favorable pose. Finally, the electronic device determines a target face image from the candidate face image set and pushes it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can perform further face image processing based on the pushed target face image.
It can be understood that, for the beneficial effects of the second to fifth aspects above, reference can be made to the relevant description of the first aspect, which will not be repeated here.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of the implementation of the pushing method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the coarse-to-fine network architecture for the pose score provided by an embodiment of the present application;
Fig. 3 is a structural block diagram of the pushing apparatus provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Embodiments of the Present Invention
In the following description, specific details such as particular system structures and technologies are presented for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solution proposed by the present application, specific embodiments are described below.
The pushing method proposed in the embodiments of the present application is described below. Referring to Fig. 1, the pushing method includes:
Step 101: for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, obtain a quality score and a pose score of the face image.
In the embodiment of the present application, the electronic device may integrate a camera and obtain the scene video by shooting a designated area with it; alternatively, the electronic device may be connected to another electronic device equipped with a camera, which shoots the designated area and transmits the captured scene video to the electronic device. This is not limited here.
After obtaining the scene video, the electronic device can start performing face detection on its video frames. Face detection differs from face recognition or face verification: it merely detects the face images that a video frame may contain. Through face detection, a face box containing face information can be obtained, together with five key points of the face. Considering the size of the face box and poses such as a side-on face that a user may present, after the face box and key points are obtained, a preprocessing operation is still needed to align the face box before the final face image can be obtained. The embodiment of the present application uses a similarity transform (SimilarTransform) to map these five key points to specified coordinate points, for example:
(38.2946*0.5714, 51.6963*0.5714), corresponding to the left-eye key point;
(73.5318*0.5714, 51.5014*0.5714), corresponding to the right-eye key point;
(56.0252*0.5714, 71.7366*0.5714), corresponding to the nose key point;
(41.5493*0.5714, 92.3655*0.5714), corresponding to the left mouth-corner key point;
(70.7299*0.5714, 92.2041*0.5714), corresponding to the right mouth-corner key point.
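The alignment step described above can be sketched as follows. The text only names the similarity transform, so the concrete estimator here is an assumption: a standard closed-form (Umeyama) fit of scale, rotation and translation from the five detected key points to the template points listed above.

```python
import numpy as np

# Destination template: the five coordinate points listed above
# (an ArcFace-style 112x112 layout scaled by 0.5714).
TEMPLATE = np.array([
    [38.2946, 51.6963],  # left eye
    [73.5318, 51.5014],  # right eye
    [56.0252, 71.7366],  # nose
    [41.5493, 92.3655],  # left mouth corner
    [70.7299, 92.2041],  # right mouth corner
]) * 0.5714

def estimate_similarity(src, dst):
    """Closed-form (Umeyama) estimate of the similarity transform
    (scale * rotation + translation) mapping src points onto dst points.
    Returns a 2x3 affine matrix usable with an image-warping routine."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, d])          # reflection guard
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    return pts @ M[:, :2].T + M[:, 2]
```

With the 2×3 matrix in hand, the aligned face crop itself would be produced by an image-warping call such as OpenCV's `warpAffine`; that call is omitted here to keep the sketch image-free.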
In this way, a face image can be obtained through face detection. Obviously, a detected face image is necessarily the face image of some user (i.e., some person); on this basis, when a new face image is detected, a face ID can be assigned to it, indicating that the face image belongs to the user represented by that face ID. The face ID in the embodiment of the present application is used to distinguish different users in the same picture (i.e., the same video frame) and to track each user's face image in the scene video. It can be understood that, for any face ID, steps 101-104 can be used to push the target face image of that face ID. Therefore, unless otherwise specified, the face IDs mentioned later in the embodiments of the present application all refer to the same face ID, so as to facilitate the explanation of each step.
Considering that a person cannot teleport, that is, cannot suddenly appear and disappear in the scene video, the face images belonging to a face ID will usually appear in multiple consecutive frames of the collected scene video, and these face images will necessarily form a continuous trajectory. Electronic devices with high computing power can perform real-time face detection on the current video frame at each moment, whereas devices with low computing power can often only perform face detection periodically, that is, run face detection on the current video frame at intervals. As long as a face image belonging to the face ID is detected in the current video frame of the scene video, the electronic device can obtain the quality score and the pose score of that face image. Merely as an example, the quality score and the pose score can each be obtained through a pre-trained neural network model. Of course, other methods may also be used to obtain them; this is not limited here.
Step 102: determine, according to the quality score and the pose score of the face image, whether to update the candidate face image set of the face ID.
In the embodiment of the present application, each time the electronic device detects a new face image and creates a face ID for it, it also creates a candidate face image set for that face ID. Obviously, this set is empty when first created. Subsequently, the quality score and the pose score of each detected face image belonging to the face ID can be used to judge whether the candidate face image set currently needs to be updated, that is, whether the face image should be stored in the set. Specifically, the electronic device may preset a quality score condition for the quality score and a pose score condition for the pose score, and evaluate face images against them. Only when the quality score of a face image satisfies the quality score condition and its pose score satisfies the pose score condition is the candidate face image set updated, specifically by storing the face image in it.
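The update decision just described can be sketched as a small helper. The quality cutoff of 60 matches the blur threshold given later in the text; the pose threshold is a placeholder, since the excerpt leaves the pose condition open.

```python
def maybe_update(candidates, face_image, quality_score, pose_score,
                 quality_threshold=60.0, pose_threshold=0.5):
    """Store the face image in the candidate set only when both the quality
    score condition and the pose score condition are satisfied. The 0.5
    pose threshold is illustrative, not taken from the text."""
    if quality_score >= quality_threshold and pose_score >= pose_threshold:
        candidates.append((face_image, quality_score, pose_score))
        return True
    return False
```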
Step 103: determine a face image from the candidate face image set as the target face image of the face ID.
In the embodiment of the present application, steps 101 and 102 can be executed repeatedly; that is, as long as the specified push moment has not arrived, execution returns to step 101 after each execution of step 102. Once the specified push moment arrives, step 103 is entered immediately, and a face image is determined from the current candidate face image set of the face ID as its target face image. Note that there may be more than one specified push moment. For example, a passive push moment may be set, which resembles an interrupt and is generally unpredictable; an active push moment may also be set, which is generally predictable. The specified push moment is not limited here.
In one application scenario, the candidate face image set may be empty. That is, during the period before the specified push moment arrives, none of the detected face images of the face ID could be stored in the candidate face image set. In this case, it can be considered that the scene video contains no high-quality face image of the user represented by the face ID. To improve the face images of that user, the electronic device may output a reminder message prompting the photographed subject (i.e., the user) to adjust his or her angle and/or position.
In another application scenario, the candidate face image set may contain only one face image. That is, during the period before the specified push moment arrives, only one of the detected face images of the face ID was stored in the set. In this case, the electronic device has no other choice and can directly determine that single face image as the target face image.
In yet another application scenario, the candidate face image set may contain two or more face images. In this case, the electronic device has a choice: it can further screen the candidate face images to find the best face image in the set as the target face image.
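The three scenarios above collapse into one selection routine. The excerpt does not fix the exact rule for combining the quality and pose scores into a matching score, so the weighted sum below is an illustrative choice.

```python
def pick_target(candidates, w_quality=0.5, w_pose=0.5):
    """Return the candidate with the highest matching score, or None if the
    set is empty (covering all three scenarios described above). The
    weighted-sum matching score is an assumed combination rule."""
    if not candidates:
        return None  # empty set: no sufficiently good image was collected
    def matching_score(candidate):
        _, quality, pose = candidate
        return w_quality * quality + w_pose * pose
    return max(candidates, key=matching_score)
```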
Step 104: push the target face image.
In the embodiment of the present application, once the target face image is determined, it can be pushed to other modules in the electronic device, such as a face verification module or a face recognition module, so that those modules can process this high-quality face image, which to some extent avoids processing failures. Of course, the target face image may also be pushed to other electronic devices for further processing; this is not limited here. After the target face image has been pushed, the candidate face image set of the face ID can be cleared, that is, all face images in the set are deleted, and execution returns to step 101 and the subsequent steps.
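A push moment can thus be handled as pick, push, then clear. The sketch below assumes a simple sum-of-scores selection rule (the text does not fix one) and takes the push target as any callable, standing in for a recognition or verification module.

```python
def handle_push_moment(candidates, push_fn):
    """At a push moment: choose a target from the candidate set, push it via
    push_fn, then clear the set, as described above. The selection rule
    (sum of quality and pose scores) is illustrative."""
    if not candidates:
        return None  # nothing worth pushing was collected for this face ID
    target = max(candidates, key=lambda c: c[1] + c[2])[0]
    push_fn(target)
    candidates.clear()  # empty the set so collection restarts at step 101
    return target
```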
In some embodiments, in step 101, the quality score and the pose score of the face image can be obtained through pre-trained neural network models, in which case step 101 can be embodied as:
A1. Input the face image into a preset first classification network to obtain the quality score of the face image, where the first classification network is used to classify the image quality of the face image.
The first classification network is briefly introduced below:
In the embodiment of the present application, a lightweight convolutional neural network (Convolutional Neural Networks, CNN), such as ShuffleNetV2, is used to build the first classification network. Considering that for electronic devices with low computing power the processing speed of ShuffleNetV2 cannot meet the requirement of less than 50 ms, in the embodiment of the present application ShuffleNetV2 can also be modified: a channel-pruning operation reduces the number of channels of ShuffleNetV2 to a quarter of the original, yielding a ShuffleNetV2×0.25 network as the first classification network, which is then trained for three-class classification. Once this training is complete, the ShuffleNetV2×0.25 network can be put into use to obtain a three-class result for a face image. That is, the first classification network is essentially a three-class classification network.
Specifically, this embodiment defines three categories for the first classification network: 0 denotes blurry; 1 denotes fairly clear; 2 denotes clear. Defining three classes in this way helps categorize the intermediate "fairly clear" state and further improves the classification accuracy for the blurry and clear classes. On this basis, after the three-class result of a face image is obtained, it can be processed further:
Assume the assigned quality scores are y = {0, 60, 100}, that is, 0 points for blurry, 60 points for fairly clear, and 100 points for clear; and the three-class result output by the first classification network is the probability that the face image belongs to each of these three classes, denoted o = {o1, o2, o3}. The final quality score Eq of the face image can then be calculated with the following formula:
Eq = y1×o1 + y2×o2 + y3×o3, i.e. the expectation Eq = Σ_{i=1}^{3} yi×oi
For example, if a face image passes through the first classification network and the probability that it is clear is 0.9, the probability that it is fairly clear is 0.05, and the probability that it is blurry is 0.05, the quality score of the face image is Eq = 0.9×100 + 0.05×60 + 0.05×0 = 93. In practice, when verifying the first classification network, it can be observed that: for clear input images, the resulting quality scores are usually distributed around 100; for blurry input images, around 0; and for fairly clear input images, around 60 — if the blur level of the input image leans from fairly clear towards blurry, its score will be below 60, and if it leans from fairly clear towards clear, its score will be above 60. Based on this, this embodiment may use a score of 60 as the blur threshold, so as to discard unclear face images and facilitate subsequent operations such as face recognition.
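The expectation-based scoring above can be sketched in a few lines (a minimal illustration; the class labels y = {0, 60, 100} are taken from the text, while the function name and argument ordering are only for illustration):

```python
def quality_score(probs, labels=(0, 60, 100)):
    """Expected quality score: class labels weighted by predicted probabilities.

    probs  -- softmax output of the three-class network,
              ordered (blurry, fairly clear, clear)
    labels -- score assigned to each class, same order
    """
    return sum(p * y for p, y in zip(probs, labels))

# The example from the text: P(blurry)=0.05, P(fairly clear)=0.05, P(clear)=0.9
score = quality_score((0.05, 0.05, 0.9))  # ≈ 93.0
```

A score below 60 then marks the image as too blurry to keep.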
A2. Input the face image into a preset second classification network to obtain the pose score of the face image, where the second classification network comprises three sub-classification networks, which are respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
A brief introduction to the second classification network follows:
In this embodiment of the present application, three pose angles are involved: the pitch angle, the yaw angle and the roll angle. The three pose angles are considered separately, each with its own independent multi-class classification task; that is, the three pose angles are treated as three separate multi-class tasks. Since the pose score is calculated in the same way for each pose angle, one pose angle is taken as an example below. The angle range predicted in this embodiment is [-99°, 99°], which can be divided into one class per 3°, so that each pose angle covers 66 classes. For example, for each pose angle, [-99°, -96°) forms one class, [-96°, -93°) forms another, and so on; each pose angle can thus be divided into 66 classes.
The classification result output by the second classification network can be understood as the pose angle of a face image roughly falling into one of these angle intervals, which introduces an error of about 3°. To refine the classification result, this embodiment adopts the idea of Deep Expectation (DEX). DEX originated in age estimation; this embodiment transfers it to pose estimation. Let the defined angle bins be y; y is essentially a 66-dimensional label vector, y = {-99, -96, ..., -3, 3, ..., 96, 99}. The output of the second classification network is the probability that the pose angle of the face image belongs to each of these 66 classes, denoted o = {o1, o2, o3, ..., o64, o65, o66}. The pose score Ep of the face image for each pose angle can then be calculated with the following formula:
Ep = y1×o1 + y2×o2 + ... + y66×o66, i.e. the expectation Ep = Σ_{i=1}^{66} yi×oi
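As with the quality score, the pose score is an expectation of the bin labels under the predicted distribution. A minimal sketch (the 66 bin labels follow the label vector given in the text; the function name is illustrative):

```python
def pose_score(probs):
    """DEX-style expected pose angle over the 66 angle bins.

    probs -- softmax output over the 66 classes; the bin labels are
             y = {-99, -96, ..., -3, 3, ..., 96, 99} as defined in the text
             (note the vector skips 0).
    """
    labels = [v for v in range(-99, 100, 3) if v != 0]  # 66 labels, 3 deg apart
    assert len(probs) == len(labels) == 66
    return sum(p * y for p, y in zip(probs, labels))
```

For instance, placing all probability mass on the bin labelled 3° yields a score of 3.0, and mass split evenly between -3° and 3° averages out to 0.0.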
The pose score Ep actually represents the angle of the pose angle; it is still a coarse-grained quantized result and reflects relatively coarse-grained pose information. To address this, this embodiment may introduce an additional regression task to refine the coarse-grained task, that is, to perform regression-based fine-grained learning between the coarsely estimated pose angle (the predicted pose score) and the labelled pose angle (the label pose score). Specifically, the coarse-to-fine network architecture is shown in Figure 2. As can be seen from Figure 2, the loss function of the coarse-to-fine network can be denoted L, and its calculation formula is:
L = cls + MSE
where cls is the coarse estimate based on the cross entropy in Figure 2, and MSE is the fine estimate based on the mean square error (MSE) in Figure 2.
To make deployment on electronic devices with low computing power (for example, a robot) possible, this embodiment uses a lightweight backbone network for the network design; for example, the backbone may be MobileNetV3_small. Considering that, on electronic devices with low computing power, the processing speed of MobileNetV3_small cannot meet the requirement of less than 50 ms, this embodiment may further modify MobileNetV3_small: a channel-pruning operation reduces its number of channels to a quarter of the original, yielding a MobileNetV3_small×0.25 network that is used as the backbone.
In some embodiments, whether a face image can be stored in the candidate face image set may be judged through a preset quality score condition and a preset pose score condition; step 102 may then specifically include:
B1. Detect whether the quality score of the face image satisfies the quality score condition, and detect whether the pose score of the face image satisfies the pose score condition.
Here, the quality score condition is used to check whether the image quality of the face image meets the requirement, that is, whether the face image is sharp enough; the pose score condition is used to check whether the face pose in the face image meets the requirement, that is, whether the face is sufficiently frontal. Note that the two checks in step B1 may be performed sequentially or simultaneously, which is not limited here. When performed simultaneously, as soon as either check fails, the other check can be terminated directly.
Merely as an example, since a face image in fact has three pose scores (that is, the pitch angle, yaw angle and roll angle each have a corresponding pose score), the detection procedure for the pose scores is more complex than that for the quality score. On this basis, an electronic device with low computing power may first check whether the quality score of the face image satisfies the quality score condition, and only then check whether the pose scores satisfy the pose score condition.
Specifically, the quality score condition may be that the quality score is not lower than a preset quality score threshold. Following the description of step A1, this threshold may be 60; that is, the quality score condition may be expressed as Eq ≥ 60.
Specifically, the pose score condition may be: the absolute value of the pitch-angle pose score is less than a preset first pose score threshold, the absolute value of the yaw-angle pose score is less than a preset second pose score threshold, and the sum of the absolute values of the pitch, yaw and roll pose scores is less than a preset third pose score threshold, where the first pose score threshold is less than the second pose score threshold, and the third pose score threshold is the sum of the first and second. For example, the first pose score threshold may be 25, the second 40, and the third 65. Denoting the pitch-angle pose score Ep(pitch), the yaw-angle pose score Ep(yaw) and the roll-angle pose score Ep(roll), the pose score condition can be expressed as:
|Ep(pitch)| < 25 && |Ep(yaw)| < 40 && |Ep(pitch)| + |Ep(yaw)| + |Ep(roll)| < 65
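The combined check of step B1 can be sketched as follows (threshold values are those given above; the function and variable names are illustrative):

```python
def passes_filter(eq, ep_pitch, ep_yaw, ep_roll):
    """Return True if a face image qualifies for the candidate set."""
    quality_ok = eq >= 60  # quality score condition: Eq >= 60
    pose_ok = (abs(ep_pitch) < 25
               and abs(ep_yaw) < 40
               and abs(ep_pitch) + abs(ep_yaw) + abs(ep_roll) < 65)
    return quality_ok and pose_ok

passes_filter(93, 10, -20, 5)   # True: sharp and nearly frontal
passes_filter(93, 10, -20, 40)  # False: pose-angle sum is 70, not < 65
passes_filter(40, 0, 0, 0)      # False: too blurry (Eq < 60)
```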
B2. If the quality score satisfies the quality score condition and the pose score satisfies the pose score condition, store the face image in the candidate face image set.
If the quality score of a face image satisfies the quality score condition and its pose score satisfies the pose score condition, the candidate face image set of the face ID to which the face image belongs can be updated, specifically by storing the face image in that candidate face image set.
In some embodiments, when there are two or more face images in the candidate face image set, further screening is required to obtain the target face image; step 103 may then specifically include:
C1. Calculate a matching score for each face image according to the quality score and pose scores of each face image in the candidate face image set.
The matching score match_score is calculated with the following formula:
match_score = f(Eq, |Ep(pitch)|, |Ep(yaw)|, |Ep(roll)|)   (the formula is reproduced only as an image in the original publication)
The design idea behind this formula is to consider face quality and face pose at the same time. Since a smaller face pose is better, the pose terms are designed to be negatively correlated with the score. Step B1 has already restricted the sum of the absolute values of the three pose scores of any face image stored in the candidate face image set to at most 65°, which motivates the form of the formula. In addition, among the three pose angles, this embodiment focuses more on the pitch and yaw angles, so the absolute value of the roll-angle pose score is divided by a factor to reduce the weight of the roll angle.
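The exact formula appears only as an image in the original publication, so the sketch below is merely one plausible form consistent with the stated design: higher quality raises the score, larger pose angles lower it, and the roll angle is divided by a factor so it carries less weight than pitch and yaw. The function name, the subtraction structure, and the divisor of 3 are all assumptions for illustration.

```python
def match_score(eq, ep_pitch, ep_yaw, ep_roll, roll_divisor=3.0):
    """Illustrative matching score: quality minus a pose penalty.

    NOTE: this is NOT the formula from the original filing (which is an
    image there); it only follows the stated design -- negative correlation
    with pose, with the roll angle divided to reduce its weight.
    """
    pose_penalty = abs(ep_pitch) + abs(ep_yaw) + abs(ep_roll) / roll_divisor
    return eq - pose_penalty

# A frontal, sharp face outranks an equally sharp but rotated one:
frontal = match_score(93, 2, 3, 3)
rotated = match_score(93, 20, 30, 10)
```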
C2. In the candidate face image set, determine the face image with the highest matching score as the target face image of the face ID.
After the matching score has been calculated with the formula in C1 for every face image in the candidate face image set, the face image with the highest matching score in the set can be selected. That face image is then determined to be the target face image of the face ID.
In some embodiments, the active push timing may be the moment at which a preset time interval has elapsed since the last push; the active push timing can also be understood as the moment at which a preset number of face detections have been performed on the scene video since the last push. That is, the target face image is periodically looked up in the candidate face image set of the face ID and pushed. For an electronic device with low computing power, suppose the time taken to execute steps 101 and 102 once is fixed, for example 200 milliseconds (ms); suppose time starts at 0; and suppose the preset time interval is 2 s. The following scenario can then be imagined:
At the initial moment, face image 1 belonging to face ID1 is detected; through steps 101 and 102, it is determined that face image 1 cannot be stored in the candidate face image set of face ID1, and the 200 ms mark is reached;
At 200 ms, face image 2 belonging to face ID1 is detected; through steps 101 and 102, it is determined that face image 2 can be stored in the candidate face image set of face ID1, and the 400 ms mark is reached;
And so on, until the 2 s mark, at which point the active push timing is satisfied. Evidently, steps 101 and 102 have by now been executed in a loop 10 times. Assuming the candidate face image set stores the three face images 2, 5 and 9, the target face image is selected from these three face images and pushed.
At the same time, the electronic device takes the 2 s mark as the new initial moment and, after clearing the candidate face image set, starts a new round of updates to the candidate face image set; the process is similar to the above and is not repeated here.
In some embodiments, for an electronic device with low computing power, a trajectory parameter trace_num may be introduced to judge whether the active push timing has been reached, where trace_num is initialized to 0. The process is as follows: for a face ID, each time a face image belonging to that face ID is detected in the current video frame, trace_num is incremented by 1 and steps 101 and 102 are executed. After step 102, whether the active push timing is satisfied is judged by whether trace_num % update_num equals 0: if trace_num % update_num is 0, the active push timing has been reached; otherwise, the active push timing has not been reached, and the process returns to detecting the current video frame of the scene video — when a face image belonging to the face ID is detected, trace_num is updated and the subsequent steps 101 and 102 are executed, which is not repeated here. The value of update_num is determined by the total time spent on steps 101 and 102 (that is, the total time needed to execute steps 101 and 102) and the preset time interval. Denoting the preset time interval T and the total time spent on steps 101 and 102 t, update_num is the ratio of T to t. For example, if T is 2 s and t is 200 ms, then update_num is 10. For example:
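The periodic trigger described above can be sketched as a small per-face-ID counter (the modulo logic follows the text; the class and method names are illustrative):

```python
class PushScheduler:
    """Per-face-ID counter that fires once every update_num detections."""

    def __init__(self, interval_s=2.0, step_s=0.2):
        # update_num = T / t, e.g. 2 s / 200 ms = 10
        self.update_num = round(interval_s / step_s)
        self.trace_num = 0  # never reset between pushes; it just accumulates

    def on_detection(self):
        """Call once per frame in which this face ID is detected.

        Returns True when the active push timing has been reached."""
        self.trace_num += 1
        return self.trace_num % self.update_num == 0

sched = PushScheduler()
fires = [sched.on_detection() for _ in range(20)]
# fires on the 10th and 20th detections only
```

One such scheduler is kept per face ID; deleting the scheduler when a track is lost frees the memory, as described below.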
At time t0, face image 1 belonging to face ID1 is detected for the first time, and trace_num is updated to 1; through steps 101 and 102, it is determined that face image 1 cannot be stored in the candidate face image set of face ID1, and trace_num % update_num is not 0, so the active push timing has not been reached; the time is now t1, and whether a face image belonging to face ID1 exists at time t1 is detected;
At time t1, face image 2 belonging to face ID1 is detected for the second time, and trace_num is updated to 2; through steps 101 and 102, it is determined that face image 2 can be stored in the candidate face image set of face ID1, and trace_num % update_num is not 0, so the active push timing has not been reached; the time is now t2, and whether a face image belonging to face ID1 exists at time t2 is detected;
And so on, until trace_num % update_num equals 0, at which point it is confirmed that the active push timing has been satisfied. At this point, the target face image can be looked up in the candidate face image set and pushed. Note that trace_num does not need to be reset after each push; that is, trace_num can keep accumulating. Moreover, each face ID has its own trace_num. If the face images of a certain face ID are lost from tracking, the trace_num corresponding to that face ID can be deleted to free memory.
In some embodiments, the passive push timing may be the moment at which the face images belonging to the face ID are lost from tracking in the scene video; that is, when a face image belonging to the face ID can no longer be detected in the scene video, the track is considered lost. This is generally caused by the user represented by the face ID walking out of the camera's field of view, so that the camera can no longer capture the user. The passive push timing in fact compensates for the shortcomings of the active push timing. The following scenario can be imagined: suppose the time interval set for the active push timing is 2 s, that is, the target face image is looked up and pushed every 2 s. Suppose a user walks through the camera's capture area very quickly, so that the camera captures the user for less than 2 seconds and very few video frames of the scene video contain the user's face image. Because the user's face image does not persist long enough in the scene video, the active push timing would never push that user's face image, which is clearly unreasonable. It is on this basis that the passive push timing is proposed: as soon as the face images of a certain face ID (representing a certain user) are lost from tracking in the scene video, the target face image of that face ID is immediately looked up and pushed, avoiding missed detections. For example, suppose a user quickly walks through the camera's capture area and the user corresponds to face ID1; then:
At time t0, face image 1 belonging to face ID1 is detected for the first time, and trace_num is updated to 1; through steps 101 and 102, it is determined that face image 1 cannot be stored in the candidate face image set of face ID1, and trace_num % update_num is not 0, so the active push timing has not been reached; the time reaches t1, and whether a face image belonging to face ID1 exists at time t1 is detected;
At time t1, face image 2 belonging to face ID1 is detected for the second time, and trace_num is updated to 2; through steps 101 and 102, it is determined that face image 2 can be stored in the candidate face image set of face ID1, and trace_num % update_num is not 0, so the active push timing has not been reached; the time reaches t2, and whether a face image belonging to face ID1 exists at time t2 is detected;
At time t2, no face image belonging to face ID1 can be detected; that is, the face images belonging to this face ID are lost from tracking. At this point trace_num is still 2 and will no longer increase, so trace_num % update_num can never again be 0. In a scenario where only the active push timing is set, the target face image of this face ID could never be pushed; whereas in a scenario where the passive push timing is additionally set, the target face image can be looked up in the candidate face image set of the face ID and pushed based on the passive push timing.
Note that, after the passive push timing is satisfied and steps 103 and 104 have been executed, the electronic device can clear the data related to the face ID, for example deleting the candidate face image set of the face ID, to free resources. In some embodiments, the concept of a tracker may also be introduced. That is, when a face image belonging to a new face ID is detected in a video frame (a face image of a new face ID being one whose face ID has not appeared in the several consecutive video frames preceding this frame), a corresponding tracker is created for that face image. The trace_num and candidate face image set subsequently updated for that face ID are both bound to the tracker. When the face ID is lost from tracking, the target face image of the face ID is looked up and pushed based on the passive push timing, after which the tracker corresponding to the face ID is deleted, including deleting the trace_num and candidate face image set bound to it.
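The tracker lifecycle above can be sketched as a dictionary keyed by face ID (a minimal illustration; all names are assumptions, and the push itself is abstracted into a callback):

```python
trackers = {}  # face_id -> {"trace_num": int, "candidates": list}

def on_new_face(face_id):
    """Create a tracker when a face ID first appears in the video."""
    trackers[face_id] = {"trace_num": 0, "candidates": []}

def on_track_lost(face_id, push):
    """Passive push timing: push the best candidate, then free the tracker."""
    state = trackers.pop(face_id, None)  # deletes trace_num and candidate set
    if state and state["candidates"]:
        push(max(state["candidates"], key=lambda f: f["match_score"]))

pushed = []
on_new_face("ID1")
trackers["ID1"]["candidates"].append({"match_score": 90, "img": "frame2"})
trackers["ID1"]["candidates"].append({"match_score": 75, "img": "frame5"})
on_track_lost("ID1", pushed.append)
# the frame2 candidate is pushed and "ID1" is removed from trackers
```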
As can be seen from the above, this embodiment of the present application can evaluate the face images in the scene video that belong to the same face ID (that is, to the same user); the evaluation is based on the quality score and pose score of each face image, and whether to update the candidate face image set is judged from the evaluation results, so that the face images in the candidate face image set are all of good quality and good pose. Finally, the electronic device determines a target face image from the candidate face image set and pushes it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can perform further face image processing operations based on the pushed target face image.
Corresponding to the push method provided above, an embodiment of the present application further provides a push apparatus. As shown in Figure 3, the push apparatus 300 includes:
a score acquisition unit 301, configured to, for each face ID, obtain the quality score and pose score of a face image belonging to the face ID if such a face image is detected in the current video frame of the scene video;
a set update unit 302, configured to determine, according to the quality score and pose score of the face image, whether to update the candidate face image set of the face ID;
an image determination unit 303, configured to determine one face image from the candidate face image set as the target face image of the face ID;
an image push unit 304, configured to push the target face image.
Optionally, the score acquisition unit 301 includes:
a quality score acquisition subunit, configured to input the face image into a preset first classification network to obtain the quality score of the face image, where the first classification network is used to classify the image quality of the face image;
a pose score acquisition subunit, configured to input the face image into a preset second classification network to obtain the pose score of the face image, where the second classification network includes three sub-classification networks respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
Optionally, the set update unit 302 includes:
a quality detection subunit, configured to detect whether the quality score of the face image satisfies a preset quality score condition;
a pose detection subunit, configured to detect whether the pose score of the face image satisfies a preset pose score condition;
a set update subunit, configured to store the face image in the candidate face image set if the quality score satisfies the quality score condition and the pose score satisfies the pose score condition.
Optionally, the image determination unit 303 includes:
a matching score calculation subunit, configured to calculate a matching score for each face image according to the quality score and pose score of each face image in the candidate face image set;
a target face image determination subunit, configured to determine, in the candidate face image set, the face image with the highest matching score as the target face image of the face ID.
Optionally, the image determination unit 303 is specifically configured to determine, at preset time intervals, one face image from the candidate face image set as the target face image of the face ID.
Optionally, the image determination unit 303 is specifically configured to determine one face image from the candidate face image set as the target face image of the face ID when the face images belonging to the face ID are lost from tracking in the scene video.
Optionally, the push apparatus 300 further includes:
a set clearing unit, configured to clear the candidate face image set after the image push unit pushes the target face image.
As can be seen from the above, this embodiment of the present application can evaluate the face images in the scene video that belong to the same face ID (that is, to the same user); the evaluation is based on the quality score and pose score of each face image, and whether to update the candidate face image set is judged from the evaluation results, so that the face images in the candidate face image set are all of good quality and good pose. Finally, the electronic device determines a target face image from the candidate face image set and pushes it, so that subsequent face image processing modules, such as a face recognition module or a face verification module, can perform further face image processing operations based on the pushed target face image.
对应于上文所提供的推送方法，本申请实施例还提供了一种电子设备。请参阅图4，本申请实施例中的电子设备4包括：存储器401，一个或多个处理器402(图4中仅示出一个)及存储在存储器401上并可在处理器上运行的计算机程序。其中：存储器401用于存储软件程序以及单元，处理器402通过运行存储在存储器401的软件程序以及单元，从而执行各种功能应用以及诊断，以获取上述预设事件对应的资源。具体地，处理器402通过运行存储在存储器401的上述计算机程序时实现以下步骤：Corresponding to the push method provided above, an embodiment of the present application further provides an electronic device. Referring to Fig. 4, the electronic device 4 in this embodiment includes: a memory 401, one or more processors 402 (only one is shown in Fig. 4), and a computer program stored on the memory 401 and executable on the processor. The memory 401 is used to store software programs and units, and the processor 402 executes various functional applications and diagnostics by running the software programs and units stored in the memory 401, so as to obtain the resources corresponding to the preset events. Specifically, the processor 402 implements the following steps when running the above computer program stored in the memory 401:
针对每个人脸ID，若在场景视频的当前视频帧中检测出属于上述人脸ID的人脸图像，则获取上述人脸图像的质量分数和姿态分数；For each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, the quality score and pose score of the face image are obtained;
根据上述人脸图像的质量分数和姿态分数，确定是否更新上述人脸ID的候选人脸图像集合；Determine, according to the quality score and pose score of the face image, whether to update the candidate face image set of the face ID;
从上述候选人脸图像集合中确定一张人脸图像作为上述人脸ID的目标人脸图像；Determine a face image from the candidate face image set as the target face image of the face ID;
推送上述目标人脸图像。Push the target face image.
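The four steps above can be outlined in code. The sketch below is illustrative only and is not the patented implementation: the two scoring networks are stubbed out as dictionary lookups, and the thresholds and the quality-plus-pose matching rule are assumptions, since the embodiments leave them open.

```python
# Illustrative sketch of the push method: score each detected face,
# conditionally update the per-face-ID candidate set, then select and
# push the best candidate. The scoring networks are stubbed out.

def quality_score(face_image):
    # stand-in for the first classification network (image quality)
    return face_image["quality"]

def pose_score(face_image):
    # stand-in for the second classification network (pitch/yaw/roll)
    return face_image["pose"]

class FacePusher:
    def __init__(self, q_threshold=0.5, p_threshold=0.5):
        self.q_threshold = q_threshold  # assumed quality score condition
        self.p_threshold = p_threshold  # assumed pose score condition
        self.candidates = {}  # face_id -> list of (image, quality, pose)

    def on_detection(self, face_id, face_image):
        # step 1: obtain both scores for the detected face image
        q, p = quality_score(face_image), pose_score(face_image)
        # step 2: update the candidate set only if both conditions hold
        if q >= self.q_threshold and p >= self.p_threshold:
            self.candidates.setdefault(face_id, []).append((face_image, q, p))

    def push(self, face_id):
        # steps 3-4: pick the highest-scoring candidate and push it
        pool = self.candidates.get(face_id, [])
        if not pool:
            return None
        target = max(pool, key=lambda c: c[1] + c[2])[0]
        self.candidates[face_id] = []  # seventh implementation: clear after push
        return target
```

Here "push" simply returns the selected image; in the embodiments it would be handed to a downstream module such as face recognition or face verification.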
假设上述为第一种可能的实施方式，则在第一种可能的实施方式作为基础而提供的第二种可能的实施方式中，上述获取上述人脸图像的质量分数和姿态分数，包括：Assuming that the above is the first possible implementation manner, in a second possible implementation manner provided on the basis of the first, obtaining the quality score and pose score of the face image includes:
将上述人脸图像输入至预设的第一分类网络中,以得到上述人脸图像的质量分数,其中,上述第一分类网络用于对上述人脸图像的图像质量进行分类;The above-mentioned human face image is input into a preset first classification network to obtain the quality score of the above-mentioned human face image, wherein the above-mentioned first classification network is used to classify the image quality of the above-mentioned human face image;
将上述人脸图像输入至预设的第二分类网络中，以得到上述人脸图像的姿态分数，其中，上述第二分类网络包括三个子分类网络，上述三个子分类网络分别用于对上述人脸图像中所表示的人脸的俯仰角、偏航角及翻滚角进行分类。The face image is input into a preset second classification network to obtain the pose score of the face image, where the second classification network includes three sub-classification networks respectively used to classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
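The embodiments do not specify how the three sub-classifiers' angle outputs are folded into a single pose score. One plausible sketch (purely an assumption: angles in degrees, with a frontal face scoring highest) is:

```python
# Hypothetical combination of the three sub-classification outputs
# (pitch, yaw, roll) into one pose score in [0, 1]; a frontal face
# (all angles near zero) scores close to 1.0.

def pose_score_from_angles(pitch, yaw, roll, max_angle=90.0):
    """Angles in degrees; larger deviations from frontal lower the score."""
    penalty = (abs(pitch) + abs(yaw) + abs(roll)) / (3.0 * max_angle)
    return max(0.0, 1.0 - penalty)
```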
在上述第一种可能的实施方式作为基础而提供的第三种可能的实施方式中，上述根据上述人脸图像的质量分数和姿态分数，确定是否更新上述人脸ID的候选人脸图像集合，包括：In a third possible implementation manner provided on the basis of the first possible implementation manner, determining whether to update the candidate face image set of the face ID according to the quality score and pose score of the face image includes:
检测上述人脸图像的质量分数是否满足预设的质量分数条件,以及,检测上述人脸图像的姿态分数是否满足预设的姿态分数条件;Detecting whether the quality score of the above-mentioned face image satisfies a preset quality score condition, and detecting whether the pose score of the above-mentioned face image satisfies the preset pose score condition;
若上述质量分数满足上述质量分数条件,且上述姿态分数满足上述姿态分数条件,则将上述人脸图像存入上述候选人脸图像集合中。If the above quality score satisfies the above quality score condition, and the above pose score satisfies the above pose score condition, then store the above human face image into the above candidate face image set.
在上述第一种可能的实施方式作为基础而提供的第四种可能的实施方式中，上述从上述候选人脸图像集合中确定一张人脸图像作为上述人脸ID的目标人脸图像，包括：In a fourth possible implementation manner provided on the basis of the first possible implementation manner, determining a face image from the candidate face image set as the target face image of the face ID includes:
根据上述候选人脸图像集合中各个人脸图像的质量分数及姿态分数，计算各个人脸图像的匹配分数；Calculate the matching score of each face image according to the quality score and pose score of each face image in the candidate face image set;
在上述候选人脸图像集合中,将匹配分数最高的人脸图像确定为上述人脸ID的目标人脸图像。In the set of candidate face images, the face image with the highest matching score is determined as the target face image of the face ID.
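The embodiments state only that the matching score is computed from the quality score and the pose score; a weighted sum is one possible choice (the weights below are assumed for illustration, not taken from the text):

```python
# Hypothetical matching score: a weighted sum of quality and pose scores.
def matching_score(quality, pose, w_quality=0.6, w_pose=0.4):
    return w_quality * quality + w_pose * pose

def pick_target(candidates):
    """candidates: list of (image, quality, pose) tuples; returns the
    image with the highest matching score."""
    best = max(candidates, key=lambda c: matching_score(c[1], c[2]))
    return best[0]
```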
在上述第一种可能的实施方式作为基础而提供的第五种可能的实施方式中，上述从上述候选人脸图像集合中确定一张人脸图像作为上述人脸ID的目标人脸图像，包括：In a fifth possible implementation manner provided on the basis of the first possible implementation manner, determining a face image from the candidate face image set as the target face image of the face ID includes:
按照预设的间隔时长,从上述候选人脸图像集合中确定一张人脸图像作为上述人脸ID的目标人脸图像。According to a preset interval, a face image is determined from the set of candidate face images as a target face image of the face ID.
在上述第一种可能的实施方式作为基础而提供的第六种可能的实施方式中，上述从上述候选人脸图像集合中确定一张人脸图像作为上述人脸ID的目标人脸图像，包括：In a sixth possible implementation manner provided on the basis of the first possible implementation manner, determining a face image from the candidate face image set as the target face image of the face ID includes:
当属于上述人脸ID的人脸图像在上述场景视频中跟丢时,从上述候选人脸图像集合中确定一张人脸图像作为上述人脸ID的目标人脸图像。When the face image belonging to the above-mentioned face ID is lost in the above-mentioned scene video, a face image is determined from the above-mentioned candidate face image set as the target face image of the above-mentioned face ID.
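How tracking loss ("跟丢") is detected is left open; a simple sketch, under the assumption that a face ID counts as lost when it stops appearing among the current frame's detections:

```python
# Hypothetical loss detector: face IDs tracked so far that are absent
# from the current frame's detections are treated as lost, which would
# trigger selection and pushing of their target face images.

def find_lost_ids(tracked_ids, detected_ids):
    """Return the face IDs that were being tracked but were not
    detected in the current frame."""
    return set(tracked_ids) - set(detected_ids)
```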
在上述第一种可能的实施方式作为基础，或者上述第二种可能的实施方式作为基础，或者上述第三种可能的实施方式作为基础，或者上述第四种可能的实施方式作为基础，或者上述第五种可能的实施方式作为基础，或者上述第六种可能的实施方式作为基础而提供的第七种可能的实施方式中，在上述推送上述目标人脸图像之后，处理器402通过运行存储在存储器401的上述计算机程序时还实现以下步骤：In a seventh possible implementation manner provided on the basis of any one of the first to sixth possible implementation manners above, after the target face image is pushed, the processor 402 further implements the following step when running the above computer program stored in the memory 401:
清空上述候选人脸图像集合。Empty the above collection of candidate face images.
应当理解，在本申请实施例中，所称处理器402可以是中央处理单元(Central Processing Unit，CPU)，该处理器还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现成可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that, in the embodiments of the present application, the processor 402 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
存储器401可以包括只读存储器和随机存取存储器，并向处理器402提供指令和数据。存储器401的一部分或全部还可以包括非易失性随机存取存储器。例如，存储器401还可以存储设备类别的信息。The memory 401 may include a read-only memory and a random-access memory, and provides instructions and data to the processor 402. Part or all of the memory 401 may also include a non-volatile random-access memory. For example, the memory 401 may also store information on device categories.
由上可见，通过本申请实施例，可对场景视频中属于同一人脸ID(也即属于同一用户)的人脸图像进行评估，其评估过程与每个人脸图像的质量分数及姿态分数有关，并基于评估结果判断是否对候选人脸图像集合进行更新，使得候选人脸图像集合中的人脸图像均为质量较优且姿态较优的人脸图像。最后电子设备会从该候选人脸图像集合中确定出目标人脸图像进行推送，使得后续的人脸图像处理模块，例如人脸识别模块或人脸验证模块等可基于所推送的目标人脸图像进行进一步的人脸图像处理操作。As can be seen from the above, through the embodiments of the present application, the face images belonging to the same face ID (that is, to the same user) in a scene video can be evaluated, where the evaluation depends on the quality score and pose score of each face image, and whether to update the candidate face image set is decided based on the evaluation result, so that the face images in the candidate set are all of good quality and good pose. Finally, the electronic device determines a target face image from the candidate set and pushes it, so that a subsequent face image processing module, such as a face recognition module or a face verification module, can perform further face image processing operations based on the pushed target face image.
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将上述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中，上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。另外，各功能单元、模块的具体名称也只是为了便于相互区分，并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed, that is, the internal structure of the above apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。In the above-mentioned embodiments, the descriptions of each embodiment have their own emphases, and for parts that are not detailed or recorded in a certain embodiment, refer to the relevant descriptions of other embodiments.
本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、或者外部设备软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of external device software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and design constraints of the technical solution. A skilled artisan may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the present application.
在本申请所提供的实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的系统实施例仅仅是示意性的，例如，上述模块或单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口，装置或单元的间接耦合或通讯连接，可以是电性，机械或其它的形式。In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative; for instance, the division of the above modules or units is only a logical functional division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
上述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读存储介质中。基于这样的理解，本申请实现上述实施例方法中的全部或部分流程，也可以通过计算机程序来指令相关联的硬件来完成，上述的计算机程序可存储于一计算机可读存储介质中，该计算机程序在被处理器执行时，可实现上述各个方法实施例的步骤。其中，上述计算机程序包括计算机程序代码，上述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。上述计算机可读存储介质可以包括：能够携带上述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机可读存储器、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是，上述计算机可读存储介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减，例如在某些司法管辖区，根据立法和专利实践，计算机可读存储介质不包括电载波信号和电信信号。If the above integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the processes in the methods of the above embodiments, which may also be completed by instructing the associated hardware through a computer program; the above computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The above computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The above computer-readable storage medium may include: any entity or apparatus capable of carrying the above computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer-readable memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the above computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunication signals.
以上实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围，均应包含在本申请的保护范围之内。The above embodiments are only used to illustrate the technical solutions of the present application rather than to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (10)

  1. 一种推送方法,其特征在于,包括:A push method, characterized in that, comprising:
    针对每个人脸ID，若在场景视频的当前视频帧中检测出属于所述人脸ID的人脸图像，则获取所述人脸图像的质量分数和姿态分数；For each face ID, if a face image belonging to the face ID is detected in the current video frame of a scene video, obtaining the quality score and pose score of the face image;
    根据所述人脸图像的质量分数和姿态分数，确定是否更新所述人脸ID的候选人脸图像集合；determining, according to the quality score and pose score of the face image, whether to update the candidate face image set of the face ID;
    从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像;Determine a face image as the target face image of the face ID from the set of candidate face images;
    推送所述目标人脸图像。Push the target face image.
  2. 如权利要求1所述的推送方法,其特征在于,所述获取所述人脸图像的质量分数和姿态分数,包括:The pushing method according to claim 1, wherein said obtaining the quality score and the pose score of said face image comprises:
    将所述人脸图像输入至预设的第一分类网络中,以得到所述人脸图像的质量分数,其中,所述第一分类网络用于对所述人脸图像的图像质量进行分类;The human face image is input into a preset first classification network to obtain the quality score of the human face image, wherein the first classification network is used to classify the image quality of the human face image;
    将所述人脸图像输入至预设的第二分类网络中,以得到所述人脸图像的姿态分数,其中,所述第二分类网络包括三个子分类网络,所述三个子分类网络分别用于对所述人脸图像中所表示的人脸的俯仰角、偏航角及翻滚角进行分类。The human face image is input into a preset second classification network to obtain the pose score of the human face image, wherein the second classification network includes three sub-classification networks, and the three sub-classification networks are respectively used Classify the pitch angle, yaw angle and roll angle of the face represented in the face image.
  3. 如权利要求1所述的推送方法,其特征在于,所述根据所述人脸图像的质量分数和姿态分数,确定是否更新所述人脸ID的候选人脸图像集合,包括:The pushing method according to claim 1, wherein, determining whether to update the candidate face image set of the face ID according to the quality score and the pose score of the face image comprises:
    检测所述人脸图像的质量分数是否满足预设的质量分数条件,以及,检测所述人脸图像的姿态分数是否满足预设的姿态分数条件;Detecting whether the quality score of the face image satisfies a preset quality score condition, and detecting whether the pose score of the face image satisfies a preset pose score condition;
    若所述质量分数满足所述质量分数条件,且所述姿态分数满足所述姿态分数条件,则将所述人脸图像存入所述候选人脸图像集合中。If the quality score satisfies the quality score condition and the pose score satisfies the pose score condition, then store the face image into the set of candidate face images.
  4. 如权利要求1所述的推送方法,其特征在于,所述从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像,包括:The push method according to claim 1, wherein said determining a face image as the target face image of said face ID from said candidate face image set comprises:
    根据所述候选人脸图像集合中各个人脸图像的质量分数及姿态分数，计算各个人脸图像的匹配分数；calculating the matching score of each face image according to the quality score and pose score of each face image in the set of candidate face images;
    在所述候选人脸图像集合中,将匹配分数最高的人脸图像确定为所述人脸ID的目标人脸图像。In the set of candidate face images, the face image with the highest matching score is determined as the target face image of the face ID.
  5. 如权利要求1所述的推送方法,其特征在于,所述从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像,包括:The push method according to claim 1, wherein said determining a face image as the target face image of said face ID from said candidate face image set comprises:
    按照预设的间隔时长,从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像。According to a preset interval, a face image is determined from the set of candidate face images as a target face image of the face ID.
  6. 如权利要求1所述的推送方法,其特征在于,所述从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像,包括:The push method according to claim 1, wherein said determining a face image as the target face image of said face ID from said candidate face image set comprises:
    当属于所述人脸ID的人脸图像在所述场景视频中跟丢时,从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像。When a face image belonging to the face ID is lost in the scene video, a face image is determined from the set of candidate face images as a target face image of the face ID.
  7. 如权利要求1至6任一项所述的推送方法,其特征在于,在所述推送所述目标人脸图像之后,所述推送方法还包括:The push method according to any one of claims 1 to 6, wherein, after the pushing of the target face image, the push method further comprises:
    清空所述候选人脸图像集合。Empty the set of candidate face images.
  8. 一种推送装置,其特征在于,包括:A push device is characterized in that it comprises:
    分数获取单元,用于针对每个人脸ID,若在场景视频的当前视频帧中检测出属于所述人脸ID的人脸图像,则获取所述人脸图像的质量分数和姿态分数;A score acquisition unit, for each face ID, if a face image belonging to the face ID is detected in the current video frame of the scene video, then obtain the quality score and the pose score of the face image;
    集合更新单元,用于根据所述人脸图像的质量分数和姿态分数,确定是否更新所述人脸ID的候选人脸图像集合;A set update unit, used to determine whether to update the set of candidate face images of the face ID according to the quality score and the pose score of the face image;
    图像确定单元,用于从所述候选人脸图像集合中确定一张人脸图像作为所述人脸ID的目标人脸图像;An image determination unit, configured to determine a face image from the set of candidate face images as the target face image of the face ID;
    图像推送单元,用于推送所述目标人脸图像。An image pushing unit, configured to push the target face image.
  9. 一种电子设备，包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序，其特征在于，所述处理器执行所述计算机程序时实现如权利要求1至7任一项所述的方法。An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  10. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至7任一项所述的方法。A computer-readable storage medium storing a computer program, wherein the computer program implements the method according to any one of claims 1 to 7 when executed by a processor.
PCT/CN2021/125407 2021-05-24 2021-10-21 Pushing method, pushing apparatus and electronic device WO2022247118A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110563530.7A CN113297423A (en) 2021-05-24 2021-05-24 Pushing method, pushing device and electronic equipment
CN202110563530.7 2021-05-24

Publications (1)

Publication Number Publication Date
WO2022247118A1 true WO2022247118A1 (en) 2022-12-01

Family

ID=77324131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/125407 WO2022247118A1 (en) 2021-05-24 2021-10-21 Pushing method, pushing apparatus and electronic device

Country Status (2)

Country Link
CN (1) CN113297423A (en)
WO (1) WO2022247118A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297423A (en) * 2021-05-24 2021-08-24 深圳市优必选科技股份有限公司 Pushing method, pushing device and electronic equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
CN109753917A (en) * 2018-12-29 2019-05-14 中国科学院重庆绿色智能技术研究院 Face quality optimization method, system, computer readable storage medium and equipment
US20200050835A1 (en) * 2017-05-31 2020-02-13 Shenzhen Sensetime Technology Co., Ltd. Methods and apparatuses for determining face image quality, electronic devices, and computer storage media
CN111241927A (en) * 2019-12-30 2020-06-05 新大陆数字技术股份有限公司 Cascading type face image optimization method, system and equipment and readable storage medium
CN111652139A (en) * 2020-06-03 2020-09-11 浙江大华技术股份有限公司 Face snapshot method, snapshot device and storage device
CN111986163A (en) * 2020-07-29 2020-11-24 深思考人工智能科技(上海)有限公司 Face image selection method and device
CN112528903A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Face image acquisition method and device, electronic equipment and medium
CN113297423A (en) * 2021-05-24 2021-08-24 深圳市优必选科技股份有限公司 Pushing method, pushing device and electronic equipment

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN109034013B (en) * 2018-07-10 2023-06-13 腾讯科技(深圳)有限公司 Face image recognition method, device and storage medium
CN112084856A (en) * 2020-08-05 2020-12-15 深圳市优必选科技股份有限公司 Face posture detection method and device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN113297423A (en) 2021-08-24


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21942682

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE