CN113283305A - Face recognition method and device, electronic equipment and computer readable storage medium - Google Patents

Publication number
CN113283305A
CN113283305A (application CN202110473605.2A; granted as CN113283305B)
Authority
CN
China
Prior art keywords
face
buffer area
image
snapshot
target
Prior art date
Legal status
Granted
Application number
CN202110473605.2A
Other languages
Chinese (zh)
Other versions
CN113283305B (en
Inventor
高治力
何建斌
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110473605.2A
Publication of CN113283305A
Application granted
Publication of CN113283305B
Legal status: Active

Classifications

    • G06V40/168 Recognition of human faces: Feature extraction; Face representation
    • G06V40/172 Recognition of human faces: Classification, e.g. identification
    • G06F16/583 Still-image retrieval characterised by using metadata automatically derived from the content
    • G06T7/0002 Image analysis: Inspection of images, e.g. flaw detection
    • G06T2207/10016 Image acquisition modality: Video; Image sequence
    • G06T2207/30168 Image quality inspection
    • G06T2207/30201 Subject of image: Face


Abstract

The present disclosure provides a face recognition method and apparatus, an electronic device, and a computer-readable storage medium, and relates to artificial intelligence fields such as computer vision and deep learning. The method may include: performing face detection on any video frame among consecutive video frames to obtain designated face information and corresponding identification information; taking the sub-image corresponding to the designated face information as a face snapshot and acquiring a quality score corresponding to the face snapshot; adding the face snapshot into a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer and M is a positive integer greater than 1; and retrieving target face images in a target library by respectively using the face snapshots in the buffer, which correspond to a plurality of video frames among the consecutive video frames, and determining the face recognition result corresponding to the designated face information according to the obtained retrieval results. Applying the disclosed scheme can improve the accuracy and recall of face recognition.

Description

Face recognition method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a face recognition method and apparatus, an electronic device, and a computer-readable storage medium in the fields of computer vision and deep learning.
Background
In conventional face recognition, a single face snapshot of supposedly optimal quality is selected according to some strategy, and the recognition result is obtained by comparing the feature value of that snapshot with the feature value of each target face image in a target library.
However, such a strategy is usually applicable only to a specific scene. Once the scene changes, the face snapshot selected as optimal according to the strategy can differ substantially from the actually optimal snapshot, which degrades the accuracy, recall, and other measures of face recognition.
Disclosure of Invention
The disclosure provides a face recognition method, a face recognition device, an electronic device and a computer-readable storage medium.
According to an aspect of the present disclosure, there is provided a face recognition method, including:
performing face detection on any video frame among consecutive video frames to obtain designated face information and corresponding identification information;
taking the sub-image corresponding to the designated face information as a face snapshot, and acquiring a quality score corresponding to the face snapshot;
adding the face snapshot into a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer, and M is a positive integer greater than 1;
and retrieving target face images in the target library by respectively using the face snapshots in the buffer that correspond to a plurality of video frames among the consecutive video frames, and determining the face recognition result corresponding to the designated face information according to the obtained retrieval results.
According to an aspect of the present disclosure, there is provided a face recognition apparatus, including: a first processing module, a second processing module, a third processing module, and a fourth processing module;
the first processing module is configured to perform face detection on any one of consecutive video frames to obtain designated face information and corresponding identification information;
the second processing module is configured to take the sub-image corresponding to the designated face information as a face snapshot and acquire a quality score corresponding to the face snapshot;
the third processing module is configured to add the face snapshot to a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer, and M is a positive integer greater than 1;
and the fourth processing module is configured to retrieve target face images in the target library by respectively using the face snapshots in the buffer that correspond to a plurality of video frames among the consecutive video frames, and determine the face recognition result corresponding to the designated face information according to the obtained retrieval results.
According to an aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
According to an aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described above.
According to an aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
One embodiment in the above disclosure has the following advantages or benefits: a plurality of face snapshots of better quality can be selected to retrieve the target face images respectively, and the face recognition result can be determined by combining the obtained retrieval results, thereby reducing the influence of errors in selecting the optimal face snapshot and improving the accuracy and recall of face recognition.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an embodiment of a face recognition method of the present disclosure;
FIG. 2 is a flowchart of an embodiment of a method of the present disclosure for processing a face snapshot corresponding to face information b;
FIG. 3 is a flowchart of an embodiment of a method for determining the face recognition result of the face information corresponding to identification information c according to the retrieval results corresponding to face snapshots 1 to 6;
FIG. 4 is a schematic structural diagram of a face recognition apparatus 400 according to an embodiment of the present disclosure;
FIG. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the related objects before and after it are in an "or" relationship.
FIG. 1 is a flowchart of an embodiment of the face recognition method of the present disclosure. As shown in FIG. 1, the method includes the following specific implementation steps.
In step 101, face detection is performed on any one of the consecutive video frames to obtain designated face information and corresponding identification information.
In step 102, the sub-image corresponding to the designated face information is used as a face snapshot, and a quality score corresponding to the face snapshot is obtained.
In step 103, adding the face snapshot image into a buffer area corresponding to the identification information according to the quality score, wherein at most M face snapshot images are stored in the buffer area, and M is a positive integer greater than 1.
In step 104, target face images in the target library are retrieved by respectively using the face snapshots in the buffer that correspond to a plurality of video frames among the consecutive video frames, and the face recognition result corresponding to the designated face information is determined according to the obtained retrieval results.
It can be seen that, in the scheme of this method embodiment, a plurality of face snapshots of better quality can be selected to retrieve the target face images respectively, and the final face recognition result can be determined by combining the obtained retrieval results, thereby reducing the influence of errors in selecting the optimal face snapshot and improving the accuracy and recall of face recognition.
Optionally, the target library may be created in advance, for example, during a business application deployment stage, and may include each target face image and its corresponding feature value. How the feature value of a target face image is obtained is not limited: for example, it may be extracted with a pre-trained feature extraction model, or with a feature extraction algorithm.
The scheme disclosed by the disclosure is applicable to both real-time received video streams and offline video files, and has wide applicability.
For each video frame in a video stream received in real time or in an offline video file, the processing can be performed in the manner shown in fig. 1. For convenience of description, the currently processed video frame will be referred to as video frame a hereinafter.
Before face detection and other processing are performed on the video frame a, the video frame a may be preprocessed. What the preprocessing specifically includes can be determined according to actual needs. For example, the video frame a may be sent to a decoder for decoding. In addition, current deep learning frameworks generally require frame data in ARGB or BGRA format, where A denotes transparency (Alpha), R denotes Red, G denotes Green, and B denotes Blue, while decoded video frames are generally in YUV format, where Y denotes luminance and U and V denote chrominance; therefore, a color-space transformation may further be applied to convert the video frame a into ARGB or BGRA format.
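The color-space transformation above can be illustrated with the standard per-pixel BT.601 conversion. The disclosure does not prescribe any particular conversion; the following Python sketch is a non-limiting illustration (a real pipeline would use a vectorized library routine, such as a decoder's built-in converter, rather than per-pixel Python):

```python
def yuv_to_rgb(y: int, u: int, v: int):
    """Convert one full-range BT.601 YUV pixel to RGB.

    Y is luminance; U and V are chrominance offsets around 128.
    Results are clamped to the valid 0..255 byte range.
    """
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, round(x)))
    return clamp(r), clamp(g), clamp(b)
```

With u = v = 128 the chrominance terms vanish and the pixel maps to a gray level equal to its luminance, which is a convenient sanity check.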
After the preprocessing is completed, the face detection and tracking can be performed on the video frame a, so that the face information and the corresponding identification information (id) in the video frame a are obtained. The face information can be obtained through detection, and the identification information can be obtained through tracking. For example, if the face information of a certain person appears in a plurality of video frames, the face information obtained from the plurality of video frames will correspond to the same identification information.
Assuming that three faces are detected from the video frame a, the processing can be performed in the same manner for each face information.
Specifically, for any face information (hereinafter referred to as face information b for convenience), the sub-image corresponding to face information b may be used as a face snapshot. For example, a face sub-image containing only face information b may be cropped from the video frame a (i.e., the video image) and used as the required face snapshot. A quality score corresponding to the face snapshot may then be obtained, and the snapshot may be added, according to the magnitude of its quality score, to the buffer corresponding to the identification information of face information b. At most the M face snapshots with the highest quality scores are stored in the buffer, where M is a positive integer greater than one whose value may be determined according to actual needs. Optionally, the quality score corresponding to each face snapshot may also be stored in the buffer, which is assumed in the description below.
The quality score corresponding to the face snapshot image can be determined according to the face size and the face angle in the face snapshot image.
For example, the face size and the face angle may each be multiplied by a corresponding weight, and the two products summed; the sum is the resulting quality score.
That is:
quality = a × size + b × pose; (1)
where quality denotes the quality score; a and b are weights with 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1, whose specific values may be determined according to actual needs (generally, a + b = 1); size denotes the face size, with different face sizes represented by different values; and pose denotes the face angle, likewise represented by different values.
In the above manner, the quality score corresponding to the face snapshot image can be determined by combining a plurality of factors such as the face size and the face angle, so that the accuracy of the obtained quality score is improved.
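Equation (1) can be sketched in Python as follows. The default weights and the assumption that size and pose are normalized to [0, 1] are illustrative choices, since the disclosure leaves the concrete values to the deployment:

```python
def quality_score(size: float, pose: float, a: float = 0.6, b: float = 0.4) -> float:
    """Quality score per Equation (1): quality = a * size + b * pose.

    size and pose are assumed normalized to [0, 1] (larger and more
    frontal faces score higher); the default weights are placeholders.
    """
    assert 0.0 <= a <= 1.0 and 0.0 <= b <= 1.0
    return a * size + b * pose
```

A larger, more frontal face then always receives a score at least as high as a smaller, more tilted one, which is what the buffer-selection step below relies on.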
After the face snapshot of face information b and its corresponding quality score are obtained, it may further be determined whether a buffer corresponding to the identification information of face information b has been created. If not, the buffer may be created and the face snapshot and its quality score added to it; if so, the face snapshot and its quality score may be added to the buffer according to the magnitude of the quality score.
For any face snapshot: if the number of face snapshots stored in the corresponding buffer equals M and the snapshot's quality score is greater than the minimum quality score among those stored, the snapshot and its quality score may be added to the buffer, and the minimum quality score and its corresponding face snapshot deleted from the buffer. If the number of stored face snapshots is less than M, the snapshot and its quality score may be added to the buffer directly. In addition, if the number of stored face snapshots equals M and the snapshot's quality score is less than or equal to the stored minimum, the snapshot may be discarded.
Based on the above description, FIG. 2 is a flowchart of an embodiment of the disclosed method for processing a face snapshot corresponding to face information b. As shown in FIG. 2, the following detailed implementation is included.
In step 201, it is determined whether a buffer corresponding to the identification information corresponding to the face snapshot is created, if not, step 202 is executed, and if yes, step 204 is executed.
In step 202, a corresponding buffer is created.
In step 203, the face snapshot and the corresponding quality score are added to the buffer, and then the process is ended.
In step 204, it is determined whether the number of face snapshot images stored in the buffer is less than M, if yes, step 205 is executed, otherwise, step 206 is executed.
M is a positive integer greater than one, and the specific value can be determined according to actual needs.
In step 205, the face snapshot and the corresponding quality score are added to the buffer, and then the process is terminated.
In step 206, it is determined whether the quality score corresponding to the face snapshot is greater than the minimum quality score in the buffer, if so, step 207 is executed, otherwise, step 208 is executed.
In step 207, the face snapshot and the corresponding quality score are added to the buffer, and the quality score with the minimum value and the corresponding face snapshot are deleted from the buffer, and then the process is ended.
In step 208, the face snapshot is discarded, and the process ends.
Through the processing, the face snapshot pictures with higher quality scores are always stored in the buffer area, so that a good foundation is laid for subsequent processing.
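The buffer-maintenance flow of steps 201 to 208 amounts to keeping the top-M snapshots by quality score. A minimal sketch using a min-heap is shown below; the class name, payload type, and heap layout are illustrative choices, not taken from the disclosure:

```python
import heapq
import itertools

class SnapshotBuffer:
    """Keep at most m face snapshots with the highest quality scores.

    A min-heap keeps the lowest-scoring stored snapshot at the root, so
    the full-buffer check, minimum comparison, and eviction of steps
    204-208 are each O(log m). The tie-break counter keeps heap entries
    comparable even when quality scores are equal.
    """

    def __init__(self, m: int):
        assert m > 1                          # M is a positive integer > 1
        self.m = m
        self._heap = []                       # (quality, tie, snapshot)
        self._tie = itertools.count()

    def add(self, quality: float, snapshot) -> bool:
        """Return True if the snapshot was kept, False if discarded."""
        entry = (quality, next(self._tie), snapshot)
        if len(self._heap) < self.m:          # buffer not full: keep (step 205)
            heapq.heappush(self._heap, entry)
            return True
        if quality > self._heap[0][0]:        # beats stored minimum (step 207)
            heapq.heapreplace(self._heap, entry)
            return True
        return False                          # at or below minimum (step 208)

    def snapshots(self):
        return [s for _, _, s in self._heap]
```

One buffer instance would be held per piece of identification information, created lazily on first use as in steps 201 to 203.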
For any identification information obtained from the video frame a (hereinafter referred to as identification information c for convenience), it may be determined whether it meets a predetermined sending condition. For example, meeting the predetermined sending condition may include: the face information corresponding to identification information c has appeared continuously in the video for a duration reaching a preset threshold, and a face snapshot exists in the buffer corresponding to identification information c.
The specific value of the threshold can be determined according to actual needs. For example, if the duration of continuous appearance of face information corresponding to the identification information c in a video reaches 15 seconds, and a face snapshot exists in a buffer corresponding to the identification information c, it can be determined that the identification information c meets a predetermined sending condition.
Accordingly, if identification information c is determined to meet the predetermined sending condition, each face snapshot in the buffer corresponding to identification information c may be used to retrieve target face images in the target library, after which the snapshots in that buffer may be deleted. For example, the following processing may be performed for each face snapshot in the buffer corresponding to identification information c: acquire the feature value of the face snapshot, obtain the comparison score between that feature value and the feature value of each target face image in the target library, and take the target face images whose comparison scores are greater than a preset threshold as the retrieval result corresponding to the face snapshot.
The present disclosure does not limit how to obtain the feature value of the face snapshot, for example, the feature value of the face snapshot may be extracted by using a feature extraction model obtained by pre-training, or the feature value of the face snapshot may be extracted by using a feature extraction algorithm.
Assuming the buffer corresponding to identification information c contains 6 face snapshots, namely face snapshots 1 to 6: the feature value of face snapshot 1 can be obtained and the retrieval result corresponding to face snapshot 1 obtained through feature-value comparison; similarly, the feature value of face snapshot 2 can be obtained and its retrieval result obtained; and so on, until the retrieval results corresponding to face snapshots 1 to 6 are all obtained.
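The per-snapshot feature comparison described above can be sketched as follows. The disclosure does not specify how the comparison score is computed; cosine similarity and the threshold value here are assumptions for illustration:

```python
import math

def cosine_similarity(u, v) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(snapshot_feature, target_library: dict, threshold: float = 0.8):
    """Return the (target_id, score) pairs whose comparison score with the
    snapshot's feature value exceeds the preset threshold: the retrieval
    result for one face snapshot."""
    results = []
    for target_id, target_feature in target_library.items():
        score = cosine_similarity(snapshot_feature, target_feature)
        if score > threshold:
            results.append((target_id, score))
    return results
```

Calling `retrieve` once per buffered snapshot yields the per-snapshot retrieval results that the aggregation step below consumes.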
And then, determining a face recognition result of the face corresponding to the identification information c according to each obtained retrieval result.
In addition, if the identification information corresponding to any created buffer is not among the identification information obtained by performing face detection and tracking on the video frame a, that identification information is determined to be disappeared identification information. Further, if a face snapshot exists in the buffer corresponding to the disappeared identification information, the face snapshots in that buffer may be used to retrieve target face images in the target library, the face recognition result of the face information corresponding to the disappeared identification information determined according to the obtained retrieval results, and the disappeared identification information and its buffer deleted. If no face snapshot exists in the buffer corresponding to the disappeared identification information, the disappeared identification information and its buffer may be deleted directly.
For example, suppose 3 pieces of identification information are obtained after face detection and tracking are performed on the video frame a, 5 buffers have been created, each corresponding to different identification information, and 2 of the 5 pieces of identification information corresponding to the created buffers are not among the 3 obtained from the video frame a. Those 2 pieces may then be determined to be disappeared identification information. For each piece of disappeared identification information, whether a face snapshot exists in the corresponding buffer may be determined; if so, the face snapshots may be used to retrieve target face images in the target library, the face recognition result of the corresponding face information determined according to the retrieval results, and the disappeared identification information and its buffer deleted; if not, the disappeared identification information and its buffer may be deleted directly.
If 0 pieces of identification information are obtained from face detection and tracking on the video frame a, that is, no face is detected in the video frame a, and it is assumed that 5 buffers have been created, then the identification information corresponding to all 5 created buffers is considered disappeared identification information.
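The disappearance bookkeeping above reduces to a set difference between the identification information that has a buffer and the identification information detected in the current frame. A minimal sketch follows; the dict-based buffer layout is an assumption for illustration:

```python
def flush_disappeared(buffers: dict, detected_ids) -> dict:
    """Handle identification information that has disappeared.

    buffers maps identification info to its list of face snapshots.
    Any id with a buffer but no detection in the current frame is
    disappeared: non-empty buffers are returned so their snapshots can
    be used for retrieval; in all cases the id and buffer are deleted.
    """
    to_retrieve = {}
    for track_id in set(buffers) - set(detected_ids):
        snapshots = buffers.pop(track_id)     # delete id and buffer
        if snapshots:                         # retrieve before deleting
            to_retrieve[track_id] = snapshots
    return to_retrieve
```

Passing an empty `detected_ids` set models the no-face case above: every buffered id is treated as disappeared.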
As described above, assume the buffer corresponding to identification information c contains 6 face snapshots, namely face snapshots 1 to 6, and the retrieval results corresponding to face snapshots 1 to 6 have been obtained; the face recognition result of the face corresponding to identification information c can then be determined from those retrieval results. Specifically, the obtained retrieval results may be aggregated (i.e., merged and deduplicated) into a candidate target face image set. For any target face image, the face snapshot whose retrieval result contains it is a face snapshot corresponding to that target face image. If every target face image in the candidate set corresponds to exactly one face snapshot, the target face image with the highest comparison score may be taken as the face recognition result. If at least one target face image in the candidate set corresponds to more than one face snapshot, the target face image with the largest number of corresponding face snapshots may be taken as the face recognition result.
When a target face image is output as the face recognition result, a face snapshot is generally also output. If every target face image in the candidate set corresponds to one face snapshot, the output snapshot is the one corresponding to the target face image taken as the recognition result. If at least one target face image in the candidate set corresponds to more than one face snapshot, then among the snapshots corresponding to the target face image taken as the recognition result, the snapshot with the highest comparison score against that target face image may be determined and output.
In practical application, the candidate target face image set may be empty, that is, none of face snapshots 1 to 6 matches any target face image in the target library; in this case, the face snapshot with the highest quality score among face snapshots 1 to 6 may be output.
Based on the above description, FIG. 3 is a flowchart of an embodiment of the method for determining the face recognition result of the face information corresponding to identification information c according to the retrieval results corresponding to face snapshots 1 to 6. As shown in FIG. 3, the following specific implementation steps are included.
In step 301, the retrieval results are aggregated to obtain a candidate target face image set.
In step 302, it is determined whether the candidate target face image set is empty; if yes, step 303 is performed, otherwise step 304 is performed.
In step 303, the face snapshot with the highest quality score among face snapshots 1 to 6 is output, and the process ends.
In step 304, it is determined whether each target face image in the candidate target face image set corresponds to exactly one face snapshot; if yes, step 305 is executed, otherwise step 306 is executed.
In step 305, the face snapshot with the highest comparison score and the corresponding target face image are output, and then the process is ended.
Each target face image corresponds to a comparison score between itself and its corresponding face snapshot image; the highest of these comparison scores can be selected, and the face snapshot image and target face image corresponding to that score are output.
In step 306, the target face image corresponding to the largest number of face snapshot images is taken as the selected target face image (i.e., as the face recognition result). Among the face snapshot images corresponding to the selected target face image, the one with the highest comparison score against the selected target face image is determined, the determined face snapshot image and the selected target face image are output, and the process then ends.
Assume that the candidate target face atlas includes 10 target face images, target face image 1 to target face image 10. Target face image 1 and target face image 2 correspond to face snapshot image 1, i.e. they were recalled/retrieved using face snapshot image 1; target face image 3 and target face image 8 correspond to face snapshot image 2; target face image 4 and target face image 9 correspond to face snapshot image 3; target face image 5 and target face image 10 correspond to face snapshot image 4; target face image 6, target face image 8 and target face image 2 correspond to face snapshot image 5; and target face image 7 and target face image 2 correspond to face snapshot image 6. Target face image 2 thus corresponds to three face snapshot images (face snapshot images 1, 5 and 6), target face image 8 corresponds to two (face snapshot images 2 and 5), and each remaining target face image corresponds to one. Target face image 2, having the largest number of corresponding face snapshot images, can then be taken as the selected target face image, and from face snapshot images 1, 5 and 6 the one with the highest comparison score against target face image 2 can be selected. Assuming this is face snapshot image 1, face snapshot image 1 and target face image 2 are output.
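The counting-and-selection logic of steps 304 to 306 can be sketched as follows. The snapshot-to-target mapping reproduces the example above; the comparison scores are hypothetical illustration values, since the example only assumes face snapshot image 1 scores highest.

```python
# A minimal sketch of steps 304-306: pick the target face image recalled by
# the most snapshots, then pick its best-scoring snapshot.

def select_result(snapshots_per_target, scores):
    """snapshots_per_target: {target_id: [snapshot_id, ...]}
    scores: {(snapshot_id, target_id): comparison score}"""
    # Target face image recalled by the largest number of snapshots.
    target = max(snapshots_per_target,
                 key=lambda t: len(snapshots_per_target[t]))
    # Among that target's snapshots, the one with the highest comparison score.
    snapshot = max(snapshots_per_target[target],
                   key=lambda s: scores[(s, target)])
    return target, snapshot

# Example from the text: target face image 2 was recalled by snapshots 1, 5, 6.
snapshots_per_target = {1: [1], 2: [1, 5, 6], 3: [2], 4: [3], 5: [4],
                        6: [5], 7: [6], 8: [2, 5], 9: [3], 10: [4]}
scores = {(1, 2): 0.92, (5, 2): 0.88, (6, 2): 0.85}  # hypothetical scores
target, snapshot = select_result(snapshots_per_target, scores)
# target face image 2 and face snapshot image 1, matching the example
```

Only the scores for the winning target's snapshots are needed on this path, which is why the sketch omits the rest.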
Through the above processing, the final face recognition result is determined by combining the obtained retrieval results, taking into account both the number of times each target face image is recalled and the comparison scores between target face images and face snapshot images, thereby further improving the accuracy and recall of face recognition.
In summary, according to the scheme of this method embodiment, the number of selected optimal face snapshot images can be set through parameter configuration, target face images can be retrieved using each selected face snapshot image, and the final face recognition result can be determined by combining the obtained retrieval results. This reduces the impact of selecting a wrong optimal face snapshot image due to various factors, gives the method good applicability in various scenarios, and improves the accuracy and recall of face recognition.
It is noted that while, for simplicity of explanation, the foregoing method embodiments are described as a series of acts, those skilled in the art will appreciate that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules referred to are not necessarily required by the disclosure.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below. The following apparatus embodiments are used to perform any of the above methods.
Fig. 4 is a schematic diagram of a structure of a face recognition apparatus 400 according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus includes: a first processing module 401, a second processing module 402, a third processing module 403 and a fourth processing module 404.
The first processing module 401 is configured to perform face detection on any one of the consecutive video frames to obtain designated face information and corresponding identification information.
The second processing module 402 is configured to take the subgraph corresponding to the specified face information as a face snapshot image and obtain the quality score corresponding to the face snapshot image.
The third processing module 403 is configured to add the face snapshot image to a buffer area corresponding to the identification information according to the quality score, where at most M face snapshot images are stored in the buffer area, and M is a positive integer greater than 1.
The fourth processing module 404 is configured to retrieve target face images in the target library using the face snapshot images in the buffer area corresponding to multiple video frames among the consecutive video frames, respectively, and determine the face recognition result corresponding to the specified face information according to the obtained retrieval results.
In the scheme of this apparatus embodiment, several face snapshot images of better quality can be selected to retrieve target face images respectively, and the final face recognition result can be determined by combining the obtained retrieval results, thereby reducing the impact of selecting a wrong optimal face snapshot image and improving the accuracy and recall of face recognition.
The second processing module 402 may determine the quality score corresponding to the face snapshot according to the face size and the face angle in the face snapshot.
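One way the second processing module could combine face size and face angle into a quality score is sketched below. The weighting and normalisation constants are illustrative assumptions; the patent specifies only that size and angle are the inputs, not a concrete formula.

```python
# A hedged sketch of a quality score from face size and face angle.
# Weights, saturation point, and the 90-degree angle budget are assumptions.

def quality_score(face_w, face_h, yaw_deg, pitch_deg,
                  min_side=40, w_size=0.5, w_angle=0.5):
    # Larger faces score higher, saturating once the short side is large enough.
    size_term = min(1.0, min(face_w, face_h) / (4 * min_side))
    # Frontal faces score higher; penalise combined yaw/pitch deviation.
    angle_term = max(0.0, 1.0 - (abs(yaw_deg) + abs(pitch_deg)) / 90.0)
    return w_size * size_term + w_angle * angle_term

frontal = quality_score(160, 160, 0, 0)   # large frontal face
turned = quality_score(50, 50, 30, 10)    # small, turned face
# a large frontal face outranks a small turned one
```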
The third processing module 403 may add the face snapshot image to the buffer area and delete the stored face snapshot image with the lowest quality score from the buffer area when the number of stored face snapshot images in the buffer area is equal to M and the quality score corresponding to the face snapshot image is greater than the minimum quality score among the quality scores corresponding to the stored face snapshot images.
The third processing module 403 may further perform one or both of the following: if the number of face snapshot images stored in the buffer area is less than M, add the face snapshot image to the buffer area; and if the number of stored face snapshot images in the buffer area is equal to M and the quality score corresponding to the face snapshot image is less than or equal to the minimum quality score among the quality scores corresponding to the stored face snapshot images, discard the face snapshot image.
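Taken together, the three buffer cases above amount to a keep-the-top-M policy per identity. A minimal sketch, with illustrative function and field names:

```python
# Keep at most M snapshots per identity, replacing the lowest-scoring
# stored snapshot only when the new one scores strictly higher.

def add_to_buffer(buffer, snapshot, score, M):
    """buffer: list of (quality_score, snapshot) pairs for one identity."""
    if len(buffer) < M:                      # room left: just add
        buffer.append((score, snapshot))
        return True
    worst = min(buffer, key=lambda p: p[0])  # lowest-quality stored snapshot
    if score > worst[0]:                     # better than the worst: replace
        buffer.remove(worst)
        buffer.append((score, snapshot))
        return True
    return False                             # equal or worse: discard

buf = []
for snap, s in [("f1", 0.7), ("f2", 0.9), ("f3", 0.6), ("f4", 0.8)]:
    add_to_buffer(buf, snap, s, M=3)
# buf now holds the three best snapshots: f1 (0.7), f4 (0.8), f2 (0.9)
```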
In addition, the third processing module 403 may also determine whether a buffer area is created before adding the face snapshot to the buffer area corresponding to the identification information, and create the buffer area if the buffer area is not created.
Further, the fourth processing module 404 may retrieve the target face image in the target library by using each face snapshot image in the buffer area when the duration of the continuous occurrence of the designated face information in the video reaches a predetermined threshold.
After retrieving the target face images in the target library using each face snapshot image in the buffer area, the fourth processing module 404 may further delete each face snapshot image in the buffer area.
In addition, when identification information corresponding to any created buffer area is not included in the identification information obtained by performing face detection on a video frame, the fourth processing module 404 may determine that the identification information corresponding to that buffer area is disappeared identification information. If face snapshot images exist in the buffer area corresponding to the disappeared identification information, the module retrieves target face images in the target library using each face snapshot image in that buffer area, determines the face recognition result corresponding to the disappeared identification information according to the obtained retrieval results, and deletes the disappeared identification information and the corresponding buffer area. If no face snapshot image exists in that buffer area, the module simply deletes the disappeared identification information and the corresponding buffer area.
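The disappeared-identity handling above can be sketched as follows. `retrieve` stands in for the target-library search and is an assumption, as are the data-structure shapes.

```python
# When an identity with a created buffer is absent from the current frame's
# detections, run one final retrieval pass over its buffered snapshots (if
# any), then drop the identity and its buffer either way.

def handle_disappeared(buffers, detected_ids, retrieve):
    """buffers: {identity: [snapshot, ...]}; detected_ids: identities
    detected in the current frame; retrieve: snapshots -> result."""
    results = {}
    for ident in list(buffers):       # snapshot of keys; we mutate below
        if ident in detected_ids:
            continue                  # identity still present in the frame
        snaps = buffers.pop(ident)    # delete identity and its buffer
        if snaps:                     # final retrieval only if non-empty
            results[ident] = retrieve(snaps)
    return results

buffers = {"a": ["s1", "s2"], "b": []}
out = handle_disappeared(buffers, detected_ids={"c"},
                         retrieve=lambda snaps: f"result-from-{len(snaps)}-snaps")
# both "a" and "b" are removed; only "a" produces a recognition result
```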
The fourth processing module 404 may perform the following processing for any face snapshot image: acquiring the feature value of the face snapshot image; respectively acquiring comparison scores between the feature value of the face snapshot image and the feature values of the target face images in the target library; and taking the target face images whose comparison scores are greater than a predetermined threshold as the retrieval result corresponding to the face snapshot image.
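A minimal sketch of this per-snapshot retrieval, treating the feature value as a vector. Cosine similarity is an assumed choice of comparison score; the patent does not fix a particular metric.

```python
# Compare a snapshot's feature vector against every target face's feature
# vector and keep the targets whose score exceeds a threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(snapshot_feat, target_lib, threshold=0.8):
    """target_lib: {target_id: feature vector}; returns {target_id: score}."""
    return {tid: s for tid, feat in target_lib.items()
            if (s := cosine(snapshot_feat, feat)) > threshold}

lib = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0]}
hits = retrieve([1.0, 0.05], lib, threshold=0.9)
# t1 and t2 pass the threshold; t3 (near-orthogonal) does not
```

In a real system the feature vectors would come from a face embedding model and the search would typically use an approximate nearest-neighbour index rather than a linear scan.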
Further, the fourth processing module 404 may summarize the obtained retrieval results to obtain a candidate target face image set. If each target face image in the candidate target face image set corresponds to one face snapshot image, the target face image with the highest comparison score may be taken as the face recognition result; if at least one target face image in the set corresponds to more than one face snapshot image, the target face image with the largest number of corresponding face snapshot images may be taken as the face recognition result.
For a specific work flow of the apparatus embodiment shown in fig. 4, reference is made to the related description in the foregoing method embodiment, and details are not repeated.
In summary, according to the scheme of the embodiments of the present disclosure, the number of selected optimal face snapshot images can be set through parameter configuration, target face images can be retrieved using each selected face snapshot image, and the final face recognition result can be determined by combining the obtained retrieval results. This reduces the impact of selecting a wrong optimal face snapshot image due to various factors, provides good applicability in various scenarios, and improves the accuracy and recall of face recognition.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in this disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described in the present disclosure.
Various implementations of the systems and techniques described here may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that addresses the drawbacks of high management difficulty and weak service scalability in traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain. Cloud computing refers to accessing an elastically scalable shared pool of physical or virtual resources through a network; the resources may include servers, operating systems, networks, software, applications, storage devices and the like, and may be deployed and managed on demand in a self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for artificial intelligence, blockchain and other technical applications and model training.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
It should be noted that the scheme disclosed by the disclosure can be applied to the field of artificial intelligence, in particular to the fields of computer vision, deep learning and the like.
Artificial intelligence is the discipline of studying how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and covers both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.

Claims (23)

1. A face recognition method, comprising:
carrying out face detection on any video frame in the continuous video frames to obtain designated face information and corresponding identification information;
taking the subgraph corresponding to the designated face information as a face snapshot, and acquiring a quality score corresponding to the face snapshot;
adding the face snapshot image into a buffer area corresponding to the identification information according to the quality score, wherein at most M face snapshot images are stored in the buffer area, and M is a positive integer greater than 1;
and retrieving the target face image in the target library by respectively utilizing the face snapshot images corresponding to a plurality of video frames in the continuous video frames in the buffer area, and determining the face recognition result corresponding to the specified face information according to the retrieved result.
2. The method of claim 1, wherein the obtaining the quality score corresponding to the face snapshot comprises:
and determining the quality score corresponding to the face snapshot image according to the face size and the face angle in the face snapshot image.
3. The method of claim 1, wherein the adding the face snapshot into the buffer corresponding to the identification information according to the quality score comprises:
and if the number of the stored face snapshot pictures in the buffer area is equal to M and the quality score corresponding to the face snapshot picture is larger than the minimum quality score in the quality scores corresponding to the stored face snapshot pictures, adding the face snapshot picture into the buffer area, and deleting the stored face snapshot picture with the minimum value from the buffer area.
4. The method of claim 1 or 3, further comprising one or both of:
if the number of the face snapshot images stored in the buffer area is less than M, adding the face snapshot images into the buffer area;
and if the number of the stored face snapshot pictures in the buffer area is equal to M, and the quality score corresponding to the face snapshot picture is less than or equal to the minimum quality score in the quality scores corresponding to the stored face snapshot pictures, discarding the face snapshot pictures.
5. The method of claim 1, further comprising:
before the face snapshot is added into the buffer area corresponding to the identification information, whether the buffer area is established or not is determined;
and if the buffer area is not created, creating the buffer area.
6. The method of claim 1, wherein the retrieving the target face image in the target library using the face snap shots corresponding to the plurality of video frames in the continuous video frames in the buffer respectively comprises:
and if the continuous occurrence time of the specified face information in the video reaches a preset threshold value, retrieving the target face image in the target library by using each face snapshot image in the buffer area respectively.
7. The method of claim 6, further comprising:
and after the target face images in the target library are retrieved by respectively utilizing the face snap-shots in the buffer area, deleting the face snap-shots in the buffer area.
8. The method of claim 1 or 5, further comprising:
if the identification information corresponding to any created buffer area is not included in the identification information obtained by performing face detection on the video frame, determining that the identification information corresponding to any created buffer area is disappeared identification information;
if the buffer area corresponding to the disappeared identification information has the face snapshot image, retrieving the target face image in the target library by using each face snapshot image in the buffer area corresponding to the disappeared identification information, determining a face recognition result corresponding to the disappeared identification information according to the obtained retrieval result, and deleting the disappeared identification information and the corresponding buffer area;
and if the face snapshot image does not exist in the buffer area corresponding to the disappeared identification information, deleting the disappeared identification information and the corresponding buffer area.
9. The method according to any one of claims 1 to 8, wherein the retrieving of the target face image in the target library comprises:
aiming at any face snapshot image, the following processing is respectively executed:
acquiring a characteristic value of the face snapshot image;
respectively acquiring comparison scores between the characteristic values of the face snapshot images and the characteristic values of all target face images in the target library;
and taking the target face image with the comparison score larger than a preset threshold value as a retrieval result corresponding to the face snapshot image.
10. The method of claim 9, wherein the determining the face recognition result corresponding to the designated face information according to the obtained retrieval result comprises:
summarizing all the obtained retrieval results to obtain a candidate target face atlas;
if each target face image in the candidate target face image set corresponds to one face snapshot image, taking the corresponding target face image with the highest comparison score as the face recognition result;
and if the number of face snapshot images corresponding to at least one target face image in the candidate target face image set is greater than 1, taking the target face image with the largest number of corresponding face snapshot images as the face recognition result.
11. A face recognition apparatus comprising: the system comprises a first processing module, a second processing module, a third processing module and a fourth processing module;
the first processing module is used for carrying out face detection on any one of the continuous video frames to obtain designated face information and corresponding identification information;
the second processing module is used for taking the subgraph corresponding to the designated face information as a face snapshot and acquiring the quality score corresponding to the face snapshot;
the third processing module is configured to add the face snapshot image to a buffer area corresponding to the identification information according to the quality score, where at most M face snapshot images are stored in the buffer area, and M is a positive integer greater than 1;
and the fourth processing module is used for retrieving the target face image in the target library by respectively utilizing the face snapshot images corresponding to a plurality of video frames in the continuous video frames in the buffer area, and determining the face recognition result corresponding to the specified face information according to the obtained retrieval result.
12. The apparatus of claim 11, wherein,
the second processing module is further used for determining the quality score corresponding to the face snapshot image according to the face size and the face angle in the face snapshot image.
13. The apparatus of claim 11, wherein,
the third processing module is further configured to add the face snapshot to the buffer area and delete the stored face snapshot with the smallest value from the buffer area when the number of stored face snapshots in the buffer area is equal to M and the quality score corresponding to the face snapshot is greater than the quality score with the smallest value among the quality scores corresponding to the stored face snapshots.
14. The apparatus of claim 11 or 13,
the third processing module is further configured to perform one or both of:
if the number of the face snapshot images stored in the buffer area is less than M, adding the face snapshot images into the buffer area;
and if the number of the stored face snapshot pictures in the buffer area is equal to M, and the quality score corresponding to the face snapshot picture is less than or equal to the minimum quality score in the quality scores corresponding to the stored face snapshot pictures, discarding the face snapshot pictures.
15. The apparatus of claim 11, wherein,
the third processing module is further configured to determine whether the buffer area is created before adding the face snapshot to the buffer area corresponding to the identification information, and create the buffer area if the buffer area is not created.
16. The apparatus of claim 11, wherein,
and the fourth processing module is further configured to retrieve the target face image in the target library by using each face snapshot image in the buffer area when the duration of continuous occurrence of the specified face information in the video reaches a predetermined threshold.
17. The apparatus of claim 16, wherein,
the fourth processing module is further configured to delete each face snapshot in the buffer after retrieving the target face image in the target library by using each face snapshot in the buffer, respectively.
18. The apparatus of claim 11 or 15,
the fourth processing module is further configured to determine that the identification information corresponding to any created buffer area is disappeared identification information if the identification information corresponding to any created buffer area is not included in identification information obtained by performing face detection on the video frame, retrieve a target face image in the target library by using each face snapshot image in the buffer area corresponding to the disappeared identification information if a face snapshot image exists in the buffer area corresponding to the disappeared identification information, determine a face recognition result corresponding to the disappeared identification information according to an obtained retrieval result, delete the disappeared identification information and a corresponding buffer area, and delete the disappeared identification information and a corresponding buffer area if a face snapshot image does not exist in the buffer area corresponding to the disappeared identification information.
19. The apparatus of any one of claims 11-18,
the fourth processing module is further configured to, for any face snapshot, respectively perform the following processing: acquiring a characteristic value of the face snapshot image; respectively acquiring comparison scores between the characteristic values of the face snapshot images and the characteristic values of all target face images in the target library; and taking the target face image with the comparison score larger than a preset threshold value as a retrieval result corresponding to the face snapshot image.
20. The apparatus of claim 19, wherein,
the fourth processing module is further configured to aggregate the obtained retrieval results to obtain a candidate target face image set; if each target face image in the candidate target face image set corresponds to one face snapshot image, take the target face image with the highest comparison score as the face recognition result; and if the number of face snapshot images corresponding to at least one target face image in the candidate target face image set is greater than 1, take the target face image corresponding to the largest number of face snapshot images as the face recognition result.
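Claim 20's two-branch aggregation rule (highest comparison score when every candidate matches a single snapshot, majority of snapshots otherwise) could look like this in Python; the `(snapshot_id, target_id, score)` tuple shape of the retrieval results is an illustrative assumption:

```python
from collections import defaultdict

def decide_recognition_result(retrieval_results):
    """Sketch of claim 20. `retrieval_results` is an iterable of
    (snapshot_id, target_id, score) tuples. Returns the chosen target id,
    or None when there are no candidates."""
    snapshots_per_target = defaultdict(set)
    best_score = {}
    for snapshot_id, target_id, score in retrieval_results:
        snapshots_per_target[target_id].add(snapshot_id)
        best_score[target_id] = max(best_score.get(target_id, 0.0), score)
    if not snapshots_per_target:
        return None
    max_hits = max(len(s) for s in snapshots_per_target.values())
    if max_hits > 1:
        # at least one target matched by more than one snapshot:
        # the target with the most matching snapshots wins
        return max(snapshots_per_target,
                   key=lambda t: len(snapshots_per_target[t]))
    # every candidate corresponds to exactly one snapshot:
    # the highest comparison score wins
    return max(best_score, key=best_score.get)
```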
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202110473605.2A 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium Active CN113283305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473605.2A CN113283305B (en) 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110473605.2A CN113283305B (en) 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113283305A true CN113283305A (en) 2021-08-20
CN113283305B CN113283305B (en) 2024-03-26

Family

ID=77277653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473605.2A Active CN113283305B (en) 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113283305B (en)

Patent Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005084985A (en) * 2003-09-09 2005-03-31 Toshiba Corp Individual recognition device and passing controller
CN105938552A (en) * 2016-06-29 2016-09-14 北京旷视科技有限公司 Face recognition method capable of realizing base image automatic update and face recognition device
CN108876758A (en) * 2017-08-15 2018-11-23 北京旷视科技有限公司 Face identification method, apparatus and system
WO2019033574A1 (en) * 2017-08-17 2019-02-21 平安科技(深圳)有限公司 Electronic device, dynamic video face recognition method and system, and storage medium
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
CN107784294A (en) * 2017-11-15 2018-03-09 武汉烽火众智数字技术有限责任公司 A kind of persona face detection method based on deep learning
CN108038422A (en) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, the method for recognition of face and computer-readable recording medium
WO2019100608A1 (en) * 2017-11-21 2019-05-31 平安科技(深圳)有限公司 Video capturing device, face recognition method, system, and computer-readable storage medium
CN107958220A (en) * 2017-12-06 2018-04-24 杭州魔点科技有限公司 A kind of face database compression processing method and its intelligent apparatus based on recognition of face
CN108171207A (en) * 2018-01-17 2018-06-15 百度在线网络技术(北京)有限公司 Face identification method and device based on video sequence
CN108734107A (en) * 2018-04-24 2018-11-02 武汉幻视智能科技有限公司 A kind of multi-object tracking method and system based on face
CN109034013A (en) * 2018-07-10 2018-12-18 腾讯科技(深圳)有限公司 A kind of facial image recognition method, device and storage medium
US20210042504A1 (en) * 2018-08-03 2021-02-11 Beijing Bytedance Network Technology Co., Ltd. Method and apparatus for outputting data
CN109086739A (en) * 2018-08-23 2018-12-25 成都睿码科技有限责任公司 A kind of face identification method and system of no human face data training
CN110889314A (en) * 2018-09-10 2020-03-17 北京市商汤科技开发有限公司 Image processing method, device, electronic equipment, server and system
WO2020093634A1 (en) * 2018-11-06 2020-05-14 平安科技(深圳)有限公司 Face recognition-based method, device and terminal for adding image, and storage medium
WO2020094091A1 (en) * 2018-11-07 2020-05-14 杭州海康威视数字技术股份有限公司 Image capturing method, monitoring camera, and monitoring system
CN109711318A (en) * 2018-12-24 2019-05-03 北京澎思智能科技有限公司 A kind of plurality of human faces detection and tracking based on video flowing
CN109753917A (en) * 2018-12-29 2019-05-14 中国科学院重庆绿色智能技术研究院 Face quality optimization method, system, computer readable storage medium and equipment
CN109948515A (en) * 2019-03-15 2019-06-28 百度在线网络技术(北京)有限公司 The classification recognition methods of object and device
CN110210340A (en) * 2019-05-20 2019-09-06 深圳供电局有限公司 A kind of face characteristic value comparison method and its system, readable storage medium storing program for executing
CN111368619A (en) * 2019-08-14 2020-07-03 杭州海康威视系统技术有限公司 Method, device and equipment for detecting suspicious people
CN111209818A (en) * 2019-12-30 2020-05-29 新大陆数字技术股份有限公司 Video individual identification method, system, equipment and readable storage medium
CN111274945A (en) * 2020-01-19 2020-06-12 北京百度网讯科技有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN111310698A (en) * 2020-02-26 2020-06-19 北京停简单信息技术有限公司 License plate recognition method and device and inspection vehicle
CN111339949A (en) * 2020-02-26 2020-06-26 北京停简单信息技术有限公司 License plate recognition method and device and inspection vehicle
CN111652070A (en) * 2020-05-07 2020-09-11 南京航空航天大学 Face sequence collaborative recognition method based on surveillance video
CN111753731A (en) * 2020-06-24 2020-10-09 上海立可芯半导体科技有限公司 Face quality evaluation method, device and system and training method of face quality evaluation model
CN112215156A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Face snapshot method and system in video monitoring
CN112329679A (en) * 2020-11-12 2021-02-05 济南博观智能科技有限公司 Face recognition method, face recognition system, electronic equipment and storage medium
CN112597803A (en) * 2020-11-26 2021-04-02 深圳泰首智能技术有限公司 Face recognition method, device and system and electronic equipment
CN112541434A (en) * 2020-12-14 2021-03-23 无锡锡商银行股份有限公司 Face recognition method based on central point tracking model
CN112560772A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Face recognition method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, Hailong; WANG, Huaibin; WANG, Rongyao; WANG, Haitao; LIU, Qiang; ZHANG, Luyang; JIANG, Menghao: "Face Recognition Method Based on Video Surveillance", Computer Measurement & Control, no. 04, pages 137-141 *

Also Published As

Publication number Publication date
CN113283305B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
US20220147822A1 (en) Training method and apparatus for target detection model, device and storage medium
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
US20230030431A1 (en) Method and apparatus for extracting feature, device, and storage medium
WO2022247343A1 (en) Recognition model training method and apparatus, recognition method and apparatus, device, and storage medium
CN113591864B (en) Training method, device and system for text recognition model framework
CN113691733A (en) Video jitter detection method and device, electronic equipment and storage medium
CN112488060B (en) Target detection method, device, equipment and medium
CN114449343A (en) Video processing method, device, equipment and storage medium
CN112989970A (en) Document layout analysis method and device, electronic equipment and readable storage medium
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN113254712B (en) Video matching method, video processing device, electronic equipment and medium
CN113810765A (en) Video processing method, apparatus, device and medium
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN112258541A (en) Video boundary detection method, system, device and storage medium
CN116761020A (en) Video processing method, device, equipment and medium
US20220309763A1 (en) Method for identifying traffic light, device, cloud control platform and vehicle-road coordination system
CN114549904B (en) Visual processing and model training method, device, storage medium and program product
CN116363429A (en) Training method of image recognition model, image recognition method, device and equipment
CN113283305B (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113988294A (en) Method for training prediction network, image processing method and device
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113378958A (en) Automatic labeling method, device, equipment, storage medium and computer program product
CN113204665A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
CN113033415B (en) Data queue dynamic updating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant