CN113283305B - Face recognition method, device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113283305B
CN113283305B
Authority
CN
China
Prior art keywords
face
buffer area
identification information
snapshot
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110473605.2A
Other languages
Chinese (zh)
Other versions
CN113283305A (en)
Inventor
高治力
何建斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110473605.2A priority Critical patent/CN113283305B/en
Publication of CN113283305A publication Critical patent/CN113283305A/en
Application granted granted Critical
Publication of CN113283305B publication Critical patent/CN113283305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30168: Image quality inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30196: Human being; Person
    • G06T2207/30201: Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a face recognition method, apparatus, electronic device, and computer readable storage medium, relating to the field of artificial intelligence, and in particular to computer vision and deep learning. The method may include: performing face detection on any video frame among consecutive video frames to obtain specified face information and corresponding identification information therein; taking the subgraph corresponding to the specified face information as a face snapshot, and obtaining a quality score corresponding to the face snapshot; adding the face snapshot to a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer and M is a positive integer greater than 1; and retrieving target face images from a target library using the face snapshots in the buffer, which correspond to multiple video frames among the consecutive video frames, and determining the face recognition result corresponding to the specified face information according to the obtained retrieval results. By applying the disclosed scheme, the accuracy and recall of face recognition can be improved.

Description

Face recognition method, device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a face recognition method, apparatus, electronic device, and computer readable storage medium in the fields of computer vision and deep learning.
Background
In conventional face recognition, a single face snapshot of optimal quality is selected according to a certain strategy, and the feature value of that snapshot is compared with the feature value of each target face image in a target library to obtain the face recognition result.
However, such a strategy is generally suited only to specific scenes. Once the scene changes, the snapshot selected as optimal by the strategy often differs considerably from the actually optimal snapshot, which degrades the accuracy, recall, and other metrics of face recognition.
Disclosure of Invention
The present disclosure provides face recognition methods, apparatus, electronic devices, and computer-readable storage media.
According to one aspect of the present disclosure, there is provided a face recognition method including:
performing face detection on any video frame among consecutive video frames to obtain specified face information and corresponding identification information therein;
taking the subgraph corresponding to the specified face information as a face snapshot, and obtaining a quality score corresponding to the face snapshot;
adding the face snapshot to a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer and M is a positive integer greater than 1;
and retrieving target face images from a target library using the face snapshots in the buffer that correspond to multiple video frames among the consecutive video frames, and determining the face recognition result corresponding to the specified face information according to the obtained retrieval results.
According to one aspect of the present disclosure, there is provided a face recognition apparatus including: the device comprises a first processing module, a second processing module, a third processing module and a fourth processing module;
the first processing module is used for performing face detection on any video frame among consecutive video frames to obtain specified face information and corresponding identification information;
the second processing module is used for taking the subgraph corresponding to the specified face information as a face snapshot and obtaining a quality score corresponding to the face snapshot;
the third processing module is configured to add the face snapshot to a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer and M is a positive integer greater than 1;
and the fourth processing module is used for retrieving target face images from a target library using the face snapshots in the buffer that correspond to multiple video frames among the consecutive video frames, and for determining the face recognition result corresponding to the specified face information according to the obtained retrieval results.
According to one aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to one aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as described above.
According to one aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One embodiment of the above disclosure has the following advantages or benefits: multiple face snapshots of better quality can be selected and each used to retrieve target face images, and the face recognition result can be determined by combining the obtained retrieval results, thereby reducing the impact of mistakenly selecting the optimal face snapshot and improving the accuracy and recall of face recognition.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an embodiment of a face recognition method of the present disclosure;
fig. 2 is a flowchart of an embodiment of a method for processing a face snapshot corresponding to face information b in the present disclosure;
fig. 3 is a flowchart of an embodiment of a method for determining the face recognition result of the face information corresponding to identification information c according to the retrieval results respectively corresponding to face snapshots 1 to 6 in the present disclosure;
fig. 4 is a schematic diagram of a composition structure of an embodiment 400 of a face recognition device of the present disclosure;
fig. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between the associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 is a flowchart of an embodiment of a face recognition method of the present disclosure. As shown in fig. 1, the following specific implementation steps are included.
In step 101, face detection is performed on any video frame among consecutive video frames to obtain specified face information and corresponding identification information therein.
In step 102, the subgraph corresponding to the specified face information is taken as a face snapshot, and the quality score corresponding to the face snapshot is obtained.
In step 103, the face snapshot is added to a buffer corresponding to the identification information according to the quality score, wherein at most M face snapshots are stored in the buffer and M is a positive integer greater than 1.
In step 104, target face images in the target library are retrieved using the face snapshots in the buffer that correspond to multiple video frames among the consecutive video frames, and the face recognition result corresponding to the specified face information is determined according to the obtained retrieval results.
With the above scheme, multiple face snapshots of better quality can be selected and each used to retrieve target face images, and the final face recognition result can be determined by combining the obtained retrieval results, thereby reducing the impact of mistakenly selecting the optimal face snapshot and improving the accuracy and recall of face recognition.
Optionally, the target library may be created in advance, for example during a business application deployment phase, and may contain each target face image together with its corresponding feature value. How the feature value of a target face image is obtained is not limited: for example, a pre-trained feature extraction model may be used to extract it, or a feature extraction algorithm may be used.
The scheme disclosed by the disclosure is applicable to video streams received in real time and offline video files, and has wide applicability.
Each video frame in a video stream received in real time or in an offline video file may be processed separately in the manner shown in fig. 1. For convenience of description, a currently processed video frame will be hereinafter referred to as a video frame a.
The video frame a may also be preprocessed before face detection and other operations are performed on it. What the preprocessing includes can be determined according to actual needs. For example, the video frame a can be sent to a decoder for decoding. In addition, current deep learning frameworks generally require frame data in ARGB or BGRA format, where A represents transparency (Alpha), R represents Red, G represents Green, and B represents Blue, whereas decoded video frames are in YUV format, where Y represents luminance and U and V represent chrominance; the video frame a can therefore be further subjected to color space conversion into ARGB or BGRA format.
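The color space conversion described above can be sketched per pixel. This is a minimal illustrative sketch assuming BT.601 full-range coefficients; the disclosure does not prescribe a specific conversion, and real pipelines operate on whole planes and additionally fill the alpha channel for ARGB/BGRA output:

```python
def yuv_to_rgb(y, u, v):
    """Convert one YUV pixel (components in 0-255) to an (R, G, B) tuple.

    Assumes BT.601 full-range coefficients (an assumption, not specified
    by the disclosure). U and V are centered around 128.
    """
    c = float(y)
    d = u - 128.0  # chrominance offsets
    e = v - 128.0
    r = c + 1.402 * e
    g = c - 0.344136 * d - 0.714136 * e
    b = c + 1.772 * d
    clamp = lambda x: max(0, min(255, int(round(x))))
    return clamp(r), clamp(g), clamp(b)
```

A neutral gray pixel (Y=128, U=V=128) maps to (128, 128, 128), since zero chrominance leaves only the luminance term.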
After the preprocessing is completed, face detection and tracking can be performed on the video frame a, so that face information and corresponding identification information (id) in the video frame a are obtained. The face information can be obtained through detection, and the identification information can be obtained through tracking. For example, face information of a person continuously appears in a plurality of video frames, and the face information obtained from the plurality of video frames corresponds to the same identification information.
Assuming that three faces are detected from the video frame a, the processing can be performed in the same manner for each face information, respectively.
Specifically, for any piece of face information (hereinafter called face information b for convenience of description), the subimage corresponding to face information b may be used as a face snapshot; for example, a face subimage containing only face information b may be cropped from the video frame a (i.e., the video image) as the required face snapshot. A quality score corresponding to the face snapshot may then be obtained, and the snapshot added, according to the magnitude of its quality score, to the buffer corresponding to the identification information of face information b. At most the M face snapshots with the highest quality scores are stored in the buffer, where M is a positive integer greater than one whose specific value can be determined according to actual needs. Optionally, the quality score corresponding to each face snapshot may also be stored in the buffer, which is taken as an example in the following description.
The quality score corresponding to the face snapshot can be determined according to the face size and the face angle in the face snapshot.
For example, the face size and the face angle may each be multiplied by a corresponding weight, and the two products summed to obtain the quality score:
quality = a × size + b × pose; (1)
where quality denotes the quality score; a and b denote the weights, with 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1, their specific values determined according to actual needs and generally satisfying a + b = 1; size denotes the face size; and pose denotes the face angle. Different face sizes, and likewise different face angles, can each be represented by different values.
In this way, the quality score corresponding to a face snapshot is determined by combining multiple factors such as face size and face angle, which improves the accuracy of the obtained quality score.
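The weighted sum in formula (1) can be sketched as follows. The default weights 0.6 and 0.4 and the assumption that size and pose are already normalized to [0, 1] are illustrative, not prescribed by the disclosure:

```python
def quality_score(size, pose, a=0.6, b=0.4):
    """Quality score of a face snapshot per formula (1):
    quality = a * size + b * pose.

    `size` and `pose` are assumed to be normalized scores in [0, 1]
    (larger is better); the weights a and b satisfy 0 <= a, b <= 1
    and generally a + b = 1. The 0.6/0.4 defaults are hypothetical.
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    return a * size + b * pose
```

For example, a large face (size = 1.0) at a half-frontal angle (pose = 0.5) scores 0.6 × 1.0 + 0.4 × 0.5 = 0.8 under these weights.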
After the face snapshot of face information b and its quality score are obtained, it can further be determined whether a buffer corresponding to the identification information of face information b has been created. If not, the buffer can be created and the face snapshot and its quality score added to it; if it has been created, the face snapshot and its quality score can be added to the buffer according to the magnitude of the quality score.
For any face snapshot, if the number of face snapshots already stored in the corresponding buffer equals M and the snapshot's quality score is greater than the smallest quality score among those stored, the snapshot and its quality score can be added to the buffer, and the smallest quality score and its corresponding face snapshot deleted from the buffer. If the number of stored face snapshots is smaller than M, the snapshot and its quality score can be added directly. If the number of stored face snapshots equals M and the snapshot's quality score is less than or equal to the smallest stored quality score, the snapshot can be discarded.
Based on the above description, fig. 2 is a flowchart of an embodiment of a method for processing a face snapshot corresponding to the face information b in the present disclosure. As shown in fig. 2, the following detailed implementation is included.
In step 201, it is determined whether a buffer area corresponding to the identification information corresponding to the face snapshot is created, if not, step 202 is executed, and if yes, step 204 is executed.
In step 202, a corresponding buffer is created.
In step 203, the face snapshot and the corresponding quality score are added to the buffer, and then the process is ended.
In step 204, it is determined whether the number of the face snapshots stored in the buffer is less than M, if yes, step 205 is performed, otherwise, step 206 is performed.
M is a positive integer greater than one, and the specific value can be determined according to actual needs.
In step 205, the face snapshot and the corresponding quality score are added to the buffer, and then the process is ended.
In step 206, it is determined whether the quality score corresponding to the face snapshot is greater than the quality score with the smallest value in the buffer, if yes, step 207 is executed, otherwise step 208 is executed.
In step 207, the face snapshot and the corresponding quality score are added to the buffer, and the quality score with the smallest value and the corresponding face snapshot are deleted from the buffer, and then the process is ended.
In step 208, the face snapshot is discarded, after which the process ends.
Through the above processing, the face snapshots with the highest quality scores are always the ones stored in the buffer, which lays a good foundation for subsequent processing.
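The buffer-maintenance flow of steps 201 to 208 can be sketched as follows. This is an illustrative sketch: the class and method names are hypothetical, and buffer creation per piece of identification information (steps 201 to 202) is left to the caller:

```python
class SnapshotBuffer:
    """Keeps at most M face snapshots with the highest quality scores
    for one tracking id, mirroring steps 204-208: append while under
    capacity, otherwise replace the worst entry only if the new score
    beats it, else discard."""

    def __init__(self, m):
        self.m = m            # capacity M (a positive integer > 1)
        self.items = []       # list of (quality_score, snapshot) pairs

    def add(self, snapshot, quality):
        """Return True if the snapshot was stored, False if discarded."""
        if len(self.items) < self.m:          # step 204 -> 205
            self.items.append((quality, snapshot))
            return True
        worst = min(self.items, key=lambda t: t[0])
        if quality > worst[0]:                # step 206 -> 207
            self.items.remove(worst)
            self.items.append((quality, snapshot))
            return True
        return False                          # step 206 -> 208: discard
```

A linear scan for the minimum is fine for small M; a heap would be the natural choice if M were large.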
For any piece of identification information obtained from the video frame a (hereinafter called identification information c for convenience of description), it can be determined whether it meets a predetermined sending condition. For example, meeting the predetermined sending condition may include: the duration for which the face information corresponding to identification information c has continuously appeared in the video reaches a preset threshold; it may further include that at least one face snapshot exists in the buffer corresponding to identification information c.
The specific value of the threshold can be determined according to actual needs. For example, if the face information corresponding to identification information c has continuously appeared in the video for 15 seconds and a face snapshot exists in the buffer corresponding to identification information c, identification information c can be considered to meet the predetermined sending condition.
Correspondingly, if identification information c is determined to meet the predetermined sending condition, the face snapshots in its buffer can be used to retrieve target face images in the target library, after which the face snapshots in the buffer can be deleted. For example, for each face snapshot in the buffer corresponding to identification information c, the following processing may be performed: obtain the feature value of the face snapshot, obtain the comparison scores between that feature value and the feature value of each target face image in the target library, and take the target face images whose comparison scores are greater than a preset threshold as the retrieval result corresponding to that face snapshot.
The present disclosure does not limit how the feature value of a face snapshot is obtained; for example, a pre-trained feature extraction model may be used to extract it, or a feature extraction algorithm may be used.
Assume the buffer corresponding to identification information c contains 6 face snapshots, namely face snapshots 1 to 6. The feature value of face snapshot 1 can be obtained and the retrieval result corresponding to face snapshot 1 obtained through feature value comparison; likewise, the feature value of face snapshot 2 can be obtained and its retrieval result obtained through comparison, and so on, finally yielding the retrieval results respectively corresponding to face snapshots 1 to 6.
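The per-snapshot retrieval step can be sketched as follows. Cosine similarity is an assumed comparison score and the function names are hypothetical; the disclosure does not fix a particular comparison metric or threshold:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (an assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def search_target_library(snapshot_feature, target_library, threshold=0.8):
    """Return (target_id, score) pairs whose comparison score against the
    snapshot's feature value exceeds the preset threshold — the retrieval
    result for one face snapshot."""
    results = []
    for target_id, feat in target_library.items():
        score = cosine(snapshot_feature, feat)
        if score > threshold:
            results.append((target_id, score))
    return results
```

Calling this once per snapshot in the buffer yields the per-snapshot retrieval results that the aggregation step later combines.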
The face recognition result of the face corresponding to identification information c can then be determined according to the obtained retrieval results.
In addition, if the identification information corresponding to any created buffer is not among the identification information obtained by performing face detection on the video frame a, that identification information can be determined to be vanished identification information. If face snapshots exist in the buffer corresponding to the vanished identification information, those snapshots can be used to retrieve target face images in the target library, the face recognition result of the corresponding face information determined according to the obtained retrieval results, and the vanished identification information and its buffer deleted. If no face snapshot exists in the buffer, the vanished identification information and its buffer can be deleted directly.
For example, suppose 3 pieces of identification information are obtained after face detection and tracking are performed on the video frame a, while 5 buffers have been created, each corresponding to a different piece of identification information, and 2 of those 5 pieces are not among the 3 obtained from the video frame a. Those 2 pieces can then be determined to be vanished identification information. For each piece of vanished identification information, it can be determined whether face snapshots exist in the corresponding buffer. If so, each face snapshot in the buffer can be used to retrieve target face images in the target library, the face recognition result of the corresponding face information determined according to the obtained retrieval results, and the vanished identification information and its buffer deleted; if not, the vanished identification information and its buffer can be deleted directly.
If 0 pieces of identification information are obtained after face detection and tracking are performed on the video frame a, that is, no face is detected in the video frame a, and 5 buffers have been created, then all 5 pieces of identification information corresponding to those buffers can be regarded as vanished identification information.
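The handling of vanished identification information described above can be sketched as follows. This is a simplified sketch: `recognize` is a hypothetical stand-in for the retrieval-and-aggregation step, and `buffers` maps each piece of identification information to its list of snapshots:

```python
def finalize_vanished_ids(buffers, detected_ids, recognize):
    """Flush buffers whose tracking id no longer appears in the current
    frame's detections: if the buffer is non-empty, run recognition on
    its snapshots; in either case, delete the id and its buffer.

    buffers      -- dict: identification info -> list of face snapshots
    detected_ids -- set of ids detected (via tracking) in frame a
    recognize    -- callable(snapshots) -> face recognition result
    Returns a dict of {vanished_id: recognition_result}.
    """
    results = {}
    for track_id in list(buffers):          # copy keys: we mutate buffers
        if track_id in detected_ids:
            continue                        # still present, keep buffering
        snapshots = buffers.pop(track_id)   # delete id and its buffer
        if snapshots:                       # non-empty: retrieve first
            results[track_id] = recognize(snapshots)
    return results
```

When no faces at all are detected, `detected_ids` is empty and every created buffer is flushed, matching the 0-detections case above.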
As described above, assume the buffer corresponding to identification information c contains face snapshots 1 to 6 and that the retrieval results respectively corresponding to them have been obtained; the face recognition result of the face corresponding to identification information c can then be determined from these retrieval results. Specifically, the retrieval results can be aggregated (i.e., merged and de-duplicated) to obtain a candidate target face image set. If each target face image in the candidate set corresponds to exactly one face snapshot, the target face image with the highest comparison score can be taken as the face recognition result; here, for any target face image, the face snapshot whose retrieval result contains that target face image is the snapshot corresponding to it. If at least one target face image in the candidate set corresponds to more than one face snapshot, the target face image corresponding to the largest number of snapshots can be taken as the face recognition result.
When a target face image is output as the face recognition result, a face snapshot is usually output as well. If at least one target face image in the candidate set corresponds to more than one face snapshot, then among the snapshots corresponding to the target face image chosen as the recognition result, the one with the highest comparison score against that target face image can be determined and taken as the output face snapshot.
In practical application, the candidate target face image set may be empty, meaning none of face snapshots 1 to 6 matched any target face image in the target library. In this case, the face snapshot with the highest quality score among snapshots 1 to 6 can be output.
Based on the above description, fig. 3 is a flowchart of an embodiment of a method for determining the face recognition result of the face information corresponding to identification information c according to the retrieval results respectively corresponding to face snapshots 1 to 6. As shown in fig. 3, the following specific implementation steps are included.
In step 301, the search results are summarized to obtain a candidate target face atlas.
In step 302, it is determined whether the candidate target face atlas is empty, if so, step 303 is executed, otherwise, step 304 is executed.
In step 303, the face snap shots with the highest quality scores in the face snap shots 1-6 are output, and then the process is ended.
In step 304, it is determined whether each target face image in the candidate target face image set corresponds to one face snapshot, if so, step 305 is performed, otherwise, step 306 is performed.
In step 305, the face snapshot with the highest comparison score and the corresponding target face graph are output, and then the process is ended.
The target face images can be respectively corresponding to the comparison scores between one corresponding face snapshot image and the corresponding face snapshot image, the comparison score with the highest value can be selected, and the face snapshot image corresponding to the comparison score and the corresponding target face image are output.
In step 306, the target face image with the largest number of corresponding face snapshots is taken as the selected target face image (i.e. as the face recognition result); among the face snapshots corresponding to the selected target face image, the one with the highest comparison score against the selected target face image is determined; the determined face snapshot and the selected target face image are output, and then the process is ended.
Assume that the candidate target face image set comprises 10 target face images, namely target face images 1-10, where face snapshot 1 recalls target face images 1 and 2, face snapshot 2 recalls target face images 3 and 8, face snapshot 3 recalls target face images 4 and 9, face snapshot 4 recalls target face images 5 and 10, face snapshot 5 recalls target face images 6, 8 and 2, and face snapshot 6 recalls target face images 7 and 2. Target face image 2 therefore corresponds to three face snapshots (face snapshots 1, 5 and 6), target face image 8 corresponds to two face snapshots (face snapshots 2 and 5), and each remaining target face image corresponds to one face snapshot. Since target face image 2 corresponds to the largest number of face snapshots, it can be selected as the face recognition result, and among face snapshots 1, 5 and 6, the one with the highest comparison score against target face image 2 is output together with it.
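The decision flow of steps 301-306 can be summarized by a short sketch. The following Python snippet is only an illustrative reconstruction of the flowchart logic, not the patented implementation; the function and variable names (`decide_result`, `search_results`, `snapshots`) are hypothetical, and comparison and quality scores are assumed to be plain numbers:

```python
from collections import defaultdict

def decide_result(search_results, snapshots):
    """search_results: snapshot id -> list of (target face id, comparison
    score) pairs recalled for that snapshot.  snapshots: snapshot id ->
    quality score.  Returns (target face id or None, snapshot id)."""
    # Step 301: aggregate search results into the candidate target face set.
    candidates = defaultdict(list)        # target id -> [(snapshot id, score)]
    for snap, hits in search_results.items():
        for target, score in hits:
            candidates[target].append((snap, score))

    # Steps 302/303: empty candidate set -> output only the snapshot with
    # the highest quality score.
    if not candidates:
        return None, max(snapshots, key=snapshots.get)

    # Steps 304/305: every target recalled by exactly one snapshot ->
    # output the pair with the highest comparison score.
    if all(len(pairs) == 1 for pairs in candidates.values()):
        target, pairs = max(candidates.items(), key=lambda kv: kv[1][0][1])
        return target, pairs[0][0]

    # Step 306: pick the target recalled by the most snapshots, then the
    # snapshot scoring highest against that target.
    target, pairs = max(candidates.items(), key=lambda kv: len(kv[1]))
    snap, _ = max(pairs, key=lambda p: p[1])
    return target, snap
```

Applied to the example above with hypothetical scores, the sketch would select target face image 2 (recalled by three snapshots) and whichever of face snapshots 1, 5 and 6 scores highest against it.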
Through the above processing, the final face recognition result can be determined by combining the obtained search results, taking into account the number of times each target face image is recalled, the comparison scores between the target face images and the face snapshots, and the like, thereby further improving the accuracy and recall rate of face recognition.
In summary, with the scheme of this method embodiment, the number of selected best face snapshots can be set through parameter configuration, and target face images can be retrieved separately for each selected face snapshot, so that the final face recognition result can be determined by combining the obtained search results. This reduces the influence of mistakenly selecting the single best face snapshot due to various factors, gives the method better applicability in various scenes, and improves the accuracy and recall rate of face recognition.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should understand that the present disclosure is not limited by the order of actions described, as some steps may be performed in another order or simultaneously in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present disclosure.
The foregoing is a description of embodiments of the method, and the following further describes embodiments of the present disclosure through examples of apparatus. The following apparatus embodiments are used to perform any of the methods described above.
Fig. 4 is a schematic diagram of a composition structure of an embodiment 400 of a face recognition device of the present disclosure. As shown in fig. 4, includes: a first processing module 401, a second processing module 402, a third processing module 403, and a fourth processing module 404.
The first processing module 401 is configured to perform face detection on any video frame in the continuous video frames, so as to obtain specified face information and corresponding identification information therein.
The second processing module 402 is configured to use the subgraph corresponding to the specified face information as a face snapshot, and obtain a quality score corresponding to the face snapshot.
And a third processing module 403, configured to add the face snapshot to a buffer area corresponding to the identification information according to the quality score, where the buffer area stores at most M face snapshots, where M is a positive integer greater than 1.
And a fourth processing module 404, configured to search the target face map in the target library by using the face snap shots corresponding to the plurality of video frames in the continuous video frames in the buffer, and determine a face recognition result corresponding to the specified face information according to the obtained search result.
In the scheme of this apparatus embodiment, a plurality of face snapshots with better quality can be selected to retrieve target face images respectively, and the final face recognition result can be determined by combining the obtained search results, thereby reducing the influence of mistakenly selecting the single best face snapshot and improving the accuracy and recall rate of face recognition.
The second processing module 402 may determine a quality score corresponding to the face snapshot according to the face size and the face angle in the face snapshot.
The third processing module 403 may add the face snapshot to the buffer when the number of stored face snapshots in the buffer is equal to M and the quality score corresponding to the face snapshot is greater than the quality score with the smallest value in the quality scores corresponding to the stored face snapshots, and delete the stored face snapshot with the smallest value from the buffer.
The third processing module 403 may further perform one or all of the following: if the number of the stored face snap shots in the buffer area is smaller than M, adding the face snap shots into the buffer area; and if the number of the stored face snapshots in the buffer area is equal to M, and the quality scores corresponding to the face snapshots are smaller than or equal to the quality score with the smallest value in the quality scores corresponding to the stored face snapshots, discarding the face snapshots.
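The buffer rules above (keep at most M snapshots, replace the lowest-scoring one when a better snapshot arrives, otherwise discard) can be sketched as a fixed-capacity min-heap keyed by quality score. This is an illustrative assumption about the data structure, not the patented implementation; the class and method names are hypothetical:

```python
import heapq

class SnapshotBuffer:
    """Per-identification buffer keeping at most m face snapshots with
    the highest quality scores (min-heap ordered by quality score)."""
    def __init__(self, m):
        self.m = m
        self.heap = []                      # (quality score, snapshot)

    def add(self, snapshot, quality):
        if len(self.heap) < self.m:
            # Fewer than m stored snapshots: always add.
            heapq.heappush(self.heap, (quality, snapshot))
        elif quality > self.heap[0][0]:
            # Buffer full and the new score beats the smallest stored
            # score: replace the lowest-scoring stored snapshot.
            heapq.heapreplace(self.heap, (quality, snapshot))
        # Otherwise (score <= smallest stored score) discard the snapshot.

    def snapshots(self):
        return [snap for _, snap in self.heap]
```

With m = 2, adding snapshots scored 0.5, 0.9 and 0.7 would leave the 0.9 and 0.7 snapshots in the buffer and drop the 0.5 one, matching the rules described above.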
In addition, the third processing module 403 may further determine whether a buffer is created before adding the face snapshot to the buffer corresponding to the identification information, and if the buffer is not created, create the buffer.
Further, the fourth processing module 404 may search the target face map in the target library by using each face snapshot map in the buffer when the duration of the continuous occurrence of the specified face information in the video reaches a predetermined threshold.
After retrieving the target face images in the target library by using the face snapshots in the buffer, the fourth processing module 404 may further delete the face snapshots in the buffer.
In addition, when the identification information corresponding to any created buffer area is not included in the identification information obtained by face detection on the video frame, the fourth processing module 404 may determine that the identification information corresponding to that buffer area is vanished identification information. If face snapshots exist in the buffer area corresponding to the vanished identification information, the module retrieves target face images in the target library using each face snapshot in that buffer area, determines the face recognition result corresponding to the vanished identification information according to the obtained search results, and then deletes the vanished identification information and the corresponding buffer area; if no face snapshot exists in that buffer area, it simply deletes the vanished identification information and the corresponding buffer area.
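The vanished-identification handling can be sketched as follows. The names `flush_vanished`, `buffers`, and `search_fn` are hypothetical, and the subsequent determination of the recognition result from the collected search results is omitted:

```python
def flush_vanished(detected_ids, buffers, search_fn):
    """detected_ids: identification information found in the current frame.
    buffers: identification info -> list of buffered face snapshots.
    search_fn: retrieves target-library results for one snapshot.
    Returns the search results gathered per vanished identification."""
    results = {}
    for vid in [i for i in list(buffers) if i not in detected_ids]:
        snaps = buffers.pop(vid)            # delete the buffer either way
        if snaps:
            # Non-empty buffer: retrieve target faces for each buffered
            # snapshot before the vanished identification is discarded.
            results[vid] = [search_fn(s) for s in snaps]
        # Empty buffer: nothing to retrieve; the id is simply removed.
    return results
```

Identifications still present in the frame keep their buffers untouched; only vanished ones are flushed and removed.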
The fourth processing module 404 may perform the following processing for any face snapshot respectively: acquiring a characteristic value of the face snapshot; respectively obtaining the comparison scores between the characteristic values of the face snap images and the characteristic values of all the target face images in the target library; and taking the target face image with the comparison score larger than the preset threshold value as a retrieval result corresponding to the face snapshot image.
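The per-snapshot retrieval step can be illustrated with a small sketch. The patent does not specify the comparison metric, so cosine similarity between feature vectors is assumed here purely for illustration; all names are hypothetical:

```python
import math

def retrieve(snapshot_feat, target_feats, threshold):
    """snapshot_feat: feature vector of one face snapshot.
    target_feats: target face id -> feature vector.
    Returns (target id, comparison score) pairs whose score exceeds the
    threshold, i.e. the search result for this snapshot."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    hits = []
    for tid, feat in target_feats.items():
        score = cosine(snapshot_feat, feat)
        if score > threshold:               # keep only above-threshold targets
            hits.append((tid, score))
    return hits
```

Running this once per buffered snapshot yields the per-snapshot search results that are then aggregated into the candidate target face image set.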
Further, the fourth processing module 404 may aggregate the obtained search results to obtain a candidate target face image set; if each target face image in the set corresponds to one face snapshot, the target face image with the highest corresponding comparison score may be used as the face recognition result, and if the number of face snapshots corresponding to at least one target face image in the set is greater than 1, the target face image corresponding to the largest number of face snapshots may be used as the face recognition result.
The specific workflow of the embodiment of the apparatus shown in fig. 4 is referred to the related description in the foregoing method embodiment, and will not be repeated.
In summary, with the scheme of the embodiments of the present disclosure, the number of selected best face snapshots can be set through parameter configuration, and target face images can be retrieved separately for each selected face snapshot, so that the final face recognition result can be determined by combining the obtained search results, thereby reducing the influence of mistakenly selecting the single best face snapshot due to various factors, providing better applicability in various scenes, and improving the accuracy and recall rate of face recognition.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the apparatus 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in the present disclosure may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the methods described in the present disclosure by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Servers (VPSs). The server may also be a server of a distributed system or a server that incorporates a blockchain. Cloud computing refers to a technical system that accesses an elastically extensible shared pool of physical or virtual resources through a network; the resources may include servers, operating systems, networks, software, applications, storage devices and the like, and can be deployed and managed in an on-demand, self-service manner. By means of cloud computing technology, highly efficient and powerful data processing capacity can be provided for technical applications and model training in artificial intelligence, blockchain and other fields.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
It should be noted that the solution disclosed in the present disclosure may be applied in the field of artificial intelligence, and in particular, in the fields of computer vision, deep learning, and the like.
Artificial intelligence is the discipline of studying how to make a computer simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves technology at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing and knowledge graph technologies.

Claims (20)

1. A face recognition method, comprising:
face detection is carried out on any video frame in the continuous video frames to obtain appointed face information and corresponding identification information;
taking the subgraph corresponding to the appointed face information as a face snapshot, and obtaining a quality score corresponding to the face snapshot;
adding the face snap shots into a buffer area corresponding to the identification information according to the quality scores, wherein M face snap shots with the highest quality scores are stored in the buffer area at most, and M is a positive integer greater than 1;
if the duration of the appointed face information continuously appearing in the video reaches a preset threshold value, searching the target face images in a target library by using the face snap shots corresponding to a plurality of video frames in the continuous video frames in the buffer zone, determining the face recognition result corresponding to the appointed face information by combining each searching result, and outputting the determined target face images serving as the face recognition result and the corresponding face snap shots;
further comprises: if the identification information corresponding to any created buffer area is not included in the identification information obtained by face detection on the video frame, determining the identification information corresponding to that created buffer area as disappeared identification information; and, if face snapshots exist in the buffer area corresponding to the disappeared identification information, retrieving target face images in the target library using each face snapshot in the buffer area corresponding to the disappeared identification information, determining the face recognition result corresponding to the disappeared identification information according to the obtained search results, and deleting the disappeared identification information and the corresponding buffer area.
2. The method of claim 1, wherein the obtaining a quality score corresponding to the face snapshot comprises:
and determining a quality score corresponding to the face snapshot according to the face size and the face angle in the face snapshot.
3. The method of claim 1, wherein adding the face snapshot to the buffer corresponding to the identification information according to the quality score comprises:
and if the number of the stored face snap shots in the buffer area is equal to M, and the quality score corresponding to the face snap shots is larger than the quality score with the minimum value in the quality scores corresponding to the stored face snap shots, adding the face snap shots into the buffer area, and deleting the stored face snap shots with the minimum value from the buffer area.
4. The method of claim 1, further comprising one or all of:
if the number of the stored face snapshots in the buffer area is smaller than M, adding the face snapshots into the buffer area;
and if the number of the stored face snapshots in the buffer area is equal to M, and the quality score corresponding to the face snapshots is smaller than or equal to the quality score with the minimum value in the quality scores corresponding to the stored face snapshots, discarding the face snapshots.
5. The method of claim 1, further comprising:
before the face snapshot is added into the buffer area corresponding to the identification information, determining whether the buffer area is created;
if the buffer is not created, the buffer is created.
6. The method of claim 1, further comprising:
and after searching the target face images in the target library by utilizing the face snap images in the buffer area respectively, deleting the face snap images in the buffer area.
7. The method of claim 1, further comprising:
and if the face snap shots do not exist in the buffer area corresponding to the disappeared identification information, deleting the disappeared identification information and the corresponding buffer area.
8. The method of any of claims 1-7, wherein the retrieving a target face map in a target library comprises:
for any face snapshot, the following processes are respectively executed:
acquiring a characteristic value of the face snapshot;
respectively obtaining the comparison scores between the characteristic values of the face snap images and the characteristic values of the target face images in the target library;
and taking the target face image with the comparison score larger than a preset threshold value as a retrieval result corresponding to the face snapshot image.
9. The method of claim 8, wherein the determining, in conjunction with each search result, a face recognition result corresponding to the specified face information includes:
summarizing the obtained search results to obtain a candidate target face atlas;
if each target face image in the candidate target face image set corresponds to one face snapshot image respectively, taking the corresponding target face image with the highest comparison score as the face recognition result;
and if the number of the face snap shots corresponding to at least one target face image in the candidate target face image set is larger than 1, taking the target face image with the largest number of the corresponding face snap shots as the face recognition result.
10. A face recognition device, comprising: the device comprises a first processing module, a second processing module, a third processing module and a fourth processing module;
the first processing module is used for carrying out face detection on any video frame in the continuous video frames to obtain appointed face information and corresponding identification information;
the second processing module is used for taking the subgraph corresponding to the appointed face information as a face snapshot and obtaining a quality score corresponding to the face snapshot;
The third processing module is configured to add the face snapshot to a buffer area corresponding to the identification information according to the quality score, where at most M face snapshots with highest quality scores are stored in the buffer area, and M is a positive integer greater than 1;
the fourth processing module is configured to search, when it is determined that a duration of continuous occurrence of the specified face information in the video reaches a predetermined threshold, target face images in a target library by using face snap shots corresponding to a plurality of video frames in the continuous video frames in the buffer zone, determine face recognition results corresponding to the specified face information by combining with each search result, and output the determined target face images serving as the face recognition results and the corresponding face snap shots;
the fourth processing module is further configured to determine, if the identification information corresponding to any created buffer area is not included in the identification information obtained by performing face detection on the video frame, that the identification information corresponding to that created buffer area is vanished identification information; and, if face snapshots exist in the buffer area corresponding to the vanished identification information, to retrieve target face images in the target library using each face snapshot in the buffer area corresponding to the vanished identification information, determine the face recognition result corresponding to the vanished identification information according to the obtained search results, and delete the vanished identification information and the corresponding buffer area.
11. The apparatus of claim 10, wherein,
the second processing module is further used for determining a quality score corresponding to the face snapshot according to the face size and the face angle in the face snapshot.
12. The apparatus of claim 10, wherein,
the third processing module is further configured to add the face snapshot to the buffer area when the number of stored face snapshots in the buffer area is equal to M, and a quality score corresponding to the face snapshot is greater than a quality score with the smallest value in the quality scores corresponding to the stored face snapshots, and delete the stored face snapshot with the smallest value from the buffer area.
13. The apparatus of claim 10, wherein,
the third processing module is further configured to perform one or all of:
if the number of the stored face snapshots in the buffer area is smaller than M, adding the face snapshots into the buffer area;
and if the number of the stored face snapshots in the buffer area is equal to M, and the quality score corresponding to the face snapshots is smaller than or equal to the quality score with the minimum value in the quality scores corresponding to the stored face snapshots, discarding the face snapshots.
14. The apparatus of claim 10, wherein,
the third processing module is further configured to determine, before adding the face snapshot to a buffer area corresponding to the identification information, whether the buffer area is created, and if the buffer area is not created, create the buffer area.
15. The apparatus of claim 10, wherein,
the fourth processing module is further configured to delete each face snapshot in the buffer after searching the target face graphs in the target library by using each face snapshot in the buffer.
16. The apparatus of claim 10, wherein,
the fourth processing module is further configured to delete the disappeared identification information and the corresponding buffer area if no face snapshot exists in the buffer area corresponding to the disappeared identification information.
17. The device according to any one of claims 10 to 16, wherein,
the fourth processing module is further configured to perform, for any face snapshot, the following processes respectively: acquiring a characteristic value of the face snapshot; respectively obtaining the comparison scores between the characteristic values of the face snap images and the characteristic values of the target face images in the target library; and taking the target face image with the comparison score larger than a preset threshold value as a retrieval result corresponding to the face snapshot image.
18. The apparatus of claim 17, wherein,
the fourth processing module is further configured to summarize the obtained search results to obtain a candidate target face atlas, if each target face image in the candidate target face atlas corresponds to one face snapshot image, take the target face image with the highest corresponding comparison score as the face recognition result, and if the number of face snapshots corresponding to at least one target face image in the candidate target face atlas is greater than 1, take the target face image with the largest number of corresponding face snapshots as the face recognition result.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
CN202110473605.2A 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium Active CN113283305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473605.2A CN113283305B (en) 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110473605.2A CN113283305B (en) 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113283305A CN113283305A (en) 2021-08-20
CN113283305B true CN113283305B (en) 2024-03-26

Family

ID=77277653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473605.2A Active CN113283305B (en) 2021-04-29 2021-04-29 Face recognition method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113283305B (en)

Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005084985A (en) * 2003-09-09 2005-03-31 Toshiba Corp Individual recognition device and passing controller
CN105938552A (en) * 2016-06-29 2016-09-14 北京旷视科技有限公司 Face recognition method capable of realizing base image automatic update and face recognition device
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
CN107784294A (en) * 2017-11-15 2018-03-09 武汉烽火众智数字技术有限责任公司 A kind of persona face detection method based on deep learning
CN107958220A (en) * 2017-12-06 2018-04-24 杭州魔点科技有限公司 A kind of face database compression processing method and its intelligent apparatus based on recognition of face
CN108038422A (en) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, the method for recognition of face and computer-readable recording medium
CN108171207A (en) * 2018-01-17 2018-06-15 百度在线网络技术(北京)有限公司 Face identification method and device based on video sequence
CN108734107A (en) * 2018-04-24 2018-11-02 武汉幻视智能科技有限公司 A kind of multi-object tracking method and system based on face
CN108876758A (en) * 2017-08-15 2018-11-23 北京旷视科技有限公司 Face identification method, apparatus and system
CN109034013A (en) * 2018-07-10 2018-12-18 腾讯科技(深圳)有限公司 A kind of facial image recognition method, device and storage medium
CN109086739A (en) * 2018-08-23 2018-12-25 成都睿码科技有限责任公司 A kind of face identification method and system of no human face data training
CN109711318A (en) * 2018-12-24 2019-05-03 北京澎思智能科技有限公司 A kind of plurality of human faces detection and tracking based on video flowing
CN109753917A (en) * 2018-12-29 2019-05-14 中国科学院重庆绿色智能技术研究院 Face quality optimization method, system, computer readable storage medium and equipment
CN109948515A (en) * 2019-03-15 2019-06-28 百度在线网络技术(北京)有限公司 The classification recognition methods of object and device
CN110210340A (en) * 2019-05-20 2019-09-06 深圳供电局有限公司 A kind of face characteristic value comparison method and its system, readable storage medium storing program for executing
CN110889314A (en) * 2018-09-10 2020-03-17 北京市商汤科技开发有限公司 Image processing method, device, electronic equipment, server and system
WO2020093634A1 (en) * 2018-11-06 2020-05-14 平安科技(深圳)有限公司 Face recognition-based method, device and terminal for adding image, and storage medium
WO2020094091A1 (en) * 2018-11-07 2020-05-14 杭州海康威视数字技术股份有限公司 Image capturing method, monitoring camera, and monitoring system
CN111209818A (en) * 2019-12-30 2020-05-29 新大陆数字技术股份有限公司 Video individual identification method, system, equipment and readable storage medium
CN111274945A (en) * 2020-01-19 2020-06-12 北京百度网讯科技有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN111310698A (en) * 2020-02-26 2020-06-19 北京停简单信息技术有限公司 License plate recognition method and device and inspection vehicle
CN111339949A (en) * 2020-02-26 2020-06-26 北京停简单信息技术有限公司 License plate recognition method and device and inspection vehicle
CN111368619A (en) * 2019-08-14 2020-07-03 杭州海康威视系统技术有限公司 Method, device and equipment for detecting suspicious people
CN111652070A (en) * 2020-05-07 2020-09-11 南京航空航天大学 Face sequence collaborative recognition method based on surveillance video
CN111753731A (en) * 2020-06-24 2020-10-09 上海立可芯半导体科技有限公司 Face quality evaluation method, device and system and training method of face quality evaluation model
CN112215156A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Face snapshot method and system in video monitoring
CN112329679A (en) * 2020-11-12 2021-02-05 济南博观智能科技有限公司 Face recognition method, face recognition system, electronic equipment and storage medium
CN112541434A (en) * 2020-12-14 2021-03-23 无锡锡商银行股份有限公司 Face recognition method based on central point tracking model
CN112560772A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Face recognition method, device, equipment and storage medium
CN112597803A (en) * 2020-11-26 2021-04-02 深圳泰首智能技术有限公司 Face recognition method, device and system and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086719A (en) * 2018-08-03 2018-12-25 北京字节跳动网络技术有限公司 Method and apparatus for output data

Patent Citations (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005084985A (en) * 2003-09-09 2005-03-31 Toshiba Corp Individual recognition device and passing controller
CN105938552A (en) * 2016-06-29 2016-09-14 北京旷视科技有限公司 Face recognition method capable of realizing base image automatic update and face recognition device
CN108876758A (en) * 2017-08-15 2018-11-23 北京旷视科技有限公司 Face identification method, apparatus and system
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
WO2019033574A1 (en) * 2017-08-17 2019-02-21 平安科技(深圳)有限公司 Electronic device, dynamic video face recognition method and system, and storage medium
CN107784294A (en) * 2017-11-15 2018-03-09 武汉烽火众智数字技术有限责任公司 A kind of persona face detection method based on deep learning
CN108038422A (en) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, the method for recognition of face and computer-readable recording medium
WO2019100608A1 (en) * 2017-11-21 2019-05-31 平安科技(深圳)有限公司 Video capturing device, face recognition method, system, and computer-readable storage medium
CN107958220A (en) * 2017-12-06 2018-04-24 杭州魔点科技有限公司 A kind of face database compression processing method and its intelligent apparatus based on recognition of face
CN108171207A (en) * 2018-01-17 2018-06-15 百度在线网络技术(北京)有限公司 Face identification method and device based on video sequence
CN108734107A (en) * 2018-04-24 2018-11-02 武汉幻视智能科技有限公司 A kind of multi-object tracking method and system based on face
CN109034013A (en) * 2018-07-10 2018-12-18 腾讯科技(深圳)有限公司 A kind of facial image recognition method, device and storage medium
CN109086739A (en) * 2018-08-23 2018-12-25 成都睿码科技有限责任公司 A kind of face identification method and system of no human face data training
CN110889314A (en) * 2018-09-10 2020-03-17 北京市商汤科技开发有限公司 Image processing method, device, electronic equipment, server and system
WO2020093634A1 (en) * 2018-11-06 2020-05-14 平安科技(深圳)有限公司 Face recognition-based method, device and terminal for adding image, and storage medium
WO2020094091A1 (en) * 2018-11-07 2020-05-14 杭州海康威视数字技术股份有限公司 Image capturing method, monitoring camera, and monitoring system
CN109711318A (en) * 2018-12-24 2019-05-03 北京澎思智能科技有限公司 A kind of plurality of human faces detection and tracking based on video flowing
CN109753917A (en) * 2018-12-29 2019-05-14 中国科学院重庆绿色智能技术研究院 Face quality optimization method, system, computer readable storage medium and equipment
CN109948515A (en) * 2019-03-15 2019-06-28 百度在线网络技术(北京)有限公司 The classification recognition methods of object and device
CN110210340A (en) * 2019-05-20 2019-09-06 深圳供电局有限公司 A kind of face characteristic value comparison method and its system, readable storage medium storing program for executing
CN111368619A (en) * 2019-08-14 2020-07-03 杭州海康威视系统技术有限公司 Method, device and equipment for detecting suspicious people
CN111209818A (en) * 2019-12-30 2020-05-29 新大陆数字技术股份有限公司 Video individual identification method, system, equipment and readable storage medium
CN111274945A (en) * 2020-01-19 2020-06-12 北京百度网讯科技有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN111310698A (en) * 2020-02-26 2020-06-19 北京停简单信息技术有限公司 License plate recognition method and device and inspection vehicle
CN111339949A (en) * 2020-02-26 2020-06-26 北京停简单信息技术有限公司 License plate recognition method and device and inspection vehicle
CN111652070A (en) * 2020-05-07 2020-09-11 南京航空航天大学 Face sequence collaborative recognition method based on surveillance video
CN111753731A (en) * 2020-06-24 2020-10-09 上海立可芯半导体科技有限公司 Face quality evaluation method, device and system and training method of face quality evaluation model
CN112215156A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Face snapshot method and system in video monitoring
CN112329679A (en) * 2020-11-12 2021-02-05 济南博观智能科技有限公司 Face recognition method, face recognition system, electronic equipment and storage medium
CN112597803A (en) * 2020-11-26 2021-04-02 深圳泰首智能技术有限公司 Face recognition method, device and system and electronic equipment
CN112541434A (en) * 2020-12-14 2021-03-23 无锡锡商银行股份有限公司 Face recognition method based on central point tracking model
CN112560772A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Face recognition method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face recognition method based on video surveillance; Wang Hailong; Wang Huaibin; Wang Rongyao; Wang Haitao; Liu Qiang; Zhang Luyang; Jiang Menghao; Computer Measurement & Control (Issue 04); 137-141 *

Also Published As

Publication number Publication date
CN113283305A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN112801164B (en) Training method, device, equipment and storage medium of target detection model
CN114449343A (en) Video processing method, device, equipment and storage medium
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN113902899A (en) Training method, target detection method, device, electronic device and storage medium
CN113283305B (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN114549904B (en) Visual processing and model training method, device, storage medium and program product
EP4092544A1 (en) Method, apparatus and storage medium for deduplicating entity nodes in graph database
CN113033372B (en) Vehicle damage assessment method, device, electronic equipment and computer readable storage medium
CN112966723B (en) Video data augmentation method, video data augmentation device, electronic device and readable storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113379750A (en) Semi-supervised learning method of semantic segmentation model, related device and product
CN113204665A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
CN113936158A (en) Label matching method and device
CN113361574A (en) Training method and device of data processing model, electronic equipment and storage medium
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN113642472A (en) Training method and action recognition method of discriminator model
CN113408592B (en) Feature point matching method, device, electronic equipment and computer readable storage medium
CN113362218B (en) Data processing method and device, electronic equipment and storage medium
CN113033415B (en) Data queue dynamic updating method and device, electronic equipment and storage medium
CN114390249B (en) Video processing method, device, electronic equipment and storage medium
US20230206668A1 (en) Vision processing and model training method, device, storage medium and program product
CN117459719A (en) Reference frame selection method and device, electronic equipment and storage medium
CN116012748A (en) Video processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant