WO2021259033A1 - Facial recognition method, electronic device, and storage medium - Google Patents

Facial recognition method, electronic device, and storage medium

Info

Publication number
WO2021259033A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
feature
image
images
multiple frames
Prior art date
Application number
PCT/CN2021/098156
Other languages
French (fr)
Chinese (zh)
Inventor
丁肇臻
侯春华
申光
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to BR112022026549A2
Publication of WO2021259033A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Definitions

  • This application relates to the field of image processing technology, and in particular to a face recognition method, an electronic device, and a storage medium.
  • At present, face recognition is widely used in application scenarios such as security monitoring, criminal apprehension, and crowd statistics analysis.
  • However, in practical applications face recognition is susceptible to interference from many kinds of external noise, for example: face deflection; large side-face (profile) angles; motion blur and out-of-focus blur; occlusions such as masks and sunglasses; low illumination intensity and contrast; and blocking artifacts introduced by the encoding and decoding of transmitted video. This noise greatly reduces recognition accuracy, which limits the application and development of face recognition technology.
  • The following is an overview of the subject matter described in detail herein; this overview is not intended to limit the scope of protection of the claims.
  • The embodiments of the present application provide a face recognition method, an electronic device, and a storage medium that reduce the influence of noise interference on recognition accuracy and thereby improve the success rate of face recognition.
  • In one aspect, an embodiment of the present application provides a face recognition method that includes: extracting multiple frames of face images containing a target face from a video stream; performing face feature extraction on each of the frames to obtain first face features; performing feature enhancement on the first face features and fusing the enhanced first face features to obtain a second face feature; and comparing the second face feature with a pre-stored third face feature to determine a face recognition result.
  • In another aspect, an embodiment of the present application provides an electronic device that includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the steps of the face recognition method described above are implemented.
  • In a further aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the face recognition method described above.
  • Other features and advantages of the present application will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the present application. The purpose and other advantages of the application can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
  • The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification; together with the embodiments of the present application, they explain the technical solution and do not limit it.
  • FIG. 1 is a flowchart of a face recognition method provided by an embodiment of the present application;
  • FIG. 2 is a sub-flowchart of step S100 in FIG. 1;
  • FIG. 3 is a sub-flowchart of step S110 in FIG. 2;
  • FIG. 4 is a sub-flowchart of step S130 in FIG. 2;
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application, not to limit it.
  • In the description of the embodiments of the present application, "multiple" means two or more; "greater than", "less than", "exceeding", and the like are understood to exclude the stated number, while "above", "below", "within", and the like are understood to include it. Terms such as "first" and "second" are used only to distinguish technical features and cannot be understood as indicating or implying relative importance, or as implicitly indicating the number or precedence of the indicated technical features.
  • FIG. 1 shows a flowchart of a face recognition method provided by an embodiment of the present application. As shown in FIG. 1, the method includes but is not limited to the following steps S100 to S400.
  • Step S100: Extract multiple frames of face images containing the target face from the video stream.
  • In a specific implementation, video can be captured by a front-end camera, and the video stream output by the camera is then processed to obtain multiple frames of face images containing the target face.
  • In step S100 of the embodiment of the present application, extracting multiple frames of face images containing the target face from the video stream can be implemented through steps S110 to S130 shown in FIG. 2.
  • Step S110: Extract multiple frames of first face images containing the target face from the video stream.
  • In some examples, step S110 may be implemented by steps S111 and S112 shown in FIG. 3.
  • Step S111: Perform face detection on the video stream, and obtain the face position information of the target face in the current frame of the video stream.
  • In some examples, a face detection network such as the multi-task cascaded neural network (MTCNN) or RetinaFace may be used to obtain the position information of the target face in the current video frame.
  • The position information may include, for example, the positions of face key points and face boundary information.
  • Step S112: Perform face trajectory tracking according to the face position information, and extract multiple frames of first face images containing the target face from the video stream.
  • In some examples, the position of the target face in the current video frame, obtained during face detection, can be used to predict its position in the next frame, thereby achieving face trajectory tracking.
  • By tracking the target face trajectory, images of the target face can be cropped from multiple video frames of the video stream, yielding a series of face trajectory images containing the target face; this series is used as the multiple frames of first face images. A sketch of this detection-and-tracking step follows.
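By way of illustration, the following is a minimal sketch of how steps S111 and S112 might be combined. The detect_faces() helper is a hypothetical wrapper around an MTCNN- or RetinaFace-style detector, and greedy IoU association stands in for the position-prediction tracking described above; neither is specified by the patent.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def track_target_face(frames, detect_faces, init_box, iou_thresh=0.3):
    """Collect crops of the target face across frames (the first face images)."""
    track_images, prev_box = [], init_box
    for frame in frames:
        detections = detect_faces(frame)  # assumed: list of (box, landmarks)
        if not detections:
            continue
        # Associate the detection that overlaps most with the previous position.
        box, landmarks = max(detections, key=lambda d: iou(d[0], prev_box))
        if iou(box, prev_box) < iou_thresh:
            continue  # target not found in this frame
        x1, y1, x2, y2 = [int(v) for v in box]
        track_images.append((frame[y1:y2, x1:x2], landmarks))
        prev_box = box
    return track_images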
  • In some embodiments, the face key point position information may include the positions of multiple contour points.
  • The contour point positions may include the left eye position, right eye position, nose position, left mouth corner position, and right mouth corner position.
  • Correspondingly, as shown in FIG. 3, step S110 may further include step S113: calibrating the angle of the target face in the first face images according to the positions of the multiple contour points.
  • In some examples, because the target may be moving in the captured video stream, the target face may appear at an oblique angle in some of the first face images obtained through face trajectory tracking.
  • The contour point positions described above can therefore be used to calibrate the target face in the first face images.
  • Specifically, the positions of the multiple contour points may be input into a face calibration algorithm, which performs tilt correction on the target face in the first face image, as sketched below.
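As an illustration of step S113, the sketch below levels the eye line using OpenCV. The five-point landmark order (left eye, right eye, nose, left mouth corner, right mouth corner) is assumed from the contour points listed above; the patent does not fix a particular calibration algorithm.

```python
import cv2
import numpy as np

def align_face(image, landmarks):
    """Rotate the face crop so the eyes lie on a horizontal line."""
    left_eye = np.asarray(landmarks[0], dtype=np.float64)
    right_eye = np.asarray(landmarks[1], dtype=np.float64)
    dx, dy = right_eye - left_eye
    angle = np.degrees(np.arctan2(dy, dx))            # tilt of the eye line
    center = tuple((left_eye + right_eye) / 2.0)      # rotate about eye midpoint
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = image.shape[:2]
    return cv2.warpAffine(image, rot, (w, h), flags=cv2.INTER_LINEAR)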
  • Step S120: Perform face quality analysis on each of the multiple frames of first face images to obtain the face prior information of each frame.
  • In this embodiment of the application, a lightweight face quality evaluation algorithm analyzes the quality of each frame of the first face images and obtains the face prior information corresponding to each frame.
  • Specifically, face quality may be evaluated in multiple dimensions for each frame, so that the resulting face prior information includes several different types of index parameters.
  • In some examples, the index parameters may include three types: a blur degree parameter, a deflection angle parameter, and a resolution parameter.
  • In a specific implementation, a lightweight face feature extraction model can be used to obtain the feature norm of the first face image, and the blur degree parameter is determined from the obtained feature norm; in general, the larger the feature norm, the lower the degree of blur.
  • Local binary pattern (LBP) features can be used to binarize the first face image and output a face symmetry index, from which the deflection angle parameter is determined; for example, a symmetry index of 1 indicates a frontal face with a deflection angle of 0.
  • The interpupillary distance can be determined from the left eye position and right eye position obtained during face detection, and the resolution parameter is determined from the interpupillary distance; in general, the larger the interpupillary distance, the higher the resolution, and the smaller the interpupillary distance, the lower the resolution.
  • It should be understood that the embodiments of the present application are not limited to these three index parameters; other types of index parameters may also be included, or may replace any one or more of the three. A sketch of the three indicators follows.
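The following sketch illustrates one plausible reading of the three index parameters. The embedding model behind the feature norm and the flip-based symmetry measure (standing in for the LBP-based index) are assumptions, not the patent's exact formulas.

```python
import cv2
import numpy as np

def blur_degree(face_img, embed):
    """Larger feature norm means a sharper image, so invert it into a blur score."""
    norm = np.linalg.norm(embed(face_img))  # embed(): assumed lightweight extractor
    return 1.0 / (1.0 + norm)

def deflection_angle(face_img):
    """Symmetry index in [0, 1]; 1 approximates a frontal face (deflection 0)."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    flipped = cv2.flip(gray, 1)  # horizontal mirror
    diff = np.abs(gray.astype(np.float32) - flipped.astype(np.float32))
    return 1.0 - diff.mean() / 255.0

def resolution(left_eye, right_eye):
    """Interpupillary distance in pixels; larger implies higher face resolution."""
    return float(np.linalg.norm(np.asarray(right_eye) - np.asarray(left_eye)))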
  • The embodiment of the present application performs a multi-dimensional quality evaluation of each frame of the first face image using multiple different types of index parameters, which reflects the strength of the face detail features of the first face image in different dimensions.
  • In the foregoing example, a lightweight face quality scoring method examines face quality in the three dimensions of blur degree, face deflection angle, and resolution. The resulting face prior information serves two purposes: it is used to select higher-quality images for subsequent face feature extraction, ensuring that the extracted features are rich and diverse; and it is used in the subsequent feature enhancement stage to improve the generalization of the face features.
  • Step S130: Select multiple frames of second face images from the multiple frames of first face images according to the face prior information of each frame.
  • Specifically, step S130 may include steps S131 to S134 shown in FIG. 4.
  • Step S131: Linearly weight the multiple index parameters to obtain a global quality score, and obtain a first preset number of primary selection images from the multiple frames of first face images according to the global quality score.
  • In some examples, the global quality score may be obtained by linearly weighting the multiple index parameters contained in the face prior information from step S120.
  • The global quality score enables a comprehensive quality evaluation of each frame of the first face image; the frames are ranked by their global quality scores, and the highest-ranked first face images are selected as the primary selection images.
  • The number of primary selection images to acquire can be determined by setting the first preset number in advance.
  • As an example, the first preset number may be a percentage value, for example 30%.
  • When 100 frames of first face images have been extracted from the video stream in step S110, and the face prior information of each frame has been obtained by the method of step S120, the multiple index parameters contained in the face prior information are linearly weighted to obtain the global quality score of each frame. The 100 first face images are then ranked from the highest score to the lowest, and the top 30 are taken as the primary selection images; a sketch follows.
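A minimal sketch of step S131 under the example above. The weighting coefficients are placeholders; the patent obtains them by a regression method.

```python
import numpy as np

def select_primary_images(priors, weights=(0.4, 0.3, 0.3), keep_ratio=0.3):
    """priors: array of shape (n_frames, 3) holding the three index parameters."""
    priors = np.asarray(priors, dtype=np.float64)
    scores = priors @ np.asarray(weights)          # global quality score per frame
    order = np.argsort(scores)[::-1]               # rank from high to low
    n_keep = max(1, int(len(order) * keep_ratio))  # e.g. top 30% -> 30 of 100
    return order[:n_keep], scores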
  • Step S132: Arrange and combine the first preset number of primary selection images to obtain multiple primary selection image combinations, where each combination contains a second preset number of primary selection images.
  • Continuing the example above, the second preset number can be set according to the number of second face images to be finally obtained; for example, the second preset number is set to 3.
  • In this way, the 30 primary selection images can be combined to obtain C(30, 3) = 4060 primary selection image combinations, each containing 3 frames of primary selection images (see the sketch below).
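The enumeration in step S132 is a direct application of itertools; the sketch below assumes the 30-image, 3-per-combination example.

```python
from itertools import combinations
from math import comb

primary_ids = list(range(30))                  # indices of the primary images
triples = list(combinations(primary_ids, 3))   # each item is one combination
assert len(triples) == comb(30, 3) == 4060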
  • Step S133: Obtain the image discrimination parameter of each primary selection image combination according to the multiple index parameters, and select the final selection image combination from the multiple primary selection image combinations according to the image discrimination parameter.
  • The image discrimination parameter characterizes the differences in face detail features among the multiple frames of primary selection images contained in a combination.
  • In general, provided that face quality is ensured, the greater the differentiation of the face detail features, the more usable information the image combination contains; the face features extracted in this way exhibit strong generalization and are better suited to face recognition systems in open scenes.
  • The image discrimination parameter of a primary selection image combination can be determined by calculating the cumulative pairwise distances, in multiple dimensions, between the images in the combination.
  • For example, suppose the current primary selection image combination is T1, containing three images numbered P1, P2, and P3. The distances between P1 and P2, P1 and P3, and P2 and P3 are calculated in each dimension: the distances between P1 and P2 in the three dimensions of blur degree, face deflection angle, and resolution are S1(P1P2), S2(P1P2), and S3(P1P2); the distances between P1 and P3 in the same three dimensions are S1(P1P3), S2(P1P3), and S3(P1P3); and the distances between P2 and P3 are S1(P2P3), S2(P2P3), and S3(P2P3). The image discrimination parameter of combination T1 is then:
  • S(T1) = S1(P1P2) + S2(P1P2) + S3(P1P2) + S1(P1P3) + S2(P1P3) + S3(P1P3) + S1(P2P3) + S2(P2P3) + S3(P2P3).
  • According to the image discrimination parameters calculated for all the primary selection image combinations, the combination with the largest parameter is selected as the final selection image combination; a sketch follows.
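A sketch of step S133 under the same assumptions: the discrimination parameter accumulates pairwise distances over the three quality dimensions, and the combination maximizing it is kept.

```python
from itertools import combinations
import numpy as np

def discrimination(combo, priors):
    """combo: frame indices; priors: (n_frames, 3) index-parameter array."""
    priors = np.asarray(priors, dtype=np.float64)
    return sum(np.abs(priors[i] - priors[j]).sum()   # summed over the 3 dimensions
               for i, j in combinations(combo, 2))

def select_final_combination(combos, priors):
    """Return the combination with the largest image discrimination parameter."""
    return max(combos, key=lambda c: discrimination(c, priors))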
  • Step S134: Use the images contained in the final selection image combination as the second face images.
  • Once the final selection image combination has been selected, the images it contains are used as the second face images. For example, if combination T1 is selected as the final selection image combination, the images P1, P2, and P3 contained in T1 are used as the second face images.
  • Step S200: Perform face feature extraction on each of the multiple frames of face images to obtain first face features.
  • In some examples, the face images in step S200 may be the multiple frames of second face images obtained through step S134.
  • In some examples, a neural network may be used to perform face feature extraction on each of the multiple frames of face images to obtain the first face features.
  • The extracted first face features include multi-dimensional face vectors.
  • The neural network can use a face feature extraction algorithm such as Resnet152, outputting a set of 256-dimensional deep face features. These features encode the original face image information before any feature enhancement; a sketch of such an extractor follows.
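One way to realize the extractor of step S200 is sketched below, assuming a torchvision ResNet-152 backbone with its classifier replaced by a 256-dimensional projection. A real system would load weights trained on a face data set rather than the randomly initialized head used here.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet152(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 256)  # 2048 -> 256-d features
backbone.eval()

@torch.no_grad()
def extract_first_features(batch):
    """batch: float tensor (n_frames, 3, 224, 224) of aligned face crops."""
    return backbone(batch)  # (n_frames, 256) first face features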
  • Step S300: Perform feature enhancement on the first face features, and fuse the enhanced first face features to obtain a second face feature.
  • In some examples, a deep convolutional neural network is used to perform a dot-multiplication operation between the first face features and the face prior information, obtaining the enhanced first face features.
  • Here, the face prior information is that obtained by performing face quality analysis on the face images in step S120 above.
  • Following the example above, the 256-dimensional deep face features extracted from each second face image by the Resnet152 algorithm, together with the face prior information corresponding to that second face image (the blur degree parameter, face deflection angle parameter, and resolution parameter), are input into the deep convolutional neural network, which dot-multiplies the deep face features with the face prior information; in this way, the face prior information output by the face quality evaluation algorithm is used to enhance the face features.
  • Unlike traditional image-level enhancement methods, such as image deblurring and super-resolution, the embodiment of the present application adopts a feature-level enhancement method.
  • Compared with image-level enhancement, the advantage of feature-level enhancement is that the processing object is a set of multi-dimensional face vectors, so the amount of computation is small, which can greatly improve processing efficiency.
  • In some examples, the deep convolutional neural network used for feature enhancement may be two fully connected layers connected in series, trained on a face feature extraction data set to obtain a feature enhancement module that compensates the original features.
  • The three quality indicators output by the face quality scoring module reflect the strength of the face image in the three dimensions of blur degree, deflection angle, and resolution; through the dot-multiplication operation, these indicators control how the feature enhancement module enhances the original features.
  • After the enhancement of the first face features is completed, the enhanced first face features are fused to obtain the second face feature.
  • Specifically, the enhanced first face features can be fused through an average pooling operation to obtain the second face feature; a sketch of the enhancement and fusion follows.
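The sketch below shows one plausible shape for the enhancement-and-fusion module of step S300: two fully connected layers map the three quality indicators to a gate that is dot-multiplied with each first face feature, and average pooling then fuses the frames. The exact layer sizes and the sigmoid squashing are assumptions.

```python
import torch
import torch.nn as nn

class FeatureEnhancer(nn.Module):
    def __init__(self, feat_dim=256, prior_dim=3):
        super().__init__()
        self.gate = nn.Sequential(           # two FC layers in series
            nn.Linear(prior_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
            nn.Sigmoid(),                    # assumed squashing of the gate
        )

    def forward(self, features, priors):
        """features: (n, 256) first face features; priors: (n, 3) indicators."""
        enhanced = features * self.gate(priors)   # element-wise (dot) product
        return enhanced.mean(dim=0)               # average pooling -> (256,) fused feature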
  • Step S400: Compare the second face feature with the pre-stored third face feature to determine the face recognition result.
  • Specifically, the Euclidean distance can be used to compare the second face feature with the pre-stored third face feature and determine the face recognition result, as sketched below.
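A minimal sketch of the comparison in step S400; the distance threshold is an assumption to be tuned on validation data.

```python
import numpy as np

def match(second_feature, gallery, threshold=1.0):
    """gallery: dict mapping person ID -> stored 256-d (third) feature vector."""
    ids = list(gallery)
    dists = np.array([np.linalg.norm(second_feature - gallery[i]) for i in ids])
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return ids[best], float(dists[best])   # recognized identity
    return None, float(dists[best])            # no match in the base database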
  • The face recognition method provided by the embodiments of the present application is further illustrated below with reference to specific application scenarios.
  • Scenario 1: Smart city nighttime face monitoring
  • In national smart city construction, intelligent security monitoring systems play an important role. Traditional face recognition monitoring systems perform well on sunny days with good lighting conditions, but at night recognition accuracy often drops sharply because of complex night scenes, low brightness, aging fill-light equipment, poor camera angle configuration, temperature, rain, snow, and many other factors. Urban nighttime monitoring is nevertheless of great significance for monitoring wanted fugitives and social idlers. Against this background, this example illustrates a face recognition monitoring system for an urban nighttime surveillance scenario. When the face recognition method provided by the embodiments of the present application is applied to this system, the following method steps may be included:
  • Step S501: Collect sets of face images of fugitives, social idlers, and persons under key surveillance. These face images are usually frontal, high-definition pictures, so no additional image processing is required. A face feature extraction algorithm encodes the images, and the encodings are stored to form a base (gallery) database.
  • Step S502: Obtain the surveillance videos captured at night by the monitoring devices in the surveillance area.
  • The surveillance area may be a residential community, a street, or another fixed area.
  • The surveillance video may be transmitted as an online video stream or saved locally offline. The video stream information is transmitted to a back-end data processing module in preparation for video image analysis.
  • Step S503: Perform face detection and trajectory tracking on the video collected by each monitoring device to obtain a group of face trajectory images containing the target face.
  • Step S504: Use the lightweight face quality evaluation algorithm to score each face image in the trajectory in the three dimensions of blur degree, deflection angle, and resolution, and also output the global quality score that combines the three indicators.
  • The global score is a linear weighting of the three indicators, with weighting coefficients obtained by a regression method.
  • The quality of each face image is given by the global quality score, while the three indicators reflect the strength of the face detail features in different dimensions.
  • Step S505: According to the global quality score and the three-dimensional indicators, select from the face trajectory images multiple images of relatively high quality and highly discriminative face detail features as the face candidate set.
  • Step S506: Perform face feature extraction on the images in the face candidate set, enhance the extracted face features with the feature-level enhancement method, apply an average pooling operation to the enhanced face features to fuse the face features of the whole candidate set, and output the face features to be used for subsequent comparison and matching.
  • Step S507: Compare the face features output in step S506 with the face features stored in the base database by calculating the Euclidean distance between them. When the distance is below a certain threshold, the captured face is considered to match the stored identity of a fugitive or social idler; a signal is then sent to the terminal device and the recognition result is shown on the display device.
  • Scenario 2: Monitoring and analyzing personnel activity trajectories in crowded environments
  • Using personnel activity trajectory information to count dwell time, personnel density, and crowd flow has high economic value and social significance.
  • For example, by counting personnel trajectories and analyzing the flow of people, evacuation channels can be deployed rationally and transfer efficiency can be improved.
  • Analyzing dwell time and people flow also has important reference value for rationally locating exhibition areas and arranging merchandise sales areas.
  • This example illustrates a system for monitoring and analyzing people's activity trajectories in crowded scenes.
  • When the face recognition method provided in the embodiments of this application is applied to the system, the following method steps may be included:
  • Step S601: Obtain the surveillance videos of the monitoring devices in a public place over a period of time.
  • The place may be a public area such as a shopping mall, a subway transfer center, or an airport.
  • Assign an ID number to the surveillance video collected by each monitoring device, such as ID1, ID2, ..., IDN.
  • The video data collected by these monitoring devices is transmitted to the back end for video image analysis.
  • Step S602: Use the face detection and face tracking methods to process the video stream collected by the monitoring device of each ID, obtaining a face trajectory image set corresponding to that ID.
  • Step S603: Use the lightweight face quality evaluation algorithm to score each face image in the face trajectory image set of each ID in the three dimensions of blur degree, deflection angle, and resolution, and also output the global quality score combining the three indicators.
  • Step S604: Generate the face candidate set corresponding to each ID according to the global quality score and the three-dimensional indicators.
  • Step S605: Perform face feature extraction, enhancement, and fusion on the images contained in the face candidate set of each ID, and output the face features corresponding to that ID.
  • Each ID then holds a certain number of face features, and these face features represent the people captured by the corresponding monitoring device during this period of time.
  • Step S607: Save the personnel trajectories in a database in the form of a time axis, or display them on an interface for operators to read and use.
  • Aiming at the susceptibility of traditional solutions to various types of noise interference in open monitoring scenarios, the solutions provided by the embodiments of the present application extensively optimize the data processing unit, greatly improving the overall performance of the face recognition monitoring system.
  • A single face image captured by a face detection algorithm is often disturbed by noise, and the face details often exhibit various kinds of defects.
  • For example, when the distance between the subject and the camera is large, the collected face image is often affected by out-of-focus blur, and when the distance is small, motion blur is often produced.
  • The face recognition method proposed in the present application uses multiple face images from one motion trajectory of the same subject for feature fusion and extraction, which effectively avoids the information loss that can occur when, as in traditional approaches, only a single face image is used.
  • In addition, the prior information of the face image in multiple dimensions is used to enhance the face features, so the finally obtained face features have a high degree of generalization.
  • Through feature enhancement, even highly incomplete face images, such as unevenly lit ("yin-yang") faces, faces at large deflection angles, or faces occluded by scarves and masks, retain considerable feature generalization.
  • The face recognition method proposed in this application follows a lightweight design principle: lightweight deep convolutional neural networks are used in the face quality evaluation, face feature fusion, and face feature enhancement modules.
  • The face quality evaluation algorithm computes only in the three dimensions of blur degree, deflection angle, and resolution, which avoids excessive consumption of system resources and better meets the real-time requirements of a face recognition monitoring system.
  • FIG. 5 shows an electronic device 70 provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 70 includes but is not limited to:
  • a memory 72 configured to store a program; and
  • a processor 71 configured to execute the program stored in the memory 72.
  • When the processor 71 executes the program stored in the memory 72, the processor 71 performs the face recognition method described above.
  • the processor 71 and the memory 72 may be connected by a bus or in other ways.
  • The memory 72 can be configured to store non-transitory software programs and non-transitory computer-executable programs, such as a program implementing the face recognition method described in the embodiments of the present application.
  • the processor 71 executes the non-transitory software programs and instructions stored in the memory 72 to realize the aforementioned face recognition method.
  • The memory 72 may include a program storage area and a data storage area.
  • The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created during execution of the face recognition method described above.
  • the memory 72 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • In some examples, the memory 72 may include memory located remotely from the processor 71, and such remote memory may be connected to the processor 71 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • The non-transitory software programs and instructions required to implement the face recognition method described above are stored in the memory 72; when executed by one or more processors 71, they perform the face recognition method, for example, method steps S100 to S400 described in FIG. 1, method steps S110 to S130 described in FIG. 2, method steps S111 to S113 described in FIG. 3, and method steps S131 to S134 described in FIG. 4.
  • An embodiment of the present application also provides a storage medium storing computer-executable instructions, and the computer-executable instructions are used to execute the face recognition method described above.
  • In some examples, the computer-executable instructions are executed by one or more control processors 71, for example by a processor 71 in the electronic device 70 described above, so that the one or more processors 71 execute the face recognition method described above, for example, method steps S100 to S400 described in FIG. 1, method steps S110 to S130 described in FIG. 2, method steps S111 to S113 described in FIG. 3, and method steps S131 to S134 described in FIG. 4.
  • In summary, the embodiments of the application include: extracting multiple frames of face images containing a target face from a video stream; performing face feature extraction on the multiple frames of face images to obtain first face features; performing feature enhancement on the first face features and fusing the enhanced first face features to obtain a second face feature; and comparing the second face feature with a pre-stored third face feature to determine the face recognition result.
  • The technical solution provided by the embodiments of the present application performs face recognition based on face features extracted from multiple frames of face images, so that the face feature samples are richer and more diverse, feature complementarity is achieved, and more information is available for face recognition. This overcomes the problem of traditional methods, which perform face recognition based on the features of a single image and whose recognition results are therefore strongly affected by noise interference.
  • In addition, the embodiment of the present application performs feature enhancement and fusion on the first face features extracted from the multiple frames of face images, compensating the face features and further improving the success rate and reliability of face recognition.
  • Computer storage media include volatile and non-volatile media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.

Abstract

A facial recognition method, an electronic device, and a storage medium. The method comprises: extracting, from a video stream, multiple facial image frames comprising a target face (S100); separately performing facial feature extraction on the multiple facial image frames to obtain a first facial feature (S200); performing feature enhancement on the first facial feature, and fusing the enhanced first facial feature to obtain a second facial feature (S300); and comparing the second facial feature with a pre-stored third facial feature to determine a facial recognition result (S400).

Description

Face recognition method, electronic device, and storage medium

Cross-reference to related applications

This application is based on the Chinese patent application with application No. 202010587883.6, filed on June 24, 2020, and claims the priority of that Chinese patent application, the entire content of which is incorporated herein by reference.
技术领域Technical field
本申请涉及图像处理技术领域,特别是涉及一种人脸识别方法、电子设备以及存储介质。This application relates to the field of image processing technology, and in particular to a face recognition method, electronic equipment, and storage medium.
背景技术Background technique
目前,人脸识别广泛应用在安防监控、犯罪抓捕、人流统计分析等多种应用场景中。但是,人脸识别在实际应用过程中容易受到外界各类噪声的干扰。比如:人脸偏转;大幅度侧脸;运动模糊和失焦模糊;人脸有遮挡物(例如口罩,墨镜);低的光照强度和对比度;视频传输由于编解码过程产生的人造块等等。由于受到噪声的干扰,造成人脸识别的精度大幅度下降,从而限制了人脸识别技术的应用发展。At present, face recognition is widely used in various application scenarios such as security monitoring, criminal arrest, and crowd statistics analysis. However, face recognition is susceptible to interference from various external noises in the actual application process. For example: face deflection; large side face; motion blur and out-of-focus blur; face has obstructions (such as masks, sunglasses); low light intensity and contrast; artificial blocks generated by the encoding and decoding process of video transmission, etc. Due to the interference of noise, the accuracy of face recognition is greatly reduced, which limits the application and development of face recognition technology.
发明内容Summary of the invention
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is an overview of the topics detailed in this article. This summary is not intended to limit the scope of protection of the claims.
本申请实施例提供了一种人脸识别方法、电子设备以及存储介质,能够降低噪声干扰对人脸识别精度的影响,从而提高人脸识别的成功率。The embodiments of the present application provide a face recognition method, an electronic device, and a storage medium, which can reduce the influence of noise interference on the accuracy of face recognition, thereby improving the success rate of face recognition.
一方面,本申请实施例提供了一种人脸识别方法,包括:从视频流中提取出包含目标人脸的多帧人脸图像;对多帧所述人脸图像分别进行人脸特征 提取,得到第一人脸特征;对所述第一人脸特征进行特征增强,并对增强后的第一人脸特征进行融合,得到第二人脸特征;将所述第二人脸特征与预先存储的第三人脸特征进行比较,确定人脸识别结果。On the one hand, an embodiment of the present application provides a face recognition method, which includes: extracting multiple frames of face images containing a target face from a video stream; extracting face features of the multiple frames of the face images, respectively, Obtain a first face feature; perform feature enhancement on the first face feature, and fuse the enhanced first face feature to obtain a second face feature; combine the second face feature with pre-stored The third face feature is compared to determine the face recognition result.
另一方面,本申请实施例提供了一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现权利要求如上所述的人脸识别方法的步骤。On the other hand, an embodiment of the present application provides an electronic device that includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor. When the processor executes the program, the claims are as stated above. The steps of the face recognition method described.
再一方面,本申请实施例提供了一种计算机可读存储介质,存储有计算机程序,该程序被处理器执行时实现如上所述的人脸识别方法的步骤。In another aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the steps of the face recognition method described above.
本申请的其它特征和优点将在随后的说明书中阐述,并且,部分地从说明书中变得显而易见,或者通过实施本申请而了解。本申请的目的和其他优点可通过在说明书、权利要求书以及附图中所特别指出的结构来实现和获得。Other features and advantages of the present application will be described in the following description, and partly become obvious from the description, or understood by implementing the present application. The purpose and other advantages of the application can be realized and obtained through the structures specifically pointed out in the description, claims and drawings.
附图说明Description of the drawings
附图用来提供对本申请技术方案的进一步理解,并且构成说明书的一部分,与本申请的实施例一起用于解释本申请的技术方案,并不构成对本申请技术方案的限制。The accompanying drawings are used to provide a further understanding of the technical solution of the present application, and constitute a part of the specification. Together with the embodiments of the present application, they are used to explain the technical solution of the present application, and do not constitute a limitation to the technical solution of the present application.
图1是本申请实施例提供的一种人脸识别方法的流程图;FIG. 1 is a flowchart of a face recognition method provided by an embodiment of the present application;
图2是图1中的步骤S100的子流程图;FIG. 2 is a sub-flow chart of step S100 in FIG. 1;
图3是图2中的步骤S110的子流程图;FIG. 3 is a sub-flow chart of step S110 in FIG. 2;
图4是图2中的步骤S130子流程图;Fig. 4 is a sub-flow chart of step S130 in Fig. 2;
图5是本申请实施例提供的一种电子设备的结构示意图。Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
具体实施方式detailed description
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施 例仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solutions, and advantages of this application clearer, the following further describes this application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application, and are not used to limit the application.
应了解,在本申请实施例的描述中,多个(或多项)的含义是两个以上,大于、小于、超过等理解为不包括本数,以上、以下、以内等理解为包括本数。如果有描述到“第一”、“第二”等只是用于区分技术特征为目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量或者隐含指明所指示的技术特征的先后关系。It should be understood that in the description of the embodiments of the present application, multiple (or multiple) means two or more, greater than, less than, exceeding, etc. are understood to not include the number, and above, below, and within are understood to include the number. If there are descriptions of "first", "second", etc., only for the purpose of distinguishing technical features, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features or implicitly indicating the indicated The precedence of technical characteristics.
图1示出了本申请实施例提供的一种人脸识别方法的流程图。如图1所示,所述方法包括但不限于如下步骤S100至S400。Fig. 1 shows a flowchart of a face recognition method provided by an embodiment of the present application. As shown in FIG. 1, the method includes but is not limited to the following steps S100 to S400.
步骤S100,从视频流中提取出包含目标人脸的多帧人脸图像。Step S100: Extract multiple frames of face images containing the target face from the video stream.
在具体实现时,可以通过前端的摄像头完成视频的采集,然后对摄像头输出的视频流进行后续处理,获取包含目标人脸的多帧人脸图像。本申请实施例的步骤S100中,从视频流中提取出包含目标人脸的多帧人脸图像,可以通过如图2所示的步骤S110至S130实现。In a specific implementation, the video can be collected through the front-end camera, and then the video stream output by the camera is subjected to subsequent processing to obtain multiple frames of face images containing the target face. In step S100 of the embodiment of the present application, extracting a multi-frame face image containing a target face from a video stream can be implemented through steps S110 to S130 as shown in FIG. 2.
步骤S110,从视频流中提取出包含目标人脸的多帧第一人脸图像。Step S110: Extract multiple frames of first face images containing the target face from the video stream.
在一些示例中,步骤S110具体可以通过如图3所示的步骤S111和S112实现。In some examples, step S110 may be specifically implemented by steps S111 and S112 as shown in FIG. 3.
步骤S111,对视频流进行人脸检测,获取目标人脸在视频流当前帧的脸部位置信息。Step S111: Perform face detection on the video stream, and obtain face position information of the target face in the current frame of the video stream.
在一些示例中,可以采用如多任务级联神经网络(Multi-tasks cascade neural network,MTCNN)、RetinaFace等人脸检测网络,获取目标人脸在当前帧的视频画面的位置信息。其中,位置信息可以是诸如人脸关键点位置信息和人脸边界信息等信息。In some examples, a face detection network such as Multi-tasks Cascade Neural Network (MTCNN) and RetinaFace may be used to obtain the position information of the target face in the video screen of the current frame. Wherein, the location information may be information such as the location information of the key points of the face and the face boundary information.
步骤S112,根据脸部位置信息进行人脸轨迹跟踪,从视频流中提取出包 含目标人脸的多帧第一人脸图像。Step S112: Perform face trajectory tracking according to the face position information, and extract multiple frames of first face images containing the target face from the video stream.
在一些示例中,可以根据人脸检测时获取的目标人脸在当前帧的视频画面位置信息,预测目标人脸在下一帧视频画面中的位置,如此实现人脸轨迹跟踪。通过对目标人脸轨迹进行跟踪,可以从视频流的多帧视频画面中截取目标人脸的图像,如此得到包含目标人脸的一系列人脸轨迹图像,并将一系列的人脸轨迹图像作为多帧第一人脸图像。In some examples, the position of the target face in the video frame of the current frame obtained during face detection can be used to predict the position of the target face in the next frame of the video frame, thus achieving face trajectory tracking. By tracking the target face trajectory, the image of the target face can be intercepted from the multi-frame video images of the video stream, so as to obtain a series of face trajectory images containing the target face, and use a series of face trajectory images as Multiple frames of the first face image.
在一些实施例中,人脸关键点位置信息具体可以包括多个轮廓点位置信息。其中,多个轮廓点位置信息可以包括左眼位置信息、右眼位置信息、鼻子位置信息、左嘴角位置信息和右嘴角位置信息。In some embodiments, the key point position information of the face may specifically include multiple contour point position information. Wherein, the multiple contour point position information may include left eye position information, right eye position information, nose position information, left mouth corner position information, and right mouth corner position information.
对应的,如图3所示,步骤S110中还可以包括步骤S113,根据多个轮廓点位置信息,校准第一人脸图像中目标人脸的角度。Correspondingly, as shown in FIG. 3, step S110 may further include step S113, according to the position information of multiple contour points, calibrating the angle of the target face in the first face image.
在一些示例中,由于在采集的视频流中目标对象可能是活动的,因此通过人脸轨迹跟踪得到的由一系列人脸轨迹图像组成的多帧第一人脸图像中,可能存在部分图像中目标人脸的角度是倾斜的。如此,可以利用上述的轮廓点位置信息实现对第一人脸图像中的目标人脸进行校准。具体的,可以将上述的多个轮廓点位置信息输入至人脸校准算法中,利用人脸校准算法对第一人脸图像中的目标人脸进行倾斜校正。In some examples, since the target object may be active in the captured video stream, the first face image composed of a series of face trajectory images obtained through face trajectory tracking may be part of the first face image. The angle of the target face is oblique. In this way, the above-mentioned contour point position information can be used to achieve calibration of the target face in the first face image. Specifically, the aforementioned multiple contour point position information may be input into the face calibration algorithm, and the face calibration algorithm may be used to perform tilt correction on the target face in the first face image.
步骤S120,分别对多帧第一人脸图像进行人脸质量分析处理,得到每帧第一人脸图像的人脸先验信息。Step S120: Perform face quality analysis and processing on the first face images of multiple frames to obtain the prior information of the face of each frame of the first face image.
本申请实施例采用轻量级人脸质量评价算法对每一帧第一人脸图像进行人脸质量分析处理,得到对应于每一帧第一人脸图像的人脸先验信息。In this embodiment of the application, a lightweight face quality evaluation algorithm is used to perform face quality analysis processing on each frame of the first face image, and obtain the prior information of the face corresponding to each frame of the first face image.
具体的,可以对每一帧第一人脸图像进行多个维度的人脸质量评价,如此得到的人脸先验信息包括多个不同类型的指标参数。Specifically, multiple dimensions of face quality evaluation may be performed on each frame of the first face image, and the face prior information obtained in this way includes multiple different types of index parameters.
在一些示例中,指标参数可以包括模糊程度参数、偏转角参数和分辨率参数三种类型的指标参数。在具体实现时,可以利用轻量级人脸特征提取模型获取第一人脸图像的特征模长,并根据得到的特征模长确定模糊程度参数,一般来说,特征模长越大模糊程度越低;可以利用局部特征二值化LBP对第一人脸图像进行二值化处理,输出人脸对称性指数,并根据人脸对称性指数确定偏转角参数,比如对称性指数为1时表征为正脸角度,偏转角为0;利用人脸检测时得到的左眼位置信息与右眼位置信息确定瞳间距,并根据瞳间距确定分辨率参数,一般来说,瞳间距越大分辨率越高,瞳间距越小分辨率越低。In some examples, the index parameters may include three types of index parameters: blur degree parameters, deflection angle parameters, and resolution parameters. In specific implementation, the lightweight face feature extraction model can be used to obtain the feature length of the first face image, and the fuzzy degree parameters can be determined according to the obtained feature length. Generally speaking, the larger the feature length, the more the blur degree. Low; local feature binarization LBP can be used to binarize the first face image, output the face symmetry index, and determine the deflection angle parameter according to the face symmetry index, for example, when the symmetry index is 1, it is represented as Face angle, deflection angle is 0; use the left eye position information and right eye position information obtained during face detection to determine the interpupillary distance, and determine the resolution parameters according to the interpupillary distance. Generally speaking, the larger the interpupillary distance, the higher the resolution , The smaller the interpupillary distance, the lower the resolution.
应理解,本申请实施例不局限于上述三种指标参数,也可以包括其他不同类型指标参数,或者用其他不同类型指标参数替换上述三种指标参数中的任一种或多种,本申请实施例对此不作限定。It should be understood that the embodiments of this application are not limited to the above three index parameters, and may also include other different types of index parameters, or replace any one or more of the above three index parameters with other different types of index parameters. The implementation of this application The example does not limit this.
本申请实施例通过多个不同类型的指标参数对每一帧第一人脸图像进行多维的质量评价,从而能够反映第一人脸图像中的人脸细节特征在不同维度上的强弱情况。在前述的示例中,采用轻量级人脸质量评分方法,在模糊程度、人脸偏转角、分辨率三个维度上考察人脸质量,获得的人脸先验信息一方面用于后续选取质量较高的图像进行人脸特征提取,保证提取出的特征具有良好的丰富度,保证特征多样化;另一方面可以作用在后续的特征增强环节,以提高人脸特征的泛化性。The embodiment of the present application performs multi-dimensional quality evaluation on each frame of the first face image by using multiple different types of index parameters, so as to reflect the strength of the face detail features in the first face image in different dimensions. In the foregoing example, a lightweight face quality scoring method is used to examine face quality in the three dimensions of blurring, face deflection angle, and resolution, and the obtained prior face information is used for subsequent selection quality on the one hand Higher images are used for facial feature extraction to ensure that the extracted features have good richness and feature diversification; on the other hand, it can be used in subsequent feature enhancement links to improve the generalization of facial features.
步骤S130,根据每帧第一人脸图像的人脸先验信息,从多帧第一人脸图像中选出多帧第二人脸图像。具体的,步骤S130可以包括如图4所示的步骤S131至S133。Step S130, selecting multiple frames of second face images from the multiple frames of first face images according to the face prior information of each frame of the first face image. Specifically, step S130 may include steps S131 to S133 as shown in FIG. 4.
步骤S131,对多个指标参数线性加权得到全局质量评分,根据全局质量 评分从多帧第一人脸图像中获取第一预设数量的初选图像。Step S131: Linearly weight multiple index parameters to obtain a global quality score, and obtain a first preset number of primary selected images from multiple frames of first face images according to the global quality score.
在一些示例中,全局质量评分可以通过对步骤S120中的人脸先验信息所包含的多个指标参数进行线性加权计算得到。通过全局质量评分能够对每一帧第一人脸图像进行综合的质量评估,并根据全局质量评分对每一帧第一人脸图像进行排名,选取排名靠前的第一人脸图像作为初选图像。而且可以通过预先设定第一预设数量值确定要获取的初选图像数量。In some examples, the global quality score may be obtained by linearly weighting multiple index parameters included in the face prior information in step S120. Through the global quality score, the comprehensive quality evaluation of the first face image of each frame can be carried out, and the first face image of each frame can be ranked according to the global quality score, and the first face image with the highest ranking is selected as the primary selection image. Moreover, the number of primary selected images to be acquired can be determined by pre-setting the first preset number value.
作为示例,第一预设数量可以为百分比值,比如设定第一预设数量为30%。当通过步骤S110从视频流中提取出100帧第一人脸图像,以及根据步骤S120提供的方法获取每一帧第一人脸图像的人脸先验信息时,对人脸先验信息所包含的多个指标参数进行线性加权计算,得到每一帧第一人脸图像的全局质量评分。然后根据每一帧第一人脸图像的全局质量评分,按照评分从高至低对100帧第一人脸图像进行排名,取排名前30的第一人脸图像作为初选图像。As an example, the first preset quantity may be a percentage value, for example, the first preset quantity is set to 30%. When 100 frames of the first face image are extracted from the video stream through step S110, and the face prior information of each frame of the first face image is obtained according to the method provided in step S120, the face prior information contains Perform linear weighting calculation on multiple index parameters of, and obtain the global quality score of the first face image in each frame. Then, according to the global quality score of each frame of the first face image, the 100 frames of the first face image are ranked according to the score from high to low, and the top 30 first face images are taken as the primary selection image.
步骤S132,对第一预设数量的初选图像进行排列组合,得到多个初选图像组合,其中,每个初选图像组合中包含第二预设数量的初选图像。In step S132, the first preset number of primary selection images are arranged and combined to obtain multiple primary selection image combinations, wherein each primary selection image combination includes a second preset number of primary selection images.
继续沿用前述的示例,可以根据最终要获取的第二人脸图像数量设定第二预设数量,比如设定第二预设数量为3。如此可以对30帧初选图像进行排列组合,得到
Figure PCTCN2021098156-appb-000001
个初选图像组合,每个初选图像组合中包含了3帧初选图像。
Continuing to use the foregoing example, the second preset number can be set according to the number of second face images to be finally obtained, for example, the second preset number is set to 3. In this way, the 30 primary selected images can be permuted and combined to obtain
Figure PCTCN2021098156-appb-000001
A primary selection image combination, each primary selection image combination contains 3 frames of primary selection images.
步骤S133,根据多个指标参数获取每个初选图像组合的图像区分程度参数,并根据图像区分程度参数从多个初选图像组合中选出终选图像组合。Step S133: Obtain the image discrimination degree parameter of each primary selection image combination according to the multiple index parameters, and select the final selection image combination from the multiple primary selection image combinations according to the image discrimination degree parameter.
其中,图像区分程度参数用以表征初选图像组合所包含的多帧初选图像之间的人脸细节特征的差异性。一般来说,在保证人脸质量的前提下,人脸细节特征的区分程度越大,图像组合包含的可利用信息就越多,如此提取得到的人脸特征就表现出强的泛化性,更适应于开放场景下的人脸识别系统。Among them, the image discrimination degree parameter is used to characterize the difference of face detail features between the multiple frames of the primary images included in the primary image combination. Generally speaking, under the premise of ensuring the quality of the face, the greater the degree of differentiation of the face detail features, the more available information the image combination contains. The facial features extracted in this way show strong generalization and more generalization. Adapt to the face recognition system in open scenes.
初选图像组合的图像区分程度参数可以通过计算组合中的图像两两之间在多个维度上的累计距离,确定当前初选图像组合的图像区分程度参数。The image distinguishing degree parameter of the primary selected image combination can be determined by calculating the cumulative distance between the images in the combination in multiple dimensions to determine the image distinguishing degree parameter of the current primary selected image combination.
继续沿用前述的示例,假定当前初选图像组合为T1,组合T1中包含编号为P1、P2和P3的三幅图像,计算P1和P2、P1和P3、P2和P3分别在各个维度上的距离,比如:计算出P1和P2在模糊程度、人脸偏转角、分辨率三个维度上的距离分别为S1(P1P2)、S2(P1P2)、S3(P1P2),计算出P1和P3在模糊程度、人脸偏转角、分辨率三个维度上的距离分别为S1(P1P3)、S2(P1P3)、S3(P1P3),计算出P2和P3在模糊程度、人脸偏转角、分辨率三个维度上的距离分别为S1(P2P3)、S2(P2P3)、S3(P2P3),则初选图像组合T1的图像区分程度参数为:Continue to use the previous example, assuming that the current primary selection image combination is T1, the combination T1 contains three images numbered P1, P2, and P3, and calculate the distances between P1 and P2, P1 and P3, and P2 and P3 in each dimension. For example, calculate the distance between P1 and P2 in the three dimensions of blur degree, face deflection angle, and resolution as S1 (P1P2), S2 (P1P2), S3 (P1P2), and calculate the degree of blur of P1 and P3 The distances in the three dimensions of, face deflection angle, and resolution are S1 (P1P3), S2 (P1P3), and S3 (P1P3). Calculate P2 and P3 in the three dimensions of blur degree, face deflection angle, and resolution The distances above are respectively S1(P2P3), S2(P2P3), S3(P2P3), then the image discrimination degree parameter of the primary selected image combination T1 is:
S1=S1(P1P2)+S2(P1P2)+S3(P1P2)+S1(P1P3)+S2(P1P3)+S3(P1P3)+S1(P2P3)+S2(P2P3)+S3(P2P3)。S1=S1(P1P2)+S2(P1P2)+S3(P1P2)+S1(P1P3)+S2(P1P3)+S3(P1P3)+S1(P2P3)+S2(P2P3)+S3(P2P3).
根据计算得到的各个初选图像组合的图像区分程度参数,选取图像区分程度参数最大的初选图像组合作为终选图像组合。According to the calculated image discrimination degree parameters of each initial selection image combination, the initial selection image combination with the largest image discrimination degree parameter is selected as the final selection image combination.
步骤S134,将终选图像组合所包含的图像作为第二人脸图像。In step S134, the image included in the final selected image combination is used as the second face image.
当选出终选图像组合,将终选图像组合中所包含的图像作为第二人脸图像。比如选出组合T1作为终选图像组合,则将组合T1包含的图像P1、P2和P3作为第二人脸图像。When the final selected image combination is selected, the image included in the final selected image combination is used as the second face image. For example, if the combination T1 is selected as the final image combination, the images P1, P2, and P3 included in the combination T1 are used as the second face image.
步骤S200,对多帧人脸图像分别进行人脸特征提取,得到第一人脸特征。Step S200: Perform face feature extraction on multiple frames of face images to obtain a first face feature.
在一些示例中,步骤S200中的人脸图像可以是通过步骤S134得到的多帧第二人脸图像。In some examples, the face image in step S200 may be multiple frames of second face images obtained through step S134.
在一些示例中,可以使用神经网络对多帧人脸图像分别进行人脸特征提取,得到第一人脸特征。其中,提取到的第一人脸特征包括多维的人脸向量。 神经网络可以采用如Resnet152的人脸特征提取算法,如此输出一组256维的深度人脸特征。这些特征代表着原始未经过特征增强的人脸图像信息编码。In some examples, a neural network may be used to extract face features of multiple frames of face images to obtain the first face feature. Among them, the extracted first face feature includes a multi-dimensional face vector. The neural network can use a facial feature extraction algorithm such as Resnet152 to output a set of 256-dimensional deep facial features. These features represent the original face image information encoding without feature enhancement.
步骤S300,对第一人脸特征进行特征增强,并对增强后的第一人脸特征进行融合,得到第二人脸特征。In step S300, feature enhancement is performed on the first face feature, and the enhanced first face feature is merged to obtain a second face feature.
在一些示例中,使用深度卷积神经网络将第一人脸特征与人脸先验信息进行点乘操作,得到增强后的第一人脸特征。其中,人脸先验信息是在前述的步骤S120中通过对人脸图像进行人脸质量分析处理得到的。In some examples, a deep convolutional neural network is used to perform a dot product operation on the first face feature and face prior information to obtain the enhanced first face feature. Wherein, the face prior information is obtained by performing face quality analysis processing on the face image in the aforementioned step S120.
沿用前述的示例,将通过Resnet152算法从第二人脸图像提取得到的256维的深度人脸特征以及与第二人脸图像对应的人脸先验信息(模糊程度参数、人脸偏转角参数、分辨率参数)输入到深度卷积神经网络中,通过深度卷积神经网络对深度人脸特征与人脸先验信息进行点乘操作,以利用人脸质量评价算法输出的人脸先验信息对人脸特征进行增强处理。Following the previous example, the 256-dimensional deep face features extracted from the second face image by the Resnet152 algorithm and the face prior information corresponding to the second face image (blur degree parameter, face deflection angle parameter, Resolution parameter) is input to the deep convolutional neural network, and the deep face feature and the face prior information are multiplied by the deep convolutional neural network to use the face prior information output by the face quality evaluation algorithm. Face features are enhanced.
区别于传统的图像级增强方法,例如图像去模糊、超分辨率等,本申请实施例采用一种特征级的增强方法。相比图像级增强方法,采用特征级增强的好处在于,处理对象为一组多维的人脸向量,计算量小,从而可大大提高处理效率。Different from traditional image-level enhancement methods, such as image deblurring, super-resolution, etc., the embodiment of the present application adopts a feature-level enhancement method. Compared with image-level enhancement methods, the advantage of using feature-level enhancement is that the processing object is a set of multi-dimensional face vectors, and the amount of calculation is small, which can greatly improve the processing efficiency.
在一些示例中,用于实现特征增强的深度卷积神经网络可以为两个全连接层的串联,在人脸特征提取数据集上训练,得到特征增强模块,用于对原始特征进行补偿。人脸质量评分模块输出得到的三种质量指标,反应了人脸图像在模糊程度、偏转角以及分辨率三个维度上的强弱,这些指标通过点乘操作控制特征增强模块对原始特征进行增强处理。In some examples, the deep convolutional neural network used to implement feature enhancement may be a series connection of two fully connected layers, which are trained on a face feature extraction data set to obtain a feature enhancement module, which is used to compensate the original features. The three quality indicators output by the face quality scoring module reflect the strength of the face image in the three dimensions of blur, deflection angle and resolution. These indicators are used to control the feature enhancement module to enhance the original features through the dot multiplication operation. deal with.
After the enhancement of the first face feature is completed, the enhanced first face feature is fused to obtain the second face feature.
Specifically, the enhanced first face features may be fused through an average pooling operation to obtain the second face feature.
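The fusion step amounts to averaging the per-frame vectors into a single track-level vector, for example as follows; the optional L2 normalization is an assumption, added only because distance comparison is typically more stable on normalized embeddings.

```python
import torch

def fuse_features(enhanced_feats: torch.Tensor, normalize: bool = True) -> torch.Tensor:
    """Average-pool N enhanced per-frame features into one second face feature."""
    fused = enhanced_feats.mean(dim=0)  # (N, 256) -> (256,)
    if normalize:                       # assumption, not required by the text
        fused = fused / fused.norm().clamp_min(1e-12)
    return fused
```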
In step S400, the second face feature is compared with a pre-stored third face feature to determine a face recognition result.
Specifically, a Euclidean distance algorithm may be used to compare the second face feature with the pre-stored third face feature to determine the face recognition result.
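For instance, the comparison could be a nearest-neighbor search under Euclidean distance, as in the sketch below; the threshold value is a placeholder that would in practice be tuned on validation data.

```python
import torch

def recognize(query: torch.Tensor, gallery: torch.Tensor, threshold: float = 1.0):
    """Match one second face feature against the pre-stored third face features."""
    # query: (256,); gallery: (M, 256); threshold is a hypothetical value
    dists = torch.cdist(query.unsqueeze(0), gallery).squeeze(0)  # (M,) distances
    best = int(torch.argmin(dists))
    return best if dists[best] < threshold else None  # None means no match
```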
The face recognition method provided by the embodiments of the present application is further illustrated below in combination with specific application scenarios.
Scenario 1: smart city night-time face monitoring
Intelligent security monitoring systems play an important role in national smart city information construction. A traditional face recognition monitoring system performs well on sunny days with good lighting conditions. At night, however, recognition accuracy often drops sharply due to complex night scenes, low brightness, aging supplementary lighting equipment, poor angle configuration, temperature, rain, snow, and many other factors. Meanwhile, an urban night-time monitoring system is of great significance for tracking wanted fugitives and social idlers. Against this background, this example describes a face recognition monitoring system for an urban night-time deployment and control scenario. When the face recognition method provided by the embodiments of the present application is applied to this system, the method may specifically include the following steps:
Step S501: Collect sets of face images of fugitives, social idlers, and key surveillance targets. These face images are usually frontal, high-definition pictures, so no additional image processing is required. A face feature extraction algorithm is used to encode these face images, which are then stored to form a base database.
Step S502: Obtain surveillance videos collected at night by the monitoring devices in the surveillance area, which may be a residential community, a street, or another fixed area. The surveillance video may be transmitted as an online video stream or saved locally offline. The video streams are transmitted to a back-end data processing module in preparation for video image analysis.
Step S503: Perform face detection and trajectory tracking on the video information collected by each monitoring device to obtain a group of face trajectory images containing the target face.
Step S504: Use a lightweight face quality evaluation algorithm to score each face image in the trajectory along the three dimensions of blur degree, deflection angle, and resolution. A global quality score combining the three indicators is also output; it is a linear weighting of the three indicators, with the weighting coefficients obtained by a regression method. The quality of a face image is given by the global quality score, while the three indicators reflect the strength of the face detail features along the different dimensions.
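A sketch of the scoring rule is given below; the weight values shown are placeholders standing in for coefficients that, per the text, would be obtained by regression.

```python
import numpy as np

def global_quality_score(blur: float, deflection: float, resolution: float,
                         weights=(0.4, 0.3, 0.3)) -> float:
    """Linearly weight the three quality indicators into one global score.

    The weights here are hypothetical; in this embodiment they would be
    fitted by a regression method on annotated face quality data.
    """
    return float(np.dot(np.asarray(weights),
                        np.array([blur, deflection, resolution], dtype=float)))
```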
Step S505: According to the global quality score and the indicators in the three dimensions, select from the face trajectory images multiple images that are of relatively high quality and whose face detail features are well differentiated, to form the face candidate set.
Step S506: Perform face feature extraction on the images in the face candidate set, enhance the extracted face features with the feature-level enhancement method, and apply an average pooling operation to fuse all the enhanced face features in the candidate set, outputting the final face feature used for subsequent comparison and matching, as sketched below.
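Putting the earlier pieces together, step S506 is essentially extraction, enhancement, and average pooling in sequence; the composition below reuses the hypothetical modules sketched in the previous examples.

```python
import torch

def track_feature(frames: torch.Tensor, priors: torch.Tensor,
                  extractor, enhancer) -> torch.Tensor:
    """Compose extraction, enhancement, and fusion for one face candidate set.

    frames: (N, 3, H, W) candidate images; priors: (N, 3) quality indicators;
    extractor/enhancer: the hypothetical modules sketched earlier.
    """
    feats = extractor(frames)           # (N, 256) first face features
    enhanced = enhancer(feats, priors)  # (N, 256) enhanced features
    return enhanced.mean(dim=0)         # (256,) fused feature for matching
```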
Step S507: Compare the face feature output by step S506 with the face features stored in the base database, and calculate the Euclidean distance between them. When the distance is less than a certain threshold, the captured face is considered to match the ID of a fugitive or social idler stored in the base database. A signal is sent to the terminal device, and the recognition result is shown on the display device.
Scenario 2: monitoring and analysis of personnel activity trajectories in crowded environments
In the intelligent upgrading of large public places, using personnel activity trajectory information to count dwell time, crowd density, and crowd flow has high economic value and social significance. For example, by counting personnel trajectories and analyzing crowd flow at a subway transfer center, evacuation channels can be rationally allocated and transfer efficiency improved. As another example, in a large shopping mall, analyzing dwell time and crowd flow provides an important reference for rationally arranging exhibition areas and merchandise sales areas. When linking personnel activity trajectories, because the coverage of a single video capture device is limited, the face images often come from multiple capture devices; to obtain the correct face trajectory path, the face features must generalize well so that the correct trajectory paths can be matched. In crowded scenes, however, faces are prone to occlusion, side views, and motion blur, and such noise severely undermines the stability of face features. Against this background, this example describes a system for monitoring and analyzing personnel activity trajectories in crowded scenes. When the face recognition method provided by the embodiments of the present application is applied to this system, the method may specifically include the following steps:
Step S601: Obtain the surveillance videos of the monitoring devices in a certain public place over a period of time; the place may specifically be a public area such as a shopping mall, a subway transfer center, or an airport. Assign a corresponding ID number to the surveillance video collected by each monitoring device, for example ID1, ID2, ..., IDN. The video data collected by these monitoring devices is transmitted to the back end for video image analysis.
Step S602: Use face detection and face tracking methods to process the video stream collected by the monitoring device of each ID, obtaining a face trajectory image set corresponding to that ID.
Step S603: Use the lightweight face quality evaluation algorithm to score each face image in the face trajectory image set corresponding to each ID along the three dimensions of blur degree, deflection angle, and resolution. The global quality score combining the three indicators is also output.
Step S604: Generate a face candidate set corresponding to each ID according to the global quality score and the indicators in the three dimensions.
Step S605: Perform face feature extraction, enhancement, and fusion on the images contained in the face candidate set corresponding to each ID, and output the face features corresponding to that ID.
Step S606: Each ID contains a certain number of face features, which represent the number of people captured by that monitoring device during the period. Calculate the pairwise Euclidean distances between the face features of ID1, ID2, ..., IDN. When a distance is less than a certain threshold, the identity match is considered successful, and the trajectory information of the person under the different ID monitoring devices is associated.
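A sketch of this pairwise cross-camera association follows; the threshold and the data layout (a mapping from camera ID to a feature matrix) are assumptions made for illustration.

```python
import torch

def associate_tracks(features_by_id: dict, threshold: float = 1.0):
    """Pairwise Euclidean matching of fused track features across camera IDs.

    features_by_id: {camera_id: (K_i, 256) tensor of track features}.
    Returns (id_a, track_a, id_b, track_b) tuples for matched identities.
    """
    ids = list(features_by_id)
    matches = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = torch.cdist(features_by_id[a], features_by_id[b])  # (K_a, K_b)
            for ta, tb in (d < threshold).nonzero(as_tuple=False).tolist():
                matches.append((a, ta, b, tb))  # same person seen on both cameras
    return matches
```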
Step S607: Save the personnel trajectories in a database along a time axis, or display them on an interface for operators to retrieve or use.
The solution provided by the embodiments of the present application addresses the susceptibility of traditional solutions to various types of noise interference in open monitoring scenarios; extensive optimization of the data processing unit greatly improves the overall performance of the face recognition monitoring system. This is embodied specifically as follows:
Greatly improved recognition accuracy. In a typical surveillance scene, a single face image captured by a face detection algorithm is often disturbed by noise, and face detail features are frequently missing in various ways. For example, within a given surveillance area, when the distance between the subject and the capture device is large, the captured face image tends to suffer from out-of-focus blur, whereas when the distance is small, motion blur tends to occur. The face recognition method proposed in the present application performs feature fusion over multiple face images from one motion trajectory of the same subject, effectively avoiding the information loss that may arise when traditionally relying on a single face image. At the same time, the prior information of the face image in multiple dimensions is used to enhance the face features, so the resulting face features generalize well. Through feature enhancement, even highly degraded face images, such as faces with strongly uneven illumination, large deflection angles, or occlusion by scarves and masks, retain considerable feature generalization.
Greatly improved operating efficiency. While ensuring high recognition accuracy, the face recognition method proposed in the present application follows a lightweight design principle, using lightweight deep convolutional neural networks in the face quality evaluation, face feature fusion, and face feature enhancement modules. For example, the face quality evaluation algorithm computes only along the three dimensions of blur degree, deflection angle, and resolution, which avoids consuming excessive system resources and better meets the real-time requirements of a face recognition monitoring system.
FIG. 5 shows an electronic device 70 provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 70 includes, but is not limited to:
a memory 72, configured to store a program; and
a processor 71, configured to execute the program stored in the memory 72, where, when the processor 71 executes the program stored in the memory 72, the processor 71 is configured to perform the face recognition method described above.
The processor 71 and the memory 72 may be connected by a bus or in other ways.
As a non-transitory computer-readable storage medium, the memory 72 may be configured to store non-transitory software programs and non-transitory computer-executable programs, such as those implementing the face recognition method described in the embodiments of the present application. The processor 71 implements the face recognition method by running the non-transitory software programs and instructions stored in the memory 72.
The memory 72 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data used in performing the face recognition method described above. In addition, the memory 72 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some implementations, the memory 72 includes memories remotely located with respect to the processor 71, and these remote memories may be connected to the processor 71 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the face recognition method are stored in the memory 72, and when executed by one or more processors 71, perform the face recognition method described above, for example, method steps S100 to S400 described in FIG. 1, method steps S110 to S130 described in FIG. 2, method steps S111 to S113 described in FIG. 3, and method steps S131 to S134 described in FIG. 4.
An embodiment of the present application further provides a storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform the face recognition method described above.
In an embodiment, the storage medium stores computer-executable instructions which, when executed by one or more control processors 71, for example by one processor 71 in the electronic device 70 described above, may cause the one or more processors 71 to perform the face recognition method described above, for example, method steps S100 to S400 described in FIG. 1, method steps S110 to S130 described in FIG. 2, method steps S111 to S113 described in FIG. 3, and method steps S131 to S134 described in FIG. 4.
The embodiments of the present application include: extracting multiple frames of face images containing a target face from a video stream; performing face feature extraction on the multiple frames of face images, respectively, to obtain a first face feature; performing feature enhancement on the first face feature, and fusing the enhanced first face feature to obtain a second face feature; and comparing the second face feature with a pre-stored third face feature to determine a face recognition result. The technical solution provided by the embodiments of the present application performs face recognition based on face features extracted from multiple frames of face images, making the face feature samples richer and more diverse, achieving feature complementarity, and providing more usable information during face recognition. This overcomes the problem of traditional methods, which perform face recognition based only on the features of a single image, so that the recognition result is strongly affected by noise interference. The embodiments of the present application also perform feature enhancement and fusion on the first face features extracted from the multiple frames of face images, thereby compensating the face features and further improving the success rate and reliability of face recognition.
The embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred implementations of the present application have been specifically described above, but the present application is not limited to the above embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the solution of the present application, and such equivalent modifications or substitutions are all included within the scope defined by the claims of the present invention.

Claims (12)

  1. A face recognition method, comprising:
    extracting multiple frames of face images containing a target face from a video stream;
    performing face feature extraction on the multiple frames of face images, respectively, to obtain a first face feature;
    performing feature enhancement on the first face feature, and fusing the enhanced first face feature to obtain a second face feature; and
    comparing the second face feature with a pre-stored third face feature to determine a face recognition result.
  2. The face recognition method according to claim 1, wherein the extracting multiple frames of face images containing a target face from a video stream comprises:
    extracting multiple frames of first face images containing the target face from the video stream;
    performing face quality analysis processing on the multiple frames of first face images, respectively, to obtain face prior information of each frame of the first face images; and
    selecting multiple frames of second face images from the multiple frames of first face images according to the face prior information of each frame of the first face images;
    and wherein the performing face feature extraction on the multiple frames of face images, respectively, to obtain a first face feature comprises:
    performing face feature extraction on the multiple frames of second face images, respectively, to obtain the first face feature.
  3. The face recognition method according to claim 2, wherein the face prior information comprises a plurality of index parameters of different types;
    and the selecting multiple frames of second face images from the multiple frames of first face images according to the face prior information comprises:
    linearly weighting the plurality of index parameters to obtain a global quality score, and acquiring a first preset number of primarily selected images from the multiple frames of first face images according to the global quality score;
    arranging and combining the first preset number of primarily selected images to obtain a plurality of primarily selected image combinations, wherein each primarily selected image combination contains a second preset number of the primarily selected images;
    acquiring an image discrimination degree parameter of each primarily selected image combination according to the plurality of index parameters, and selecting a finally selected image combination from the plurality of primarily selected image combinations according to the image discrimination degree parameters; and
    using the primarily selected images contained in the finally selected image combination as the second face images.
  4. The face recognition method according to claim 3, wherein the index parameters comprise a blur degree parameter, a deflection angle parameter, and a resolution parameter.
  5. The face recognition method according to claim 2, wherein the extracting multiple frames of first face images containing the target face from the video stream comprises:
    performing face detection on the video stream to acquire facial position information of the target face in a current frame of the video stream; and
    performing face trajectory tracking according to the facial position information, and extracting the multiple frames of first face images containing the target face from the video stream.
  6. The face recognition method according to claim 5, wherein the facial position information comprises position information of a plurality of contour points;
    and the extracting multiple frames of first face images containing the target face from the video stream further comprises:
    calibrating an angle of the target face in the first face images according to the position information of the plurality of contour points.
  7. The face recognition method according to claim 1, wherein the performing face feature extraction on the multiple frames of face images, respectively, to obtain a first face feature comprises:
    using a neural network to perform face feature extraction on the multiple frames of face images, respectively, to obtain the first face feature, wherein the extracted first face feature comprises a multi-dimensional face vector.
  8. The face recognition method according to claim 7, wherein the performing feature enhancement on the first face feature comprises:
    using a deep convolutional neural network to perform a dot product operation between the first face feature and face prior information to obtain the enhanced first face feature, wherein the face prior information is obtained by performing face quality analysis processing on the face images.
  9. The face recognition method according to claim 1 or 7, wherein the fusing the enhanced first face feature to obtain a second face feature comprises:
    fusing the enhanced first face feature through an average pooling operation to obtain the second face feature.
  10. The face recognition method according to claim 1, wherein the comparing the second face feature with a pre-stored third face feature to determine a face recognition result comprises:
    using a Euclidean distance algorithm to compare the second face feature with the pre-stored third face feature to determine the face recognition result.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 10.
  12. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
PCT/CN2021/098156 2020-06-24 2021-06-03 Facial recognition method, electronic device, and storage medium WO2021259033A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
BR112022026549A BR112022026549A2 (en) 2020-06-24 2021-06-03 FACE RECOGNITION METHOD, ELECTRONIC DEVICE AND COMPUTER READABLE STORAGE MEDIA

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010587883.6 2020-06-24
CN202010587883.6A CN113836980A (en) 2020-06-24 2020-06-24 Face recognition method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2021259033A1 true WO2021259033A1 (en) 2021-12-30

Family

ID=78964520

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098156 WO2021259033A1 (en) 2020-06-24 2021-06-03 Facial recognition method, electronic device, and storage medium

Country Status (3)

Country Link
CN (1) CN113836980A (en)
BR (1) BR112022026549A2 (en)
WO (1) WO2021259033A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419824A (en) * 2021-12-29 2022-04-29 厦门熙重电子科技有限公司 Face track system applied to campus interior and periphery

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116756A (en) * 2013-01-23 2013-05-22 北京工商大学 Face detecting and tracking method and device
CN104008370A (en) * 2014-05-19 2014-08-27 清华大学 Video face identifying method
US20150205997A1 (en) * 2012-06-25 2015-07-23 Nokia Corporation Method, apparatus and computer program product for human-face features extraction
US20180204052A1 (en) * 2015-08-28 2018-07-19 Baidu Online Network Technology (Beijing) Co., Ltd. A method and apparatus for human face image processing
CN109948489A (en) * 2019-03-09 2019-06-28 闽南理工学院 A kind of face identification system and method based on the fusion of video multiframe face characteristic


Also Published As

Publication number Publication date
BR112022026549A2 (en) 2023-04-18
CN113836980A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN109934176B (en) Pedestrian recognition system, recognition method, and computer-readable storage medium
US11704936B2 (en) Object tracking and best shot detection system
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
CN105139040B (en) A kind of queueing condition information detecting method and its system
KR20210090139A (en) Information processing apparatus, information processing method, and storage medium
WO2019033574A1 (en) Electronic device, dynamic video face recognition method and system, and storage medium
US20090087038A1 (en) Image processing apparatus, image pickup apparatus, processing method for the apparatuses, and program for the apparatuses
US8130285B2 (en) Automated searching for probable matches in a video surveillance system
CN111209818A (en) Video individual identification method, system, equipment and readable storage medium
WO2020052275A1 (en) Image processing method and apparatus, terminal device, server and system
CN111898592B (en) Track data processing method and device and computer readable storage medium
CN111241928A (en) Face recognition base optimization method, system, equipment and readable storage medium
CN113947742A (en) Person trajectory tracking method and device based on face recognition
WO2021259033A1 (en) Facial recognition method, electronic device, and storage medium
US9286707B1 (en) Removing transient objects to synthesize an unobstructed image
CN112131984A (en) Video clipping method, electronic device and computer-readable storage medium
US20200043175A1 (en) Image processing device, image processing method, and recording medium storing program
WO2023019927A1 (en) Facial recognition method and apparatus, storage medium, and electronic device
WO2022134916A1 (en) Identity feature generation method and device, and storage medium
CN110543813A (en) Face image and gaze counting method and system based on scene
US20230076241A1 (en) Object detection systems and methods including an object detection model using a tailored training dataset
CN112232113B (en) Person identification method, person identification device, storage medium, and electronic apparatus
CN112329665B (en) Face snapshot system
CN114882576A (en) Face recognition method, electronic device, computer-readable medium, and program product
CN110390234B (en) Image processing apparatus and method, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21830216; Country of ref document: EP; Kind code of ref document: A1)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112022026549; Country of ref document: BR)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 112022026549; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20221223)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17-05-2023))
122 Ep: pct application non-entry in european phase (Ref document number: 21830216; Country of ref document: EP; Kind code of ref document: A1)