CN111783681A - Large-scale face library recognition method, system, computer equipment and storage medium
Info
- Publication number
- CN111783681A (application CN202010633856.8A)
- Authority
- CN
- China
- Prior art keywords
- face
- feature
- recognized
- library
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Abstract
The invention relates to a large-scale face library recognition method, system, computer equipment and storage medium. The method comprises the following steps: acquiring a collected video stream of a target area; preprocessing the video stream to obtain a preprocessed face picture; extracting a global or local face feature value of the face picture as the face feature value to be recognized; obtaining the values of the face feature value to be recognized on all main feature dimensions; screening the face characterization vectors of the face library through an attention cascade framework according to those values, and screening out the closest face characterization vectors for comparison; and comparing the face feature value to be recognized with the screened face characterization vectors to obtain the face with the highest matching degree and determine the identity of the face. The scheme rapidly reduces the size of the face library to be searched, greatly improves face recognition efficiency while guaranteeing recognition accuracy, and reduces the performance requirement on the recognition server.
Description
Technical Field
The present invention relates to the field of face recognition, and more particularly, to a method, system, computer device and storage medium for large-scale face library recognition.
Background
With the popularization of artificial intelligence technology, identity verification through face recognition is being deployed more and more widely in communities, campuses and public transportation. Face recognition through the front camera on a gate mainly comprises the following steps: video acquisition by the camera, frame extraction from the video stream, face detection on static frames, face picture preprocessing, face feature extraction, and face library retrieval. To meet the real-time requirement on the recognition result in service and to avoid the influence of network anomalies, the whole face recognition process is currently usually implemented on a local intelligent face recognition terminal, and current intelligent terminals can generally support face library retrieval at the scale of ten thousand people.
In existing application scenarios such as large campuses, city-wide projects, or access passages in open areas, the population of the managed area can reach hundreds of thousands or even millions, and a single intelligent terminal can hardly support such a large-scale face library retrieval requirement. Therefore, a common solution is to deploy a large local recognition server at the edge side of the campus or city project, and the face library retrieval task is undertaken by that recognition server.
For the face recognition requirement of fewer than one hundred thousand people in an ordinary campus, directly searching the nearest neighbor in the face library with a local recognition server can meet the service requirement. However, in projects such as city-wide deployments, district access control and public transportation stations, the face library can reach millions or even tens of millions of entries. As the data set grows, and because a face feature vector is usually high-dimensional (for example, 160 dimensions), the amount of computation for a single comparison over the whole library is extremely large: one query against a library of one million 160-dimensional vectors already takes on the order of 1.6×10^8 multiply-accumulate operations. Although one can try to solve this problem by adding computing resources and GPUs for parallel computing, such overhead is often unaffordable in practical applications.
Therefore, for scenes such as city-wide deployments, open-area access control and public transportation stations, which require retrieval in a large-scale face library, the problem to be solved at present is how to guarantee the real-time performance and accuracy of face recognition under limited computing resources.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a large-scale face library recognition method, system, computer device and storage medium.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a large-scale face library recognition method, including the following steps:
acquiring a video stream of a collected target area;
preprocessing a video stream to obtain a preprocessed face picture;
extracting a global or local face characteristic value of the face picture as a face characteristic value to be recognized, wherein the face characteristic value to be recognized is a high-dimensional characteristic vector;
obtaining the values of the face feature value to be recognized on all main feature dimensions, wherein the main feature dimensions are determined by taking different face pictures of a large number of people as a sample set, performing feature extraction on them through a convolutional neural network, and analyzing the sample distribution dispersion on each feature dimension;
screening the face characterization vectors of the face library through an attention cascade framework according to the values of the face feature value to be recognized on the main feature dimensions, and screening out the closest face characterization vectors for comparison;
and comparing the face feature value to be recognized with the face characterization vectors screened out for comparison to obtain the face with the highest matching degree, and determining the identity of the face.
In a second aspect, the present invention provides a large-scale face library recognition system, including:
the video acquisition unit is used for acquiring the acquired video stream of the target area;
the preprocessing unit is used for preprocessing the video stream to obtain a preprocessed face picture;
the characteristic extraction unit is used for extracting a global or local face characteristic value of the face picture as a face characteristic value to be recognized, and the face characteristic value to be recognized is a high-dimensional characteristic vector;
the main feature extraction unit is used for acquiring values of human face feature values to be recognized on all main feature dimensions, wherein the main feature dimensions are determined by analyzing sample distribution dispersion on each feature dimension after different human face pictures of a large number of people are taken as sample sets and subjected to feature extraction through a convolutional neural network;
the face library screening unit is used for screening face characterization vectors of the face library through an attention cascade frame according to the value of the face characteristic value to be recognized on the main characteristic dimension, and screening the closest face characterization vectors for comparison;
and the feature comparison unit is used for comparing the feature value of the face to be recognized with the face characterization vectors screened out for comparison to obtain the face with the highest matching degree, and determining the identity of the face.
In a third aspect, the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program thereon, and the processor implements the large-scale face library recognition method as described above when executing the computer program.
In a fourth aspect, the present invention provides a storage medium storing a computer program, which when executed by a processor can implement the large-scale face library recognition method as described above.
Compared with the prior art, the invention has the following beneficial effects: the face feature value of the processed face picture is effectively extracted through a convolutional neural network feature extraction model with an improved training algorithm, so that the extracted face feature value can be directly compared with the face characterization vectors in the face library to judge similarity and determine the identity of the person corresponding to the face picture. In addition, before the face feature value to be recognized is compared with the face characterization vectors in the face library, the size of the portion of the face library used for comparison is rapidly reduced, based on the values of the face feature value to be recognized on the main feature dimensions combined with the attention cascade framework; this greatly improves face recognition efficiency while guaranteeing recognition accuracy, and reduces the performance requirement on the recognition server.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a large-scale face library recognition method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a large-scale face library recognition method according to an embodiment of the present invention;
fig. 3 is a schematic sub-flow diagram of a large-scale face library recognition method according to an embodiment of the present invention;
fig. 4 is a schematic sub-flow diagram of a large-scale face library recognition method according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of a large-scale face library recognition method according to another embodiment of the present invention;
FIG. 6 is a diagram of the training architecture of the feature extraction model and the face recognition model of the present invention;
FIG. 7 is a schematic view of the process for performing face library screening in the attention cascade framework of the present invention;
FIG. 8 is a schematic block diagram of a large-scale face library recognition system provided by an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a preprocessing unit of a large-scale face library recognition system according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of the face library screening unit of the large-scale face library recognition system according to an embodiment of the present invention;
fig. 11 is a specific application framework diagram of the large-scale face library recognition system according to the embodiment of the present invention;
FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a large-scale face library recognition method according to an embodiment of the present invention, and fig. 2 is a schematic flow chart of the method. The large-scale face library recognition method is applied to a server that exchanges data with a terminal. The terminal collects a video stream of the target area and preprocesses it; the feature extraction model 100 in the terminal extracts the global or local face feature value of the face picture as the face feature value to be recognized, and the terminal obtains the values of that feature value on the different main feature dimensions. According to those values and the intervals they fall into, the server screens the face characterization vectors of the face library through an attention cascade framework and keeps only the closest ones; finally, the face feature value to be recognized is compared with the characterization vectors screened from the face library to obtain the face with the highest matching degree and determine the identity of the face to be recognized. Screening the face library in this way rapidly reduces its effective size, which greatly improves face recognition efficiency while guaranteeing recognition accuracy and lowers the performance requirement on the recognition server. The terminal here refers to an intelligent terminal carrying a camera device, such as an intelligent monitoring terminal, a mobile phone or a monitoring camera.
Fig. 2 is a schematic flow chart of a large-scale face library recognition method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S150.
And S110, acquiring the acquired video stream of the target area.
In this embodiment, different monitoring areas are provided with different intelligent monitoring devices, such as an intelligent camera, an intelligent monitoring terminal, and the like, and video streams of the target area can be directly acquired through the corresponding monitoring devices.
And S120, preprocessing the video stream to obtain a preprocessed face picture.
In this embodiment, the video stream is preprocessed to obtain a face picture with obvious face features, and the preprocessed face picture is output to further perform face feature extraction.
Referring to FIG. 3, in one embodiment, step S120 includes steps S121-S127.
And S121, extracting image frames of the video stream at set time intervals.
In this embodiment, the video stream is composed of multiple frames of images, and the image frames are sequentially extracted according to a set time interval, so that on the premise that the image frames containing the human face can be acquired, the subsequent workload can be reduced, and the efficiency of extracting and identifying the human face features can be improved.
And S122, detecting faces in the image frame, marking the face key point coordinates and the relative position and size of the face in the frame, and cropping a face picture from the image frame.
In this embodiment, face objects in the image frame are detected and the face key point coordinates are calibrated; the face angle and the relative position and size of the face in the picture can be determined from these coordinates, and the face recognized in the image frame is cropped out, which makes it convenient to subsequently compare and select the best-quality face picture of each passing person.
And S123, extracting motion information of the pixel points according to the optical flow information of each frame of image frame, and labeling and tracking the same face in the continuous frames.
In this embodiment, the motion of the same pixel points across image frames can be obtained from the optical flow information of each frame; that is, the face of the same person can be labeled and tracked. The face pictures of the same person carry the same label, and all the face pictures of a person can be retrieved by that label.
And S124, selecting the best-quality picture among the face pictures carrying the same label within a period of time.
In this embodiment, the best-quality (or most specification-compliant) face picture is selected from the face crops of the same person collected over a period of time, which eliminates the influence of irrelevant factors on the face features.
And S125, carrying out gray level transformation on the face picture with the best quality, and adjusting the gray level distribution of the face picture.
In this embodiment, the gray level of the selected face picture is transformed, the gray level distribution of the face picture is adjusted, the influence of the illumination angle or the light intensity (backlight and overexposure) on the feature extraction and the subsequent recognition task is reduced, and the key points on the face picture can be better highlighted.
And S126, recognizing the face key points on the gray-transformed face picture, and adjusting the face picture based on the face key points so that the face is located at the center of the picture and the two eyes lie on a horizontal line.
In this embodiment, the center of the face in the face image is further adjusted, so that the face is located at the center of the image, all the features on the face are located at the obvious position of the image, and subsequent face feature extraction is facilitated.
And S127, adjusting the size of the face picture to a target size to obtain the preprocessed face picture.
In this embodiment, the size of the face image is adjusted so that the sizes of the face images entering the back-end feature extraction stage are consistent.
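For illustration only, the following is a minimal OpenCV sketch of the gray-level transformation, face alignment and geometric normalization steps (S125-S127); face detection, tracking and best-frame selection (S121-S124) are omitted. The eye coordinates are assumed to come from the key-point detection of S122, and the 112×112 target size is an assumption of this sketch rather than a value fixed by the embodiment.

```python
# Minimal sketch of steps S125-S127, assuming OpenCV; the 112x112 target
# size and the externally supplied eye landmarks are illustrative assumptions.
import cv2
import numpy as np

def preprocess_face(frame, eye_left, eye_right, target_size=(112, 112)):
    # S125: gray-level transformation, adjusting the gray-level distribution
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)

    # S126: rotate about the midpoint of the eyes so both eyes lie on a
    # horizontal line
    dx = eye_right[0] - eye_left[0]
    dy = eye_right[1] - eye_left[1]
    angle = float(np.degrees(np.arctan2(dy, dx)))
    center = ((eye_left[0] + eye_right[0]) / 2.0,
              (eye_left[1] + eye_right[1]) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    aligned = cv2.warpAffine(gray, rot, (gray.shape[1], gray.shape[0]))

    # S127: geometric normalization to a fixed input size
    return cv2.resize(aligned, target_size)
```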
S130, extracting a global or local face characteristic value of the face picture as a face characteristic value to be recognized, wherein the face characteristic value to be recognized is a high-dimensional characteristic vector.
In this embodiment, a feature extraction model 100 is preset in the terminal, and the global or local face feature value of the preprocessed face picture can be extracted through the feature extraction model 100 as the face feature value to be recognized; this face feature value is a high-dimensional feature vector. Specifically, in this embodiment the extracted face feature value is a 128-dimensional feature vector, but in other embodiments of the present invention feature vectors of other dimensions, for example 256 dimensions, may be adopted according to actual requirements. The face feature value obtained through feature extraction can be used to directly calculate the similarity between faces against the face characterization vectors in the face library; that is, the vector distance between face pictures of the same person is smaller, and the vector distance between face pictures of different persons is larger. Fig. 6 shows the training architecture of the feature extraction model 100 and the face recognition model 200.
In this embodiment, the following improvements are made to the selection of the linear layer parameters and to the training process of the feature extraction model 100:
The linear layer parameters $u_j$ of the feature extraction model 100 are set as

$$u_j = \frac{1}{M}\sum_{i=1}^{M} x_{ji}, \qquad u_j \in R^K,$$

where the linear layer parameter $u_j$ represents the face characterization vector corresponding to the j-th class of face samples, j is a natural number greater than 0, K is the dimension of the face characterization vector, M is the number of face pictures of the j-th class of face samples in the training samples, $x_{ji}$ is the i-th face feature vector of the class-j samples, and $R^K$ is the real vector space of dimension K.
With the linear layer parameters $u_j$ set in this way, this embodiment trains the linear layer parameters $u_j$ of the different classes of face samples through an EM algorithm: within each training batch the value of $u_j$ is kept fixed, and after all training batches are finished, the parameter value of $u_j$ for the corresponding face-sample class is updated.
Setting the linear layer parameters $u_j$ of the feature extraction model 100 in this way lets the model better extract face characterization vectors based on the persons in the library, and the extracted face characterization vectors are used for face recognition of the persons in the library.
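For illustration, the parameter setting above amounts to a per-class mean of feature vectors; a minimal numpy sketch follows (the function and variable names are assumptions, not part of the patent text).

```python
import numpy as np

def linear_layer_params(features_by_class):
    """u_j = (1/M) * sum_i x_ji: mean of the M feature vectors of class j.

    features_by_class: list of (M_j, K) arrays, one per registered person.
    Returns an (N, K) matrix U whose row j is u_j. Per the EM-style schedule
    described above, these values stay fixed within a training batch and are
    recomputed once all batches finish."""
    return np.stack([x_j.mean(axis=0) for x_j in features_by_class])
```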
As shown in fig. 6, the training framework of the face recognition model 200 is also included. In this framework, the softmax layer calculates the probability that the feature vector of a face to be recognized belongs to person i; for the feature vector z of the face to be recognized, the probability is calculated as

$$p_i = \frac{\exp(u_i^{\top} z)}{\sum_{j=1}^{N} \exp(u_j^{\top} z)}.$$

Because the linear layer parameters $u_j$ are applied, the probability distribution calculated here is based on the similarity between the face to be detected and the center point of each class.
After the calculation is finished, the class with the maximum probability is taken (provided that probability is greater than a set threshold), and the person corresponding to that result is determined as the identity of the face to be detected.
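A minimal numpy sketch of this recognition decision, under the softmax reconstruction given above (logit of class j taken as the inner product of $u_j$ with the probe feature vector z); the 0.5 threshold is an illustrative assumption.

```python
import numpy as np

def identify(z, U, threshold=0.5):
    """Return (person index, probability), or (None, probability) when the
    best softmax probability does not exceed the threshold.

    z: (K,) feature vector of the face to be recognized.
    U: (N, K) matrix whose rows are the linear layer parameters u_j."""
    logits = U @ z
    logits = logits - logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    j = int(np.argmax(probs))
    if probs[j] > threshold:
        return j, float(probs[j])
    return None, float(probs[j])
```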
In addition, the training strategy adopted for the feature extraction model 100 differs from existing training strategies: besides the Cross-Entropy Loss used together with the softmax probability distribution function, a Center Loss is also introduced. The Center Loss takes the face feature values $x_i$ ($x_i \in R^K$, with K = 128 in this example) directly as input, and it calculates the dispersion (variance) of the feature vectors of the samples within each class, to ensure that the feature vectors extracted from different sample pictures of the same class (the same registered person) have a higher degree of aggregation. The Center Loss is calculated as

$$L_C = \frac{1}{2}\sum_{j=1}^{N}\sum_{i=1}^{M} \lVert x_{ji} - u_j \rVert^2,$$

where $\langle x_{j1}, x_{j2}, \ldots, x_{jM}\rangle$ are the feature values of the M face pictures of person j, and N is the total number of classes (i.e. registered persons) in the library.
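A numpy sketch of the Center Loss under the canonical $\tfrac{1}{2}\lVert x - u\rVert^2$ form assumed above; the batch layout and the batch-mean normalization are assumptions of this sketch.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Mean squared distance of each sample's feature vector to its class
    center u_j, which penalizes within-class dispersion.

    features: (B, K) batch of face feature vectors x_i
    labels:   (B,)  class index of each sample
    centers:  (N, K) current class centers u_j"""
    diff = features - centers[labels]
    return 0.5 * float(np.mean(np.sum(diff * diff, axis=1)))
```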
And S140, obtaining values of the face characteristic value to be recognized on all main characteristic dimensions.
In this embodiment, the main feature extraction model obtains the values of the face feature to be recognized on the different main feature dimensions. The face feature output by the feature extraction model 100 is a high-dimensional feature vector; from all of its dimensions, at least three dimensions that best distinguish face features are selected as the main feature dimensions. According to the value intervals of the face feature to be recognized on these main feature dimensions, and combined with the attention cascade framework, the existing face features in the face library are filtered, removing part of them and reducing the number of comparisons to be made. This improves face recognition efficiency without increasing the server load, while guaranteeing recognition accuracy.
Specifically, the main feature dimensions are determined by taking different face pictures of a large number of people as a sample set, performing feature extraction on them through a convolutional neural network, and analyzing the sample distribution dispersion on each feature dimension. The training steps for determining the main feature dimensions include steps A10-A70.
A10, inputting a face sample set, and acquiring feature vectors of different face pictures of all people.
A20, calculating the face characterization vector $u_j$ of the same person according to the feature vectors of the different face pictures.
In this embodiment, all M face pictures of person j in the face sample set are substituted into the trained feature extraction model 100. Taking the i-th face picture of person j as an example, its feature vector is $x_{ji}$; the face characterization vector $u_j$ of person j is calculated based on the feature vectors of the M pictures.
Specifically, $u_j$ is calculated as

$$u_j = \frac{1}{M}\sum_{i=1}^{M} x_{ji}.$$

From the face characterization vector $u_j$ of person j, the value on dimension l is $u_{jl}$.
A30, acquiring the vector arrays of all persons on the different dimensions according to the face characterization vectors $u_j$.
And A40, calculating to obtain a variance array of all the people in the K dimension according to the different dimension vector arrays of all the people.
In this embodiment, for all N persons (i.e. N classes) in the face sample set, the calculation of the face characterization vector is repeated, and the array of the values of all N characterization vectors on dimension l is obtained: $\langle u_{1l}, u_{2l}, \ldots, u_{Nl}\rangle$.
The variance of this array gives the variance $v_l$ of the projections of all persons on dimension l:

$$v_l = \frac{1}{N}\sum_{j=1}^{N}\left(u_{jl} - \bar{u}_l\right)^2, \qquad \bar{u}_l = \frac{1}{N}\sum_{j=1}^{N} u_{jl}.$$

Further, repeating the above calculation for all K dimensions of the feature vector yields the variance array of the projections of the face characterization vectors on all K dimensions: $\langle v_1, v_2, \ldots, v_K\rangle$.
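For illustration, steps A30-A50 can be sketched in numpy as follows; the array layout and names are assumptions made for exposition.

```python
import numpy as np

def main_feature_dimensions(U, s=3):
    """Per-dimension variance of the N face characterization vectors, then
    the indices of the s most dispersed dimensions (s >= 3 per step A50).

    U: (N, K) matrix whose row j is the characterization vector u_j.
    Returns (indices of the s main feature dimensions, variance array v)."""
    v = U.var(axis=0)                    # v_l for every dimension l
    return np.argsort(v)[::-1][:s], v
```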
A50, selecting at least 3 main feature dimensions from the variance array in descending order of variance;
A60, extracting the feature values of the face sample set on the corresponding main feature dimensions;
and A70, dividing the value range of the face sample set on each main feature dimension into L+1 intervals to obtain the interval distribution table.
In this embodiment, based on the obtained variance array, S dimensions with the largest variance value are selected from large to small as main feature dimensions, where S is a natural number greater than or equal to 3, and the size of S may be set as required, and may be 3, 4, 5, and the like.
Specifically, in this embodiment the three dimensions with the largest variance values, denoted α, β and γ, are selected from large to small as the main feature dimensions, and these 3 main feature dimensions serve as the decision dimensions of the first three classifiers. The decision conditions of the three classifiers are calculated as follows: after the D input face-sample pictures pass through the feature extraction model 100, their values on the main feature dimensions α, β and γ form the feature value arrays $x_\alpha$, $x_\beta$ and $x_\gamma$:
$x_\alpha = [x_{1\alpha}\ x_{2\alpha}\ \ldots\ x_{D\alpha}]$;
$x_\beta = [x_{1\beta}\ x_{2\beta}\ \ldots\ x_{D\beta}]$;
$x_\gamma = [x_{1\gamma}\ x_{2\gamma}\ \ldots\ x_{D\gamma}]$.
The current value range of each of the arrays $x_\alpha$, $x_\beta$ and $x_\gamma$ is equally divided into L+1 intervals (L is an empirical value, set in proportion to the sample size) to obtain the decision-interval distribution table. This table is the decision criterion for the subsequent classifier screening; producing it completes the training of the face recognition model. Equally dividing an array whose values range over $[x_{\min}, x_{\max}]$ into L+1 intervals gives, for $t = 0, 1, \ldots, L$, the intervals

$$\left[\,x_{\min} + \frac{t\,(x_{\max} - x_{\min})}{L+1},\ x_{\min} + \frac{(t+1)\,(x_{\max} - x_{\min})}{L+1}\,\right].$$
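Assuming the equal-width reading of the interval division above, a minimal numpy sketch of building and querying the decision-interval distribution table for one main feature dimension follows; the helper names are illustrative.

```python
import numpy as np

def interval_table(x, L):
    """Equal-width division of the observed value range of one main feature
    dimension into L+1 intervals; returns the L+2 interval edges.

    x: (D,) values of the D face-sample pictures on this dimension."""
    return np.linspace(x.min(), x.max(), L + 2)

def interval_index(value, edges):
    """Index (0..L) of the interval a value falls into, clipped at the ends."""
    return int(np.clip(np.searchsorted(edges, value) - 1, 0, len(edges) - 2))
```

A probe face's value on dimension α is then mapped to an interval with interval_index, and the cascade keeps that interval plus its two neighbors.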
S150, according to the values of the face feature value to be recognized on the main feature dimensions, screening the face characterization vectors of the face library through the attention cascade framework, and screening out the closest face characterization vectors for comparison.
In this embodiment, the main feature dimensions and their decision-interval distribution table are obtained by training on the face sample set. Based on the values of the face to be recognized on the main feature dimensions output in step S140 and the intervals those values fall into, the face characterization vectors in the face library are screened through the attention cascade framework: the characterization vectors outside the corresponding intervals are removed in turn, those inside are retained, and the vectors retained after each round of screening serve as the basis for the next round, until the screening over all values of all main feature dimensions is complete. This method rapidly reduces the size of the face library to be searched, greatly shortens the face recognition time while guaranteeing recognition accuracy, and lowers the performance requirement on the recognition server.
Referring to FIG. 4, in one embodiment, step S150 includes steps S151-S152.
And S151, acquiring a preset main feature dimension screening sequence, and sequentially acquiring intervals where the face features to be recognized take values in the corresponding main feature dimensions according to the main feature dimension screening sequence.
S152, retaining the face characterization vectors in the face library that lie in the corresponding interval and the two adjacent intervals, removing the remaining characterization vectors in the face library, and using the vectors retained after this round of screening as the basis for the next round, until all values of all main feature dimensions have been screened.
In this embodiment, an attention cascade framework is arranged in the face recognition model 200; it can rapidly screen all the face characterization vectors in the face library in turn, according to the screening criteria of the different classifiers. As shown in fig. 7, the attention cascade framework of this embodiment performs fast screening of the face library based on classifier1, classifier2 and classifier3; the specific screening process is as follows:
1. Screen the face library with classifier1: according to the decision-interval distribution table, first check which interval $x_{k\alpha}$ falls into; retain the persons whose face characterization vector has its α-dimension value in that interval or in the two adjacent intervals, and discard the persons in the other L-2 intervals.
2. Screen the persons remaining after classifier1 with classifier2: according to the decision-interval distribution table, first check which interval $x_{k\beta}$ falls into; retain the persons whose face characterization vector has its β-dimension value in that interval or in the two adjacent intervals, and discard the persons in the other L-2 intervals.
3. Screen the persons remaining after classifier2 with classifier3: according to the decision-interval distribution table, first check which interval $x_{k\gamma}$ falls into; retain the persons whose face characterization vector has its γ-dimension value in that interval or in the two adjacent intervals, and discard the persons in the other L-2 intervals.
After this three-step cascade screening, the face feature to be recognized only needs to be compared with the face characterization vectors remaining in the library to determine the face with the highest matching degree and its identity. In this embodiment, cosine similarity is used to measure similarity. By selecting, among all feature dimensions, dimensions that are comparatively more discriminative of face features as the main feature dimensions, and combining them with the attention cascade framework, the face library to be searched can be rapidly reduced to 20%-40% of its original size; this guarantees recognition accuracy while greatly reducing processing time and the performance requirements on the server. The interval in which a value on a main feature dimension falls can be looked up directly in the existing decision-interval distribution table.
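A minimal numpy sketch of the three-step cascade screening described above, using the equal-width interval edges from the earlier sketch; all names are illustrative assumptions.

```python
import numpy as np

def cascade_screen(z, library, dims, tables):
    """Keep, per main feature dimension, only the library entries whose
    characterization vector falls in the probe's interval or one of its two
    neighbors; the survivors of each classifier feed the next one.

    z:       (K,) feature vector of the face to be recognized
    library: (N, K) face characterization vectors of the face library
    dims:    main feature dimension indices, e.g. (alpha, beta, gamma)
    tables:  one array of L+2 interval edges per dimension"""
    def bin_of(values, edges):
        # interval index of each value, clipped to [0, L]
        return np.clip(np.searchsorted(edges, values) - 1, 0, len(edges) - 2)

    keep = np.arange(len(library))
    for d, edges in zip(dims, tables):
        k = bin_of(np.asarray([z[d]]), edges)[0]
        lib_bins = bin_of(library[keep, d], edges)
        keep = keep[np.abs(lib_bins - k) <= 1]  # this interval +/- one neighbor
    return keep                                  # indices of retained vectors
```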
And S160, comparing the characteristic value of the face to be recognized with the face characterization vectors screened out for comparison to obtain the face with the highest matching degree, and determining the identity of the face.
In this example, cosine similarity is used to measure similarity; for the face feature value z to be recognized and a face characterization vector $u_j$, the similarity is calculated as

$$\cos(z, u_j) = \frac{z \cdot u_j}{\lVert z\rVert\,\lVert u_j\rVert}.$$
in the embodiment, the size of the face library to be recognized can be rapidly reduced based on the main feature dimension and combined with the attention cascade framework. After the face library is reduced, the face characteristic value to be recognized is compared with the face characteristic vector screened for comparison, and the closest personnel in the library are searched.
The large-scale face library recognition method effectively extracts the face feature value of the processed face picture through the convolutional neural network feature extraction model 100 with an improved training algorithm, so that the extracted face feature value can be directly compared with the face characterization vectors in the face library to judge similarity and determine the identity of the person corresponding to the face picture. In addition, before the face feature value to be recognized is compared with the face characterization vectors in the face library, the size of the portion of the face library used for comparison is rapidly reduced, based on the values of the face feature value to be recognized on the main feature dimensions combined with the attention cascade framework; this greatly improves face recognition efficiency while guaranteeing recognition accuracy, and reduces the performance requirement on the recognition server.
Fig. 5 is a schematic flow chart of a large-scale face library recognition method according to another embodiment of the present invention. As shown in fig. 5, the large-scale face library recognition method of the present embodiment includes steps S210 to S270. Steps S210 to S260 are similar to steps S110 to S160 in the above embodiments, and are not described herein again. The added step S270 in the present embodiment is explained in detail below.
And S270, labeling the successfully matched face picture, updating it into the face library, and correcting the face characterization vector of the corresponding person according to the newly added face picture.
In this embodiment, after a period of practical operation, the collected face data of the same person becomes richer; in business terms, the characterization of a person in the face library is expected to become more accurate, and the system can adapt, to a certain extent, to the acquisition angles and lighting of various cameras. The scheme therefore labels each face picture newly determined to be a certain person, updates it into the face library, and corrects the face characterization vector of the corresponding person according to the newly added face pictures; that is, the characterization vector of person j is continuously corrected as

$$u_j = \frac{1}{M}\sum_{i=1}^{M} x_{ji},$$

where $x_{ji}$ is the feature value extracted from the i-th face picture of person j, and M is the number of valid face pictures of person j.
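Because the correction above is a running mean, it can be applied incrementally as each newly matched picture is added to the library; a sketch under that assumption (expects numpy arrays).

```python
def update_characterization(u_j, m, x_new):
    """Incremental form of u_j = (1/M) * sum_i x_ji.

    u_j:   current characterization vector of person j (mean of m pictures)
    m:     number of valid face pictures of person j before the update
    x_new: feature value extracted from the newly matched face picture
    Returns the corrected characterization vector and the new picture count."""
    return (u_j * m + x_new) / (m + 1), m + 1
```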
Fig. 8 is a schematic block diagram of a large-scale face library recognition system according to an embodiment of the present invention. As shown in fig. 8, the present invention also provides a large-scale face library recognition system corresponding to the above large-scale face library recognition method. The system includes units for executing the above method; referring to fig. 8, it includes a video acquisition unit 10, a preprocessing unit 20, a feature extraction unit 30, a main feature extraction unit 40, a face library screening unit 50, a feature comparison unit 60 and a characterization vector updating unit 70. The video acquisition unit 10, the preprocessing unit 20, the feature extraction unit 30 and the main feature extraction unit 40 can be configured in an intelligent terminal with a camera function; the face library screening unit 50, the feature comparison unit 60 and the characterization vector updating unit 70 may be configured in a desktop computer or a server.
And the video acquisition unit 10 is used for acquiring the acquired video stream of the target area.
In this embodiment, different monitoring areas are provided with different intelligent monitoring devices, such as an intelligent camera, an intelligent monitoring terminal, and the like, and video streams of the target area can be directly acquired through the corresponding monitoring devices.
The preprocessing unit 20 is configured to preprocess the video stream to obtain a preprocessed face picture.
In this embodiment, the video stream is preprocessed to obtain a face picture with obvious face features, and the preprocessed face picture is output to further perform face feature extraction.
Referring to fig. 9, in an embodiment, the preprocessing unit 20 includes a frame extraction module 21, a detection and cropping module 22, a face tracking module 23, a deduplication optimization module 24, a gray-level transformation module 25, a face alignment module 26, and a geometric normalization module 27.
And the frame extraction module 21 is configured to extract image frames of the video stream at set time intervals.
In this embodiment, the video stream is composed of multiple frames of images, and the image frames are sequentially extracted according to a set time interval, so that on the premise that the image frames containing the human face can be acquired, the subsequent workload can be reduced, and the efficiency of extracting and identifying the human face features can be improved.
And the detection and cropping module 22 is used for detecting faces in the image frame, marking the face key point coordinates and the relative position and size of the face in the frame, and cropping a face picture from the image frame.
In this embodiment, face objects in the image frame are detected and the face key point coordinates are calibrated; the face angle and the relative position and size of the face in the picture can be determined from these coordinates, and the face recognized in the image frame is cropped out, which makes it convenient to subsequently compare and select the best-quality face picture of each passing person.
And the face tracking module 23 is configured to extract motion information of the pixel points according to the optical flow information of each frame of image frame, label and track the same face in consecutive frames.
In this embodiment, the motion of the same pixel points across image frames can be obtained from the optical flow information of each frame; that is, the face of the same person can be labeled and tracked. The face pictures of the same person carry the same label, and all the face pictures of a person can be retrieved by that label.
And the deduplication optimization module 24 is used for selecting the best-quality picture among the face pictures carrying the same label within a period of time.
In this embodiment, the best-quality (or most specification-compliant) face picture is selected from the face crops of the same person collected over a period of time, which eliminates the influence of irrelevant factors on the face features.
And the gray-level transformation module 25 is used for performing gray-level transformation on the best-quality face picture and adjusting the gray-level distribution of the face picture.
In this embodiment, the gray level of the selected face picture is transformed, the gray level distribution of the face picture is adjusted, the influence of the illumination angle or the light intensity (backlight and overexposure) on the feature extraction and the subsequent recognition task is reduced, and the key points on the face picture can be better highlighted.
And the face alignment module 26 is configured to identify face key points on the face picture after the gray level conversion, and adjust the face picture based on the face key points so that the face is located at the center of the face picture and both eyes of the face are located at horizontal positions.
In this embodiment, the center of the face in the face image is further adjusted, so that the face is located at the center of the image, all the features on the face are located at the obvious position of the image, and subsequent face feature extraction is facilitated.
And the geometric normalization module 27 is configured to adjust the size of the face picture to a target size, so as to obtain a preprocessed face picture.
In this embodiment, the size of the face image is adjusted so that the sizes of the face images entering the back-end feature extraction stage are consistent.
And the feature extraction unit 30 is configured to extract a global or local target face feature value of the face picture through the feature extraction model 100.
In this embodiment, a feature extraction model 100 is preset in the terminal, and the global or local face feature value of the preprocessed face picture can be extracted through the feature extraction model 100 as the face feature value to be recognized; this face feature value is a high-dimensional feature vector. Specifically, in this embodiment the extracted face feature value is a 128-dimensional feature vector, but in other embodiments of the present invention feature vectors of other dimensions, for example 256 dimensions, may be adopted according to actual requirements. The face feature value obtained through feature extraction can be used to directly calculate the similarity between faces against the face characterization vectors in the face library; that is, the vector distance between face pictures of the same person is smaller, and the vector distance between face pictures of different persons is larger. Fig. 6 shows the training architecture of the feature extraction model 100 and the face recognition model 200.
In this embodiment, the following improvements are made to the selection of the linear layer parameters and to the training process of the feature extraction model 100:
The linear layer parameters $u_j$ of the feature extraction model 100 are set as

$$u_j = \frac{1}{M}\sum_{i=1}^{M} x_{ji}, \qquad u_j \in R^K,$$

where the linear layer parameter $u_j$ represents the face characterization vector corresponding to the j-th class of face samples, j is a natural number greater than 0, K is the dimension of the face characterization vector, M is the number of face pictures of the j-th class of face samples in the training samples, $x_{ji}$ is the i-th face feature vector of the class-j samples, and $R^K$ is the real vector space of dimension K.
With the linear layer parameters $u_j$ set in this way, this embodiment trains the linear layer parameters $u_j$ of the different classes of face samples through an EM algorithm: within each training batch the value of $u_j$ is kept fixed, and after all training batches are finished, the parameter value of $u_j$ for the corresponding face-sample class is updated.
Setting the linear layer parameters $u_j$ of the feature extraction model 100 in this way lets the model better extract face characterization vectors based on the persons in the library, and the extracted face characterization vectors are used for face recognition of the persons in the library.
As shown in fig. 6, the training framework of the face recognition model 200 is also included. In this framework, the softmax layer calculates the probability that the feature vector of a face to be recognized belongs to person i; for the feature vector z of the face to be recognized, the probability is calculated as

$$p_i = \frac{\exp(u_i^{\top} z)}{\sum_{j=1}^{N} \exp(u_j^{\top} z)}.$$

Because the linear layer parameters $u_j$ are applied, the probability distribution calculated here is based on the similarity between the face to be detected and the center point of each class.
After the calculation is finished, the class with the maximum probability is taken (provided that probability is greater than a set threshold), and the person corresponding to that result is determined as the identity of the face to be detected.
In addition, the training strategy adopted for the feature extraction model 100 differs from existing training strategies: besides the Cross-Entropy Loss used together with the softmax probability distribution function, a Center Loss is also introduced. The Center Loss takes the face feature values $x_i$ ($x_i \in R^K$, with K = 128 in this example) directly as input, and it calculates the dispersion (variance) of the feature vectors of the samples within each class, to ensure that the feature vectors extracted from different sample pictures of the same class (the same registered person) have a higher degree of aggregation. The Center Loss is calculated as

$$L_C = \frac{1}{2}\sum_{j=1}^{N}\sum_{i=1}^{M} \lVert x_{ji} - u_j \rVert^2,$$

where $\langle x_{j1}, x_{j2}, \ldots, x_{jM}\rangle$ are the feature values of the M face pictures of person j, and N is the total number of classes (i.e. registered persons) in the library.
And the main feature extraction unit 40 is configured to obtain values of the face feature values to be recognized in all main feature dimensions.
In this embodiment, the main feature extraction model obtains the values of the face feature to be recognized on the different main feature dimensions. The face feature output by the feature extraction model 100 is a high-dimensional feature vector; from all of its dimensions, at least three dimensions that best distinguish face features are selected as the main feature dimensions. According to the value intervals of the face feature to be recognized on these main feature dimensions, and combined with the attention cascade framework, the existing face features in the face library are filtered, removing part of them and reducing the number of comparisons to be made. This improves face recognition efficiency without increasing the server load, while guaranteeing recognition accuracy.
Specifically, the main feature dimensions are determined by taking different face pictures of a large number of people as a sample set, performing feature extraction on them through a convolutional neural network, and analyzing the sample distribution dispersion on each feature dimension. The training steps for determining the main feature dimensions include steps A10-A70.
A10, inputting a face sample set, and acquiring feature vectors of different face pictures of all people.
A20, calculating the face characterization vector $u_j$ of the same person according to the feature vectors of the different face pictures.
In this embodiment, all M face pictures of person j in the face sample set can be substituted into the trained feature extraction model 100. Taking the i-th face picture of person j as an example, its feature vector is $x_{ji}$; the face characterization vector $u_j$ of person j is calculated based on the feature vectors of the M pictures.
Specifically, $u_j$ is calculated as

$$u_j = \frac{1}{M}\sum_{i=1}^{M} x_{ji}.$$

From the face characterization vector $u_j$ of person j, the value on dimension l is $u_{jl}$.
A30, acquiring the vector arrays of all persons on the different dimensions according to the face characterization vectors $u_j$.
And A40, calculating to obtain a variance array of all the people in the K dimension according to the different dimension vector arrays of all the people.
In this embodiment, for all N persons (i.e. N classes) in the face sample set, the calculation of the face characterization vector is repeated, and the array of the values of all N characterization vectors on dimension l is obtained: $\langle u_{1l}, u_{2l}, \ldots, u_{Nl}\rangle$.
The variance of this array gives the variance $v_l$ of the projections of all persons on dimension l:

$$v_l = \frac{1}{N}\sum_{j=1}^{N}\left(u_{jl} - \bar{u}_l\right)^2, \qquad \bar{u}_l = \frac{1}{N}\sum_{j=1}^{N} u_{jl}.$$

Further, repeating the above calculation for all K dimensions of the feature vector yields the variance array of the projections of the face characterization vectors on all K dimensions: $\langle v_1, v_2, \ldots, v_K\rangle$.
A50, selecting S target dimensions from the variance array in descending order of variance, wherein S is a natural number greater than or equal to 2;
A60, extracting the feature values of the face sample set on the corresponding target dimensions to form the target dimension arrays;
and A70, uniformly dividing the value range of each target dimension array into L+1 intervals to obtain the interval distribution table, which serves as the decision and screening criterion of the attention cascade framework.
In this embodiment, based on the obtained variance array, S dimensions with the largest variance value are selected from large to small as target dimensions, where S is a natural number greater than or equal to 2, and the size of S may be set as required, and may be 2, 3, 4, and the like.
Specifically, this embodiment selects the three dimensions with the largest variance values, denoted α, β and γ, from large to small as the main feature dimensions, and these 3 main feature dimensions serve as the decision dimensions of the first three classifiers. The decision conditions of the three classifiers are calculated as follows: after the D input face-sample pictures pass through the feature extraction model 100, their values on the main feature dimensions α, β and γ form the feature value arrays $x_\alpha$, $x_\beta$ and $x_\gamma$:
x_α = [x_1α, x_2α, ..., x_Dα];

x_β = [x_1β, x_2β, ..., x_Dβ];

x_γ = [x_1γ, x_2γ, ..., x_Dγ].
The current value range of each of the arrays x_α, x_β and x_γ is equally divided into L+1 intervals (L is an empirical value, set in proportion to the sample size) to obtain the decision interval distribution table, which is the decision standard for the subsequent classifier screening; at this point, training of the face recognition model is complete. For dimension α, for example, interval t covers [min(x_α) + t·Δ_α, min(x_α) + (t+1)·Δ_α), where Δ_α = (max(x_α) - min(x_α)) / (L+1) and t = 0, 1, ..., L; the intervals for β and γ are defined likewise.
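A possible reading of this interval construction in code is the equal-width binning sketched below, with clipping so that out-of-range query values fall into the first or last interval; the helper names and the choice of L are illustrative assumptions:

```python
import numpy as np

def build_interval_edges(x: np.ndarray, num_intervals: int) -> np.ndarray:
    """x: (D,) feature values of the sample set on one main dimension.
    Returns num_intervals + 1 edges dividing [min(x), max(x)] equally;
    call with num_intervals = L + 1 to match the patent's L+1 intervals."""
    return np.linspace(x.min(), x.max(), num_intervals + 1)

def interval_index(value, edges: np.ndarray):
    """Index of the interval containing `value`, clipped so values
    outside the training range land in the first/last interval."""
    return np.clip(np.searchsorted(edges, value) - 1, 0, len(edges) - 2)

L = 9                                 # illustrative empirical value
x_alpha = np.random.randn(10000)      # sample values on dimension alpha
edges_alpha = build_interval_edges(x_alpha, L + 1)
t = interval_index(0.3, edges_alpha)  # interval of a query value
```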
And the face library screening unit 50 is configured to screen the face characterization vectors of the face library through the attention cascade framework according to the values of the face feature value to be recognized on the main feature dimensions, and to screen out the closest face characterization vectors for comparison.
In this embodiment, the main feature dimensions and their decision interval distribution table are obtained by training on the face sample set. Based on the value of the face to be recognized output in step S140 on each main feature dimension and the interval in which that value falls, the face characterization vectors in the face library are screened through the attention cascade framework: the face characterization vectors outside the corresponding intervals are removed in turn, the face characterization vectors within the corresponding intervals are retained, and the vectors retained after each round of screening serve as the basis for the next round, until the values of all main feature dimensions have been screened. This method rapidly reduces the size of the face library to be searched, maintains recognition accuracy, greatly shortens the face recognition time, and lowers the performance requirements on the recognition server.
Referring to fig. 10, the face library screening unit 50 includes an interval acquisition module 51 and a vector screening module 52.
The interval acquisition module 51 is configured to obtain a preset main feature dimension screening sequence and, following that sequence, obtain in turn the interval in which the value of the face feature to be recognized falls on each corresponding main feature dimension.
And the vector screening module 52 is configured to retain the face characterization vectors located in the corresponding interval and the two adjacent intervals before and after it in the face library, remove the remaining face characterization vectors from the face library, and use the face characterization vectors retained after each round of screening as the basis for the next round, until the values of all main feature dimensions have been screened.
In this embodiment, an attention cascade framework is provided in the face recognition model 200; the attention cascade framework rapidly screens all face characterization vectors in the face library in sequence according to the screening standards of the different classifiers. As shown in fig. 7, the attention cascade framework of this embodiment rapidly screens the face library with classifier1, classifier2 and classifier3, as follows:
1. Screen the face library with classifier1: according to the decision interval distribution table, first determine which interval x_kα falls in; retain the persons whose face characterization vectors have an α-dimension value in that interval or the two adjacent intervals, and discard the persons in the other L-2 intervals.

2. Screen the persons remaining after classifier1 with classifier2: according to the decision interval distribution table, first determine which interval x_kβ falls in; retain the persons whose face characterization vectors have a β-dimension value in that interval or the two adjacent intervals, and discard the persons in the other L-2 intervals.

3. Screen the persons remaining after classifier2 with classifier3: according to the decision interval distribution table, first determine which interval x_kγ falls in; retain the persons whose face characterization vectors have a γ-dimension value in that interval or the two adjacent intervals, and discard the persons in the other L-2 intervals.
After this three-step cascade screening, the target face feature value to be recognized only needs to be compared with the face characterization vectors remaining in the library to find the closest person; in this embodiment, cosine similarity is used to measure similarity. By selecting, among all feature dimensions, certain dimensions that are relatively more discriminative of face identity as the target dimensions and combining them with the attention cascade framework, the size of the face library to be searched can be rapidly reduced to 20%-40% of the original library. The interval in which a target dimension value falls can be looked up directly in the existing decision interval distribution table.
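The three-stage screening itself reduces to repeated interval lookups. The following self-contained sketch shows one possible implementation under assumed NumPy arrays and per-dimension interval edges; it illustrates the cascade idea and is not the patent's code:

```python
import numpy as np

def cascade_screen(library: np.ndarray, query: np.ndarray,
                   dims, tables) -> np.ndarray:
    """library: (N, K) face characterization vectors; query: (K,) feature
    vector of the face to be recognized; dims: the main feature dimensions
    (e.g. alpha, beta, gamma); tables: matching list of interval-edge
    arrays from the decision interval distribution table. Returns the
    indices of the library vectors that survive every cascade stage."""
    candidates = np.arange(len(library))
    for d, edges in zip(dims, tables):
        def interval(v, e=edges):
            # clip so out-of-range values land in the first/last interval
            return np.clip(np.searchsorted(e, v) - 1, 0, len(e) - 2)
        t = interval(query[d])                    # interval of the query
        lib_t = interval(library[candidates, d])  # intervals of candidates
        candidates = candidates[np.abs(lib_t - t) <= 1]  # keep t and its two neighbours
    return candidates
```

Keeping the query's interval plus its two neighbours at each stage matches the retention rule above: 3 of the L+1 intervals survive and the other L-2 are discarded.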
And the feature comparison unit 60 is configured to compare the feature value of the face to be recognized with the face characterization vectors screened for comparison, obtain a face with the highest matching degree, and determine the identity of the face.
In this example, cosine similarity is used to measure similarity, calculated as follows:

sim(a, b) = (a · b) / (‖a‖ · ‖b‖)

where a is the face feature vector to be recognized and b is a face characterization vector remaining in the library.
in the embodiment, the size of the face library to be recognized can be rapidly reduced based on the main feature dimension and combined with the attention cascade framework. After the face library is reduced, the face characteristic value to be recognized is compared with the face characteristic vector screened for comparison, and the closest personnel in the library are searched.
And the characterization vector updating unit 70 is configured to label the successfully matched face picture and update the labeled face picture into the face library, and modify the face characterization vector of the corresponding person according to the newly added face picture.
In this embodiment, after a certain period of practical application, the collected face data of each person become richer; the characterization of a person in the face library is therefore expected to become more accurate in service and to adapt, to a certain extent, to various camera acquisition angles and lighting conditions. Accordingly, this scheme labels each face picture newly determined to belong to a certain person, updates it into the face library, and corrects the face characterization vector of the corresponding person according to the newly added picture; that is, the face characterization vector of person j is continuously corrected as:

u_j = (1/M) · Σ_{i=1}^{M} x_ji

where x_ji is the feature vector extracted from the i-th face picture of person j, and M is the number of valid face pictures of person j.
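Equivalently, under the running-mean reading of the formula above, the correction can be applied incrementally when one new valid picture arrives, without re-averaging all M pictures (an illustrative sketch):

```python
import numpy as np

def update_characterization(u_j: np.ndarray, m: int,
                            x_new: np.ndarray) -> np.ndarray:
    """Running-mean update: u_j currently averages m valid pictures of
    person j; fold in the feature vector x_new of the (m+1)-th picture."""
    return (u_j * m + x_new) / (m + 1)
```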
In a specific embodiment, a specific application framework of the large-scale face library recognition system of the present invention is shown in fig. 11. The video acquisition unit 10, the preprocessing unit 20 and the feature extraction unit 30 are integrated in a face recognition terminal, while the main feature extraction unit 40, the face library screening unit 50, the feature comparison unit 60 and the characterization vector updating unit 70 are integrated in a local recognition server. The acquisition and preprocessing of face pictures and the extraction of the target face feature value are performed at the face recognition terminal; the acquisition of the target dimension feature values, the screening of the face library and the comparison of the target face features are performed at the local recognition server, thereby implementing face identity recognition. In another embodiment, the face recognition terminal is provided with only the video acquisition unit 10, and the preprocessing unit 20, the feature extraction unit 30, the main feature extraction unit 40, the face library screening unit 50, the feature comparison unit 60 and the characterization vector updating unit 70 are integrated in the local recognition server; feature extraction, main feature acquisition and face feature recognition are then performed in the local recognition server.
It should be noted that, as can be clearly understood by those skilled in the art, the specific implementation process of the large-scale face library recognition system and each unit may refer to the corresponding description in the foregoing method embodiment, and for convenience and conciseness of description, no further description is provided herein.
Referring to fig. 12, fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal or a server, where the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device. The server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 12, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 comprises program instructions that, when executed, cause the processor 502 to perform a large-scale face library recognition method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to execute a large-scale face database recognition method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 12 is a block diagram of only part of the configuration relevant to the present application and does not limit the computer device 500 to which the present application is applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Wherein the processor 502 is adapted to run a computer program 5032 stored in the memory.
It should be understood that, in the embodiment of the present application, the processor 502 may be a central processing unit (CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium.
The storage medium may be a USB flash disk, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general functional terms. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system and method may be implemented in other ways. The system embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the system of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A large-scale face library recognition method is characterized by comprising the following steps:
acquiring a video stream of a collected target area;
preprocessing a video stream to obtain a preprocessed face picture;
extracting a global or local face characteristic value of the face picture as a face characteristic value to be recognized, wherein the face characteristic value to be recognized is a high-dimensional characteristic vector;
obtaining values of face feature values to be recognized on all main feature dimensions, wherein the main feature dimensions are determined by analyzing sample distribution dispersion on each feature dimension after different face pictures of a large number of people are taken as sample sets and subjected to feature extraction through a convolutional neural network;
according to the value of the face characteristic value to be recognized on the main characteristic dimension, screening face characteristic vectors of a face library through an attention cascade frame, and screening out the most similar face characteristic vectors for comparison;
and comparing the characteristic value of the face to be recognized with the face characteristic vector screened for comparison to obtain the face with the highest matching degree, and determining the identity of the face.
2. The method for large-scale face library recognition according to claim 1, wherein the step of preprocessing the video stream to obtain a preprocessed face picture comprises:
extracting image frames of a video stream according to a set time interval;
detecting a face in the image frame, marking the coordinates of the face key points and the relative position and size of the face frame in the image frame, and intercepting a face picture according to the face frame;
extracting motion information of pixel points according to the optical flow information of each image frame, and labeling and tracking the same face across consecutive frames;
selecting the best-quality picture among the face pictures labeled as the same face within a period of time;
carrying out gray level transformation on the face picture with the best quality, and adjusting the gray level distribution of the face picture;
identifying face key points on the face picture after gray level conversion, and adjusting the face picture based on the face key points so that the face is located in the center of the picture and the two eyes are horizontally aligned;
and adjusting the size of the face picture to a target size to obtain the preprocessed face picture.
3. The large-scale face library recognition method according to claim 1, wherein the step of screening face characterization vectors of the face library through an attention cascade frame according to values of the face characteristic values to be recognized in the main characteristic dimension, and screening the closest face characterization vectors for comparison comprises:
acquiring a preset main feature dimension screening sequence, and sequentially acquiring the intervals in which the values of the face feature to be recognized fall on the corresponding main feature dimensions according to the main feature dimension screening sequence;
and preserving the face characteristic vectors positioned in the corresponding interval and two adjacent intervals in the face library, simultaneously removing the rest face characteristic vectors in the face library, and taking the face characteristic vectors preserved after the face library is screened as the basis of the next screening until all values of all main characteristic dimensions are screened.
4. The large-scale face library recognition method according to claim 3, wherein the step of sequentially obtaining the intervals where the values of the face features to be recognized are taken on the corresponding main feature dimensions according to the main feature dimension screening sequence comprises:
and checking the interval of the value of the main characteristic dimension of the face characteristic value to be recognized according to the judgment interval distribution table.
5. The large-scale face library recognition method according to claim 4, wherein the main feature dimension is determined by analyzing sample distribution dispersion on each feature dimension after feature extraction is performed on different face pictures of a large number of people serving as sample sets through a convolutional neural network, and the method comprises the following steps:
inputting a face sample set, and acquiring feature vectors of different face pictures of all people;
calculating the face characterization vector u_j of the same person according to the feature vectors of the different face pictures;
acquiring different-dimension vector arrays of all the persons according to the face characterization vectors u_j;
calculating to obtain a variance array of all the personnel in the K dimension according to the different dimension vector arrays of all the personnel;
acquiring at least 3 main feature dimensions in the variance array from large to small;
extracting feature vectors of main feature dimensions of the face sample set on the corresponding main feature dimensions;
and dividing the value range of the face sample set on each main characteristic dimension into L +1 intervals to obtain an interval distribution table.
6. The large-scale face library recognition method according to claim 1, wherein the step of extracting the global or local face feature value of the face picture as the face feature value to be recognized is preceded by:

setting the linear layer parameter u_j of the feature extraction model, calculated as:

u_j = (1/M) · Σ_{i=1}^{M} x_ji;

sequentially training the linear layer parameters u_j of the different classes of face samples through an EM algorithm, and updating the parameter value of the linear layer parameter u_j of the corresponding class of samples after each training batch is completed;

wherein the linear layer parameter u_j represents the face characterization vector corresponding to the j-th class of face samples, j is a natural number greater than 0, K is the number of dimensions of the face characterization vector, M is the number of face pictures of the j-th class of face samples in the training samples, and x_ji represents the feature vector of the i-th face picture of the j-th class of samples.
7. The large-scale face library recognition method according to claim 1, wherein the step of comparing the feature value of the face to be recognized with the face feature vectors screened for comparison to obtain the face with the highest matching degree and determining the identity of the face further comprises:
and labeling the successfully matched face picture, updating the face picture into a face library, and correcting the face representation vector of the corresponding person according to the newly added face picture.
8. A large scale face library recognition system, comprising:
the video acquisition unit is used for acquiring the acquired video stream of the target area;
the preprocessing unit is used for preprocessing the video stream to obtain a preprocessed face picture;
the characteristic extraction unit is used for extracting a global or local face characteristic value of the face picture as a face characteristic value to be recognized, and the face characteristic value to be recognized is a high-dimensional characteristic vector;
the main feature extraction unit is used for acquiring values of human face feature values to be recognized on all main feature dimensions, wherein the main feature dimensions are determined by analyzing sample distribution dispersion on each feature dimension after different human face pictures of a large number of people are taken as sample sets and subjected to feature extraction through a convolutional neural network;
the face library screening unit is used for screening face characterization vectors of the face library through an attention cascade frame according to the value of the face characteristic value to be recognized on the main characteristic dimension, and screening the closest face characterization vectors for comparison;
and the feature comparison unit is used for comparing the feature value of the face to be recognized with the face characterization vectors screened out for comparison to obtain the face with the highest matching degree, and determining the identity of the face.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method of large scale face library recognition according to any one of claims 1 to 7 when executing the computer program.
10. A storage medium storing a computer program which, when executed by a processor, implements the method of large scale face library recognition according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010633856.8A CN111783681B (en) | 2020-07-02 | 2020-07-02 | Large-scale face library identification method, system, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111783681A true CN111783681A (en) | 2020-10-16 |
CN111783681B CN111783681B (en) | 2024-08-13 |
Family
ID=72759671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010633856.8A Active CN111783681B (en) | 2020-07-02 | 2020-07-02 | Large-scale face library identification method, system, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111783681B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102360421A (en) * | 2011-10-19 | 2012-02-22 | 苏州大学 | Face identification method and system based on video streaming |
US20180018508A1 (en) * | 2015-01-29 | 2018-01-18 | Unifai Holdings Limited | Computer vision systems |
CN106355138A (en) * | 2016-08-18 | 2017-01-25 | 电子科技大学 | Face recognition method based on deep learning and key features extraction |
CN107609459A (en) * | 2016-12-15 | 2018-01-19 | 平安科技(深圳)有限公司 | A kind of face identification method and device based on deep learning |
CN109543606A (en) * | 2018-11-22 | 2019-03-29 | 中山大学 | A kind of face identification method that attention mechanism is added |
CN110287829A (en) * | 2019-06-12 | 2019-09-27 | 河海大学 | A kind of video face identification method of combination depth Q study and attention model |
CN110427867A (en) * | 2019-07-30 | 2019-11-08 | 华中科技大学 | Human facial expression recognition method and system based on residual error attention mechanism |
Non-Patent Citations (1)
Title |
---|
Li Shu: "Video Face Detection and Recognition Based on Convolutional Neural Networks" (基于卷积神经网络的视频人脸检测与识别), Computer Knowledge and Technology (电脑知识与技术), no. 21 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112492383A (en) * | 2020-12-03 | 2021-03-12 | 珠海格力电器股份有限公司 | Video frame generation method and device, storage medium and electronic equipment |
CN112653742A (en) * | 2020-12-15 | 2021-04-13 | 武汉风行在线技术有限公司 | Large-screen-end dynamic desktop customization system and method based on user micro-expression change |
CN113255781A (en) * | 2021-05-28 | 2021-08-13 | 上海市胸科医院 | Representative picture selecting method and device for CP-EBUS and diagnosis system |
CN113255781B (en) * | 2021-05-28 | 2022-04-01 | 上海市胸科医院 | Representative picture selecting method and device for CP-EBUS and diagnosis system |
CN114203184A (en) * | 2021-12-01 | 2022-03-18 | 厦门快商通科技股份有限公司 | Multi-state voiceprint feature identification method and device |
CN116170239A (en) * | 2023-04-26 | 2023-05-26 | 成都天用唯勤科技股份有限公司 | Multi-centralised data processing method, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111783681B (en) | 2024-08-13 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |