WO2021012526A1

WO2021012526A1 - Face recognition model training method, face recognition method and apparatus, device, and storage medium

Info

Publication number: WO2021012526A1
Application number: PCT/CN2019/118461
Authority: WO
Inventors: 姚旭峰
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-07-22
Filing date: 2019-11-14
Publication date: 2021-01-28
Also published as: CN110543815A; CN110543815B

Abstract

A face recognition model training method, a face recognition method and apparatus, a device, and a storage medium. The method comprises: training a preset convolutional neural network; establishing connection between a feature extraction network and a classification network; freezing a weight parameter of the feature extraction network; performing iterative training on the classification network; unfreezing a weight parameter of a feature extraction network of a second convolutional neural network model; and training the unfrozen second convolutional neural network model to obtain a face recognition model.

Description

Training method, face recognition method, device, equipment and storage medium of face recognition model

This application requires the priority of a Chinese patent application submitted to the Chinese Patent Office on July 22, 2019, with the application number 201910663230.9 and the invention title "Face Recognition Model Training Method, Face Recognition Method, Device, Equipment and Storage Medium" Right, the entire contents of which are incorporated in this application by reference.

Technical field

This application relates to the field of biometrics, and in particular to a method for training a face recognition model, a face recognition method, device, equipment and storage medium.

Background technique

In recent years, biometric detection and recognition represented by human faces have been widely used in many fields such as identity recognition and wisdom education. Face recognition technology refers to the recognition of the position of a face in a picture or a video through a face recognition model. The existing face recognition model mainly adopts the transfer learning method for training to accelerate the training speed. In the migration process, a classification layer is often added after the presentation layer of the network. Because the parameter distributions of the presentation layer and the classification layer are inconsistent, there is a problem that gradient explosions are prone to occur, resulting in poor model stability.

Summary of the invention

This application provides a face recognition model training method, face recognition method, device, equipment and storage medium. The method can increase the speed of face recognition and avoid gradients caused by inconsistent parameter distributions between the feature extraction network and the classification network The problem of explosion improves the stability of the model.

In the first aspect, this application provides a method for training a face recognition model, the method including:

According to the image information of the first sample, train a preset convolutional neural network to construct a feature extraction network;

Establishing a connection between the feature extraction network and a preset classification network to obtain a first convolutional neural network model;

Freezing the weight parameters of the feature extraction network of the first convolutional neural network model;

Performing iterative training on the classification network in the first convolutional neural network model according to the second sample image information to obtain a second convolutional neural network model;

Unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

According to the third sample image information, train the thawed second convolutional neural network model to obtain the face recognition model.

In the second aspect, this application also provides a face recognition method, which includes:

Obtain the image to be recognized;

Inputting the to-be-recognized image into a preset face recognition model to obtain a face recognition result, the face recognition model being trained by the above-mentioned face recognition model training method;

If the face recognition result indicates that the image to be recognized is successfully recognized, first prompt information for prompting the user to successfully recognize the image to be recognized is displayed.

In a third aspect, the present application also provides a training device for a face recognition model, the training device includes:

The feature training unit is used to train a preset convolutional neural network according to the first sample image to construct a feature extraction network;

A network connection unit, configured to establish a connection between the feature extraction network and a preset classification network to obtain a first convolutional neural network model;

A parameter freezing unit for freezing the weight parameters of the feature extraction network of the first convolutional neural network model;

The classification training unit is configured to iteratively train the classification network in the first convolutional neural network model according to the second sample image to adjust the weight parameters of the classification network in the first convolutional neural network model , Thereby obtaining the second convolutional neural network model;

A network unfreezing unit, configured to unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

The model training unit is used to train the thawed second convolutional neural network model according to the third sample image to obtain the face recognition model.

In a fourth aspect, the present application also provides a face recognition device, which includes:

The image recognition unit is used to obtain the image to be recognized;

The image input unit is configured to input the image to be recognized into a preset face recognition model to obtain a face recognition result, and the face recognition model is determined by the face recognition according to any one of claims 1-5 Trained by the training method of the model;

The first information display unit is configured to, if the face recognition result indicates that the image to be recognized is successfully recognized, display first prompt information for prompting the user to successfully recognize the image to be recognized.

In a fifth aspect, the present application also provides a computer device, the computer device includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program and execute the The computer program implements the above-mentioned face recognition model training method.

In a sixth aspect, the present application also provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the processor realizes the aforementioned face recognition The training method of the model.

This application discloses a method, device, device and storage medium for training a face recognition model. A preset convolutional neural network is trained according to the first sample image information to construct a feature extraction network; The feature extraction network establishes a connection with the preset classification network to obtain the first convolutional neural network model; freezes the weight parameters of the feature extraction network of the first convolutional neural network model; according to the second sample image information, the The classification network in the first convolutional neural network model performs iterative training to obtain the second convolutional neural network model; unfreezes the weight parameters of the feature extraction network of the second convolutional neural network model; according to the third sample image information, Training the thawed second convolutional neural network model to obtain the face recognition model. This not only greatly improves the speed of face recognition, reduces the training time, and the resultant face recognition model has a high recognition accuracy, but also avoids the problem of gradient explosion caused by the inconsistency of the parameter distribution between the feature extraction network and the classification network. The stability of the model.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application. Ordinary technicians can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic flowchart of a method for training a face recognition model provided by an embodiment of the present application;

2 is a schematic flowchart of a method for training a face recognition model provided by another embodiment of the present application;

FIG. 3 is a schematic flowchart of sub-steps of a method for training a face recognition model provided by an embodiment in FIG. 2;

4 is a schematic flowchart of the sub-steps of the method for training a face recognition model provided in another embodiment of FIG. 2;

5 is a schematic flowchart of sub-steps of the training method of the face recognition model in FIG. 2;

Fig. 6 is a schematic flowchart of a method for training a face recognition model provided by still another embodiment of the present application;

FIG. 7 is a schematic flowchart of a face recognition method provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of an application scenario of a face recognition method provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of an application scenario of a face recognition method provided by another embodiment of the present application;

FIG. 10 is a schematic block diagram of a training device for a face recognition model provided by an embodiment of the present application;

FIG. 11 is a schematic block diagram of an apparatus for training a face recognition model provided by another embodiment of the present application;

FIG. 12 is a schematic block diagram of a subunit of a training device for a face recognition model provided by an embodiment of the present application;

FIG. 13 is a schematic block diagram of a face recognition device provided by an embodiment of the present application;

FIG. 14 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.

Detailed ways

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The flowchart shown in the drawings is merely an illustration, and does not necessarily include all contents and operations/steps, nor does it have to be executed in the described order. For example, some operations/steps can also be decomposed, combined or partially combined, so the actual execution order may be changed according to actual conditions.

The embodiments of the present application provide a method for training a face recognition model, a face recognition method, device, equipment, and storage medium. The training method of the face recognition model can be used to train the face recognition model, which can increase the speed of face recognition and avoid the problem of gradient explosion caused by the inconsistent parameter distribution of the face recognition model, thereby improving the stability of the face recognition model.

Hereinafter, some embodiments of the present application will be described in detail with reference to the accompanying drawings. In the case of no conflict, the following embodiments and features in the embodiments can be combined with each other.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of steps of a method for training a face recognition model according to an embodiment of the present application. The face recognition model training method is used to train the face recognition model to avoid the problem of gradient explosion caused by the inconsistent parameter distribution of the face recognition model, thereby improving the stability of the model.

As shown in FIG. 1, the training method of the face recognition model specifically includes: step S110 to step S160.

S110. Training a preset convolutional neural network according to the first sample image to construct a feature extraction network.

Specifically, the first sample image is an image collected in advance. The pre-collected image can be a directly collected image or an image obtained from a video. The position of the face in the first sample image is marked as the first true label.

Among them, the feature extraction network is used to extract image features from the image of the input feature extraction network. The feature extraction network can include a number of convolutional layers. Of course, the pooling layer may or may not be included. After the image is input to the feature extraction network, each convolutional layer in the feature extraction network performs convolution processing on the input image layer by layer, and the last convolution layer in the feature extraction network outputs the image features of the input image.

Exemplarily, the feature extraction network includes five convolutional layers, the first convolutional layer conv1 includes 96 11×11 convolution kernels, the second convolutional layer conv2 includes 256 5×5 convolution kernels, and the third The convolutional layer conv3 and the fourth convolutional layer conv4 both include 384 3×3 convolution kernels, and the fifth convolutional layer conv5 includes 256 3×3 convolution kernels. Among them, the first convolutional layer and the second convolutional layer The convolutional layer and the fifth convolutional layer are connected to a 2×2 pooling layer, and each layer is connected to a modified linear unit.

Exemplarily, a pre-trained model such as YOLO9000 can be used as the preset convolutional neural network.

As shown in FIG. 2, in one embodiment, before training a preset convolutional neural network based on the first sample image information to construct a feature extraction network, the method further includes:

S101. Obtain a sample video, and determine a sample image set in the sample video, where the sample image set includes first sample image information, second sample image information, and third sample image information.

Specifically, a camera can be used to collect sample video on the target task. After the camera collects the sample video, the terminal or the server can obtain the sample video and determine the sample image set in the sample video.

Exemplarily, the sample image set may be divided into at least three subsets, which are a first image subset, a second image subset, and a third image subset. The first image subset is a set of first sample image information. The second image subset is a set of second sample image information. The third image subset is a set of third sample image information.

As shown in FIG. 3, in an embodiment, the determining the sample image set in the sample video includes:

S1011a. Perform framing processing on the sample video to obtain several single-frame images.

Specifically, the sample video is composed of successive pictures, and each picture is a frame.

S1011b: If there is a face image in the single frame image, perform wavelet threshold denoising processing on the single frame image.

Specifically, denoising processing is performed on a single frame image with a face image to effectively remove noise and reduce the influence of noise generated by the imaging device and the external environment, thereby improving the quality of the sample image set.

In an embodiment, before step S1011b, the method further includes: judging whether there is a face image in each single frame image. Specifically, the judging whether there is a face image in each single frame image specifically includes: detecting whether there is a position of a key part of the face in each single frame image; if there is a preset in each single frame image For key parts of the face, it is determined that there is a face image in each of the single frame images; if there is no preset key part of the face in each of the single frame images, it is determined that there is no face image in each of the single frame images.

S1011c. If there is no face image in the single frame image, remove the single frame image.

Specifically, a single frame image without a face image is removed to ensure that all the sample images in the sample image set have face images, thereby improving the effectiveness of the sample image set.

In another embodiment, the first sample image information may be an original image directly collected by an image collection device such as a camera. As shown in FIG. 4, in this embodiment, training a preset convolutional neural network according to the first sample image information to construct a feature extraction network specifically includes:

S1012a. Acquire first original image information, second original image information, and third original image information.

Specifically, the first original image information, the second original image information, and the third original image information are images directly collected in advance, and may also be images obtained in advance from a video.

S1012b: If there is a face image in the first original image information, perform wavelet threshold denoising processing on the first original image information to obtain first sample image information.

Specifically, denoising processing is performed on the first original image information with the face image to effectively remove noise and reduce the influence of noise generated by the imaging device and the external environment, thereby improving the quality of the first sample image information.

In an embodiment, before step S1012b, the method further includes: determining whether there is a face image in the first original image information. Specifically, the judging whether there is a face image in the first original image information specifically includes: detecting whether there is a position of a key part of the face in the first original image information; if each of the first original image information exists Preset key parts of the face, and determine that there is a face image in the first original image information; if the preset key parts of the face do not exist in the first original image information, it is determined that there is no face image in the first original image information. If there is no face image in the first original image information, remove the first original image information to ensure that the first sample image information has a face image, thereby improving the validity of the first sample image information .

S1012c: If there is a face image in the second original image information, perform wavelet threshold denoising processing on the second original image information to obtain second sample image information.

Specifically, denoising processing is performed on the second original image information with the face image to effectively remove noise and reduce the influence of noise generated by the imaging device and the external environment, thereby improving the quality of the second sample image information.

In an embodiment, before step S1012c, the method further includes: determining whether there is a face image in the second original image information. Specifically, the judging whether there is a face image in the second original image information specifically includes: detecting whether there is a position of a key part of the face in the second original image information; if each of the second original image information exists Preset key parts of the face, and determine that there is a face image in the second original image information; if the preset key parts of the face do not exist in the second original image information, determine that there is no face image in the second original image information. If there is no face image in the second original image information, the second original image information is removed to ensure that the second sample image information has a face image, thereby improving the effectiveness of the second sample image information.

S1012d: If there is a face image in the third original image information, perform wavelet threshold denoising processing on the third original image information to obtain third sample image information.

Specifically, denoising processing is performed on the third original image information with the face image to effectively remove noise and reduce the influence of noise generated by the imaging device and the external environment, thereby improving the quality of the third sample image information.

In an embodiment, before step S1012d, the method further includes: determining whether there is a face image in the third original image information. Specifically, the judging whether there is a face image in the third original image information specifically includes: detecting whether there is a position of a key part of the face in the third original image information; if each of the third original image information exists Presetting the key parts of the face, and determining that there is a face image in the third original image information; if the preset key parts of the face do not exist in the third original image information, it is determined that there is no face image in the third original image information. If there is no face image in the third original image information, remove the third original image information to ensure that all the third sample image information has a face image, thereby improving the effectiveness of the third sample image information.

S120. Establish a connection between the feature extraction network and a preset classification network to obtain a first convolutional neural network model.

Specifically, a preset classification network is added after the feature extraction network, and the output of the feature extraction network is used as the input of the classification network, so that the feature extraction network establishes a connection with the classification network to obtain The first convolutional neural network model.

Exemplarily, the classification network includes a convolutional layer, a fully connected layer, and a classifier that are sequentially connected. As shown in FIG. 5, the step S120 to establish a connection between the feature extraction network and the preset classification network specifically includes sub-step S121, sub-step S122, and sub-step S123.

Sub-step S121: input the output of the feature extraction network to the convolutional layer.

Specifically, after the feature extraction network processes the input image, the output of the feature extraction network can be input to the convolutional layer of the classification network.

Sub-step S122: input the output of the convolutional layer to the fully connected layer, so as to reduce the dimensionality of the output of the convolutional layer.

Specifically, the output of the convolutional layer of the classification network is input to the fully connected layer of the classification network, so that the dimensionality of the output of the convolutional layer is reduced.

Wherein, the inputting the output of the convolutional layer to the fully connected layer to reduce the dimensionality of the output of the convolutional layer includes:

Based on the weight calculation formula, a fully connected layer operation is performed on each feature value of the output of the convolution layer to reduce the dimensionality of the output of the convolution layer; the weight calculation formula is:

Among them, the loss function is the mean square error MSE function, W represents the weight of the convolutional layer, W _i represents the ith weight of the convolutional layer, h represents the bias of the convolutional layer, and h _i represents the ith bias of the convolutional layer, X represents the entire sample image set, X(i) represents the first true label corresponding to the i-th sample image;

Represents the output of the output layer after the i-th sample image is input to the classification network, and η represents the learning efficiency of the back propagation algorithm.

In this embodiment, based on the above weight calculation formula, a fully connected layer operation is performed on each feature value of the output of the convolutional layer through a backpropagation algorithm, thereby reducing the dimensionality of the output of the convolutional layer.

S123. Use the classifier to classify the output of the fully connected layer to establish a connection between the feature extraction network and the classification network.

As shown in FIG. 6, before establishing a connection between the feature extraction network and a preset classification network to obtain the first convolutional neural network model, the method further includes:

S102. Determine a weight parameter of the feature extraction network.

Specifically, the weight parameter of the feature extraction network is composed of the weight parameters of each layer of the feature extraction network, that is, each layer of the feature extraction network has a weight parameter, and the set of weight parameters of each layer forms the weight parameter of the feature extraction network.

S103: Determine whether the error value between the output of the feature extraction network and the first real tag is less than a first preset threshold.

Specifically, the position of the target sample face in the first sample image is labeled as the first real label. The first preset threshold can be set according to actual needs, for example, set to 0.01.

Wherein, if the error value between the output of the feature extraction network and the first real tag is less than the first preset threshold, step S120 is executed, that is, the feature extraction network is connected to the preset classification network .

If the error value between the output of the feature extraction network and the first real label is greater than or equal to the first preset threshold, return to step S110 and continue to train the preset convolutional neural network until all The error value between the output of the feature extraction network and the first real tag is less than the first preset threshold.

S130: Freeze the weight parameters of the feature extraction network of the first convolutional neural network model.

Specifically, after freezing the weight parameters of the feature extraction network of the first convolutional neural network model, when the image information including the target face is input into the frozen first convolutional neural network model for training, the first The weight parameters of the feature extraction network of the convolutional neural network model will not change accordingly.

S140: Perform iterative training on the classification network in the first convolutional neural network model according to the second sample image to obtain a second convolutional neural network model.

Specifically, the second sample image is a pre-collected image including the target sample face. The pre-collected image can be a directly collected image or an image obtained from a video.

S150. Unfreeze the weight parameter of the feature extraction network of the second convolutional neural network model.

As shown in FIG. 6, in this embodiment, step S150, before the unfreezing the weight parameters of the feature extraction network of the second convolutional neural network model, further includes:

S104: Determine whether the error value between the output of the classification network of the second convolutional neural network model and the second real label is less than a second preset threshold.

Specifically, the target sample face region in the second sample image is labeled as the second real label. The second preset threshold can be set according to actual needs, for example, set to 0.005.

Wherein, if the error value between the output of the classification network of the second convolutional neural network model and the second real label is less than the second preset threshold value, step S150 is executed, that is, the second convolutional neural network is unfreezed The features of the network model extract the weight parameters of the network.

If the error value between the output of the classification network of the second convolutional neural network model and the second real label is greater than or equal to the second preset threshold, return to step S140 and continue to compare the first volume The classification network in the product neural network model is iteratively trained until the error value between the output of the classification network of the second convolutional neural network model and the second true label is less than the second preset threshold.

S160. Training the thawed second convolutional neural network model according to the third sample image to obtain the face recognition model.

Specifically, the third sample image is a pre-collected image including the target sample face. The pre-collected image can be a directly collected image or an image obtained from a video.

Wherein, after the second convolutional neural network is thawed, the feature extraction network and the classification network of the thawed second convolutional neural network are jointly trained according to the third sample image, so as to improve the performance of the second convolutional neural network. The weight parameters of the feature extraction network and the weight parameters of the classification network are jointly adjusted until convergence, and a face recognition model is obtained. More specifically, according to the difference between the output of the second convolutional neural network model and the labeled third real label, continuously fine-tune the weight parameters of the feature extraction training network and modify the weight parameters of the classification network until the second convolutional neural network The difference between the output of the network model and the labeled third real label is less than the third preset threshold, the weight parameters of the feature extraction training network and the weight parameters of the classification network are determined, so as to determine all the parameters of the face recognition model, and obtain the Face recognition model. Specifically, the face area in the third sample image is labeled to obtain the third true label.

The aforementioned method for training a face recognition model is to train a preset convolutional neural network according to the first sample image information to construct a feature extraction network; establish a connection between the feature extraction network and the preset classification network, To obtain the first convolutional neural network model; freeze the weight parameters of the feature extraction network of the first convolutional neural network model; perform the classification network in the first convolutional neural network model according to the second sample image information Iterative training to obtain the second convolutional neural network model; unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model; perform processing on the thawed second convolutional neural network model according to the third sample image information Training to obtain the face recognition model. This not only greatly improves the speed of face recognition, reduces the training time, and the resultant face recognition model has a high recognition accuracy, but also avoids the problem of gradient explosion caused by the inconsistency of the parameter distribution between the feature extraction network and the classification network. The weight parameter update of the face recognition model is smoother during training, so that the face recognition model is more robust. At the same time, it is easy to reach the optimal value in the process of backpropagation update parameters, which improves the stability of the model.

Please refer to FIG. 8, which is a schematic flowchart of steps of a face recognition method according to an embodiment of the present application. Please refer to FIG. 9, which is a schematic diagram of an application scenario of a face recognition method provided by an embodiment of the present application. Among them, the face recognition method can be applied to a system including

terminal devices

310 and 320, network 330 and server 340.

The network 340 is used to provide a medium of communication links between the

terminal devices

310 and 320 and the server 340. The network 330 may include various connection types, such as wired, wireless communication links, or fiber optic cables.

The user can use the

terminal devices

310 and 320 to interact with the server 340 via the network 330 to receive or send request instructions and the like. Various communication client applications, such as image processing applications, web browser applications, search applications, instant messaging tools, etc., may be installed on the

terminal devices

310 and 320.

The

terminal devices

310 and 320 may be various electronic devices with a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and so on.

The server 340 may be a server that provides various services, for example, a background management server that provides support for teaching websites browsed by users using the

terminal devices

310 and 320. The background management server can analyze and process the received product information query request and other data, and feed back the processing result to the

terminal devices

310 and 320.

As shown in FIG. 8, the face recognition method specifically includes: step 210 to step 230.

S210: Acquire an image to be recognized.

Specifically, the image to be recognized includes a face target to be recognized, which may be a visible light image, such as an image in an RGB (Red Green Blue) mode. Of course, the aforementioned image to be recognized may also be a near infrared (Near Infrared, NIR) image.

The execution subject of this embodiment may be installed with a camera for collecting visible light images and a camera for collecting near-infrared images. The user can select the camera to be turned on, and then use the selected camera to take a picture (using a self-portrait of the user's head or face) to obtain the image to be recognized.

S220. Input the image to be recognized into a preset face recognition model to obtain a face recognition result.

After the image to be recognized is obtained, the image to be recognized can be input to a pre-trained face recognition model to obtain a face recognition result. Wherein, the preset face recognition model is a face recognition model obtained by training using the training method of the aforementioned face recognition model.

S230: If the face recognition result indicates that the image to be recognized is successfully recognized, display first prompt information for prompting the user to successfully recognize the image to be recognized.

Specifically, if the terminal device determines that the face recognition result indicates that the image to be recognized is successfully recognized, the terminal device may display first prompt information for prompting the user to successfully recognize the image to be recognized. For example, the character string "Recognition passed" is displayed.

As shown in FIG. 9, in order to further improve the accuracy of recognition of the target face in the image to be recognized and the flexibility of face recognition, step S230, if the result of the face recognition indicates that the face to be recognized is successfully recognized The image, after displaying the first prompt information for prompting the user to successfully recognize the image to be recognized, further includes:

S240. If the face recognition result indicates that the image to be recognized cannot be recognized, display second prompt information for prompting the user to reacquire the image to be recognized, so that the image to be recognized is prompted to reacquire the image after the image to be recognized cannot be recognized.

Specifically, if the terminal device determines that the face recognition result indicates that the image to be detected cannot be recognized, the execution subject may display second prompt information for prompting the user to reacquire the image to be recognized. For example, the character string "Please reacquire the image" is displayed.

Exemplarily, the user's characteristic information may be pre-stored in the terminal device, and the pre-stored characteristic information may be extracted from the face image uploaded by the user during registration. The terminal device may use the feature information extracted from the image to be recognized using the aforementioned face recognition model as the face recognition result. If the face recognition result does not match the pre-stored feature information (for example, the similarity is less than a certain predetermined Set the value), then it can be determined that the recognition of the image to be recognized fails.

For example, if the face object in the image to be recognized is blurry, or the angle of the face object in the face image uploaded by the user during registration is quite different, the face recognition model can use the image to be detected. The extracted feature information is quite different from the pre-stored feature information. At this time, the face recognition result may indicate that the image to be recognized cannot be recognized.

Exemplarily, when a user logs in to a certain teaching application platform or teaching website, the aforementioned face recognition method may be used to perform face recognition login. Specifically, the camera of the terminal device can collect the face image of the user to be logged in, and compare the face image of the user to be logged in with the facial images of all users who have registered on the teaching application platform or teaching website to control the user log in. In this example, the face image of the user to be logged in can be used as the image to be recognized. Before recognizing the image to be recognized, the image to be recognized can be preprocessed. The preprocessing process here may include a face image alignment process. The face alignment process mainly includes face detection, face key point positioning, and then the detected face key points in all images are as close as possible to the preset face key point positions, and finally the person is cut out from the image Face area and adjust the resolution of the face area to a predetermined size, such as 224×224. Next, you can perform specific operations on the preprocessed image to be recognized.

The face recognition method described above obtains an image to be recognized; inputs the image to be recognized into a preset face recognition model to obtain a face recognition result; if the face recognition result indicates that the image to be recognized is successfully recognized, Display the first prompt message for prompting the user to successfully recognize the image to be recognized. This method can quickly recognize the face of the image to be recognized, and at the same time has the advantages of high recognition accuracy.

Please refer to FIG. 10. FIG. 10 is a schematic block diagram of an apparatus for training a face recognition model provided by an embodiment of the present application. The training apparatus for a face recognition model may be configured in a server for performing any of the foregoing The training method of face recognition model.

As shown in FIG. 10, the training device 300 for a face recognition model includes:

The feature training unit 310 is configured to train a preset convolutional neural network according to the first sample image to construct a feature extraction network;

The network connection unit 320 is configured to connect the feature extraction network with a preset classification network to obtain a first convolutional neural network model;

The parameter freezing unit 330 is configured to freeze the weight parameters of the feature extraction network of the first convolutional neural network model;

The classification training unit 340 is configured to perform iterative training on the classification network in the first convolutional neural network model according to the second sample image to perform weight parameters of the classification network in the first convolutional neural network model Adjust to obtain the second convolutional neural network model;

The network unfreezing unit 350 is configured to unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

The model training unit 360 is configured to train the thawed second convolutional neural network model according to the third sample image to obtain the face recognition model.

In one embodiment, as shown in FIG. 11, the training device 300 for the face recognition model further includes an output judgment unit 370 for judging whether the error value between the output of the feature extraction network and the first real label is smaller than the first true label. A preset threshold.

The network connection unit 320 is specifically configured to establish the feature extraction network and the classification network if the error value between the output of the feature extraction network and the first real label is less than the first preset threshold Connected to obtain the first convolutional neural network model.

As shown in Fig. 12, in one embodiment, the classification network includes a convolutional layer, a fully connected layer, and a classifier that are sequentially connected. The network connection unit 320 includes a convolution input subunit 321, a connection input subunit 322, and a classification processing subunit 323.

The convolution input subunit 321 is configured to input the output of the feature extraction network to the convolution layer;

The connection input subunit 322 is configured to input the output of the convolutional layer to the fully connected layer, so as to reduce the dimensionality of the output of the convolutional layer;

The classification processing subunit 323 is configured to use the classifier to classify the output of the fully connected layer to establish a connection between the feature extraction network and the classification network.

In an embodiment, the connection input subunit 322 is specifically configured to perform a fully connected layer operation on each feature value of the output of the convolutional layer based on a weight calculation formula, so as to perform a calculation on the output of the convolutional layer. Dimensionality reduction.

In an embodiment, the weight calculation formula is:

Please refer to FIG. 13, which is a schematic block diagram of a face recognition device according to an embodiment of the present application, and the face recognition device is used to perform any of the aforementioned methods for training a face recognition model. Wherein, the face recognition device can be configured in a server or a terminal.

As shown in FIG. 13, the face recognition device 400 includes: an image acquisition unit 410, an image input unit 420, and an information display unit 430.

The image acquisition unit 410 is configured to acquire an image to be recognized;

The image input unit 420 is configured to input the image to be recognized into a preset face recognition model to obtain a face recognition result;

The information display unit 430 is configured to display first prompt information for prompting the user to successfully recognize the image to be recognized if the face recognition result indicates that the image to be recognized is successfully recognized.

It should be noted that those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the device and each unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here. Repeat.

The above-mentioned apparatus can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in FIG. 14.

Please refer to FIG. 14, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer equipment can be a server or a terminal.

Referring to FIG. 14, the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.

The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions. When the program instructions are executed, the processor can execute a method for training a face recognition model.

The processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.

The internal memory provides an environment for the operation of the computer program in the non-volatile storage medium. When the computer program is executed by the processor, the processor can execute a method for training a face recognition model.

The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art can understand that the structure shown in FIG. 14 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The specific computer device may Including more or fewer parts than shown in the figure, or combining some parts, or having a different arrangement of parts.

It should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), and application specific integrated circuits (Application Specific Integrated Circuits). Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. Among them, the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.

Wherein, the processor is used to run a computer program stored in the memory to implement the following steps:

Training a preset convolutional neural network according to the first sample image information to construct a feature extraction network; establishing a connection between the feature extraction network and the preset classification network to obtain a first convolutional neural network model; Freeze the weight parameters of the feature extraction network of the first convolutional neural network model; according to the second sample image information, perform iterative training on the classification network in the first convolutional neural network model to obtain the second convolutional neural network Network model; unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model; according to the third sample image information, train the thawed second convolutional neural network model to obtain the face recognition model .

In one embodiment, before the processor realizes the establishment of the connection between the feature extraction network and the preset classification network to obtain the first convolutional neural network model, the processor is configured to realize:

Determine whether the error value between the output of the feature extraction network and the first real tag is less than a first preset threshold.

When the processor realizes the connection between the feature extraction network and the preset classification network to obtain the first convolutional neural network model, it is used to realize:

If the error value between the output of the feature extraction network and the first real label is less than the first preset threshold, establish a connection between the feature extraction network and the classification network to obtain the first convolution Neural network model.

In an embodiment, the classification network includes a convolutional layer, a fully connected layer, and a classifier that are sequentially connected. When the processor realizes the establishment of the connection between the feature extraction network and the preset classification network, it is used to realize:

Input the output of the feature extraction network to the convolutional layer; input the output of the convolutional layer to the fully connected layer to reduce the dimensionality of the output of the convolutional layer; adopt the classifier The output of the fully connected layer is classified to establish a connection between the feature extraction network and the classification network.

In an embodiment, when the processor implements the input of the output of the convolutional layer to the fully connected layer to reduce the dimensionality of the output of the convolutional layer, the processor is used to implement:

Based on the weight calculation formula, a fully connected layer operation is performed on each feature value of the output of the convolutional layer, so as to reduce the dimensionality of the output of the convolutional layer.

In an embodiment, the weight calculation formula is:

Wherein, in another embodiment, the processor is used to run a computer program stored in the memory to implement the following steps:

Obtain the image to be recognized; input the image to be recognized into a preset face recognition model to obtain a face recognition result; if the face recognition result indicates that the image to be recognized is successfully recognized, a display is used to prompt the user to successfully recognize the image The first prompt information of the image to be recognized.

The embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the processor executes the program instructions to implement the present application Any one of face recognition model training methods or face recognition methods provided in the embodiments.

The computer-readable storage medium may be the internal storage unit of the computer device described in the foregoing embodiment, such as the hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (SMC), or a secure digital (Secure Digital, SD) equipped on the computer device. ) Card, Flash Card, etc.

The above are only specific implementations of this application, but the protection scope of this application is not limited to this. Anyone familiar with the technical field can easily think of various equivalents within the technical scope disclosed in this application. Modifications or replacements, these modifications or replacements shall be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A method for training a face recognition model, including:

According to the image information of the first sample, train a preset convolutional neural network to construct a feature extraction network;

Judging whether the error value between the output of the feature extraction network and the first real tag is less than a first preset threshold;

If the error value between the output of the feature extraction network and the first real label is less than the first preset threshold, input the output of the feature extraction network to the convolutional layer of the classification network;

Inputting the output of the convolutional layer to the fully connected layer of the classification network to reduce the dimensionality of the output of the convolutional layer;

Use the classifier of the classification network to classify the output of the fully connected layer to establish a connection between the feature extraction network and the classification network, thereby obtaining the first convolutional neural network model;

Freezing the weight parameters of the feature extraction network of the first convolutional neural network model;

Performing iterative training on the classification network in the first convolutional neural network model according to the second sample image information to obtain a second convolutional neural network model;

Unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

According to the third sample image information, train the thawed second convolutional neural network model to obtain the face recognition model.
The method for training a face recognition model according to claim 1, wherein the inputting the output of the convolutional layer to the fully connected layer to reduce the dimensionality of the output of the convolutional layer comprises:

Based on the weight calculation formula, a fully connected layer operation is performed on each feature value of the output of the convolutional layer, so as to reduce the dimensionality of the output of the convolutional layer.
The method for training a face recognition model according to claim 2, wherein the weight calculation formula is:

Among them, the loss function is the mean square error MSE function, W represents the weight of the convolutional layer, W i represents the ith weight of the convolutional layer, h represents the bias of the convolutional layer, and h i represents the ith bias of the convolutional layer, X represents the entire sample image set, X(i) represents the first true label corresponding to the i-th sample image;
Represents the output of the output layer after the i-th sample image is input to the classification network, and η represents the learning efficiency of the backpropagation algorithm.
The method for training a face recognition model according to claim 1, wherein before the training a preset convolutional neural network according to the first sample image information to construct a feature extraction network, the method further comprises:

A sample video is acquired, and a sample image set in the sample video is determined. The sample image set includes first sample image information, second sample image information, and third sample image information.
The method for training a face recognition model according to claim 4, wherein said determining the sample image set in the sample video comprises:

Framing the sample video to obtain several single-frame images;

If there is a face image in the single frame image, perform wavelet threshold denoising processing on the single frame image;

If there is no face image in the single frame image, remove the single frame image to obtain the sample image set.
A face recognition method, including:

Obtain the image to be recognized;

The image to be recognized is input into a preset face recognition model to obtain a face recognition result, and the face recognition model is obtained by training the face recognition model training method according to any one of claims 1-5 of;

If the face recognition result indicates that the image to be recognized is successfully recognized, first prompt information for prompting the user to successfully recognize the image to be recognized is displayed.
A training device for a face recognition model includes:

The feature training unit is used to train a preset convolutional neural network according to the first sample image to construct a feature extraction network;

The network connection unit is used to determine whether the error value between the output of the feature extraction network and the first real tag is less than a first preset threshold;

If the error value between the output of the feature extraction network and the first real label is less than the first preset threshold, input the output of the feature extraction network to the convolutional layer of the classification network;

Inputting the output of the convolutional layer to the fully connected layer of the classification network to reduce the dimensionality of the output of the convolutional layer;

Use the classifier of the classification network to classify the output of the fully connected layer to establish a connection between the feature extraction network and the classification network, thereby obtaining the first convolutional neural network model;

A parameter freezing unit for freezing the weight parameters of the feature extraction network of the first convolutional neural network model;

The classification training unit is configured to iteratively train the classification network in the first convolutional neural network model according to the second sample image to adjust the weight parameters of the classification network in the first convolutional neural network model , Thereby obtaining the second convolutional neural network model;

A network unfreezing unit, configured to unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

The model training unit is used to train the thawed second convolutional neural network model according to the third sample image to obtain the face recognition model.
A face recognition device, including:

The image recognition unit is used to obtain the image to be recognized;

The image input unit is configured to input the image to be recognized into a preset face recognition model to obtain a face recognition result, and the face recognition model is determined by the face recognition according to any one of claims 1-5 Trained by the training method of the model;

The first information display unit is configured to, if the face recognition result indicates that the image to be recognized is successfully recognized, display first prompt information for prompting the user to successfully recognize the image to be recognized.
A computer device including a memory and a processor;

The memory is used to store computer programs;

The processor is configured to execute the computer program and implement the following steps when executing the computer program:

According to the image information of the first sample, train a preset convolutional neural network to construct a feature extraction network;

Judging whether the error value between the output of the feature extraction network and the first real tag is less than a first preset threshold;

If the error value between the output of the feature extraction network and the first real label is less than the first preset threshold, input the output of the feature extraction network to the convolutional layer of the classification network;

Inputting the output of the convolutional layer to the fully connected layer of the classification network to reduce the dimensionality of the output of the convolutional layer;

Use the classifier of the classification network to classify the output of the fully connected layer to establish a connection between the feature extraction network and the classification network, thereby obtaining the first convolutional neural network model;

Freezing the weight parameters of the feature extraction network of the first convolutional neural network model;

Performing iterative training on the classification network in the first convolutional neural network model according to the second sample image information to obtain a second convolutional neural network model;

Unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

According to the third sample image information, train the thawed second convolutional neural network model to obtain the face recognition model.
The computer device according to claim 9, wherein the inputting the output of the convolutional layer to the fully connected layer to reduce the dimensionality of the output of the convolutional layer comprises:

Based on the weight calculation formula, a fully connected layer operation is performed on each feature value of the output of the convolutional layer to reduce the dimensionality of the output of the convolutional layer.
The computer device according to claim 10, wherein the weight calculation formula is:

Among them, the loss function is the mean square error MSE function, W represents the weight of the convolutional layer, W i represents the ith weight of the convolutional layer, h represents the bias of the convolutional layer, and h i represents the ith bias of the convolutional layer, X represents the entire sample image set, X(i) represents the first true label corresponding to the i-th sample image;
Represents the output of the output layer after the i-th sample image is input to the classification network, and η represents the learning efficiency of the back propagation algorithm.
The computer device according to claim 9, wherein before the training a preset convolutional neural network according to the first sample image information to construct a feature extraction network, the method further comprises:

A sample video is acquired, and a sample image set in the sample video is determined. The sample image set includes first sample image information, second sample image information, and third sample image information.
The computer device according to claim 12, wherein said determining the sample image set in the sample video comprises:

Framing processing the sample video to obtain several single-frame images;

If there is a face image in the single frame image, perform wavelet threshold denoising processing on the single frame image;

If there is no face image in the single frame image, remove the single frame image to obtain the sample image set.
A computer device including a memory and a processor;

The memory is used to store computer programs;

The processor is configured to execute the computer program and implement the following steps when executing the computer program:

Obtain the image to be recognized;

The image to be recognized is input into a preset face recognition model to obtain a face recognition result, and the face recognition model is obtained by training the face recognition model training method according to any one of claims 1-5 of;

If the face recognition result indicates that the image to be recognized is successfully recognized, first prompt information for prompting the user to successfully recognize the image to be recognized is displayed.
A computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor implements the following steps:

According to the image information of the first sample, train a preset convolutional neural network to construct a feature extraction network;

Judging whether the error value between the output of the feature extraction network and the first real tag is less than a first preset threshold;

If the error value between the output of the feature extraction network and the first real label is less than the first preset threshold, input the output of the feature extraction network to the convolutional layer of the classification network;

Inputting the output of the convolutional layer to the fully connected layer of the classification network to reduce the dimensionality of the output of the convolutional layer;

Classify the output of the fully connected layer by using the classifier of the classification network to establish a connection between the feature extraction network and the classification network, thereby obtaining the first convolutional neural network model;

Freezing the weight parameters of the feature extraction network of the first convolutional neural network model;

Performing iterative training on the classification network in the first convolutional neural network model according to the second sample image information to obtain a second convolutional neural network model;

Unfreeze the weight parameters of the feature extraction network of the second convolutional neural network model;

According to the third sample image information, train the thawed second convolutional neural network model to obtain the face recognition model.
15. The computer-readable storage medium according to claim 15, wherein the inputting the output of the convolutional layer to the fully connected layer to reduce the dimensionality of the output of the convolutional layer comprises:

Based on the weight calculation formula, a fully connected layer operation is performed on each feature value of the output of the convolutional layer to reduce the dimensionality of the output of the convolutional layer.
The computer-readable storage medium according to claim 16, wherein the weight calculation formula is:

Among them, the loss function is the mean square error MSE function, W represents the weight of the convolutional layer, W i represents the ith weight of the convolutional layer, h represents the bias of the convolutional layer, and h i represents the ith bias of the convolutional layer, X represents the entire sample image set, X(i) represents the first true label corresponding to the i-th sample image;
Represents the output of the output layer after the i-th sample image is input to the classification network, and η represents the learning efficiency of the back propagation algorithm.
The computer-readable storage medium according to claim 15, wherein before training a preset convolutional neural network to construct a feature extraction network according to the first sample image information, the method further comprises:

A sample video is acquired, and a sample image set in the sample video is determined. The sample image set includes first sample image information, second sample image information, and third sample image information.
18. The computer-readable storage medium according to claim 18, wherein the determining the sample image set in the sample video comprises:

Framing processing the sample video to obtain several single-frame images;

If there is a face image in the single frame image, perform wavelet threshold denoising processing on the single frame image;

If there is no face image in the single frame image, remove the single frame image to obtain the sample image set.
A computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor implements the following steps:

Obtain the image to be recognized;

The image to be recognized is input into a preset face recognition model to obtain a face recognition result, and the face recognition model is obtained by training the face recognition model training method according to any one of claims 1-5 of;

If the face recognition result indicates that the image to be recognized is successfully recognized, first prompt information for prompting the user to successfully recognize the image to be recognized is displayed.