CN111144240A - Image processing method and related equipment

Info

Publication number: CN111144240A
Application number: CN201911278274.6A
Authority: CN
Inventors: 张阿强
Original assignee: Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Current assignee: Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-05-12
Anticipated expiration: 2039-12-12
Also published as: CN111144240B

Abstract

The embodiments of the application disclose an image processing method and related equipment. The method comprises iteratively executing a training process, the training process comprising: adjusting a first face recognition model according to the loss corresponding to the first face recognition model; extracting, through the adjusted first face recognition model, feature data corresponding to each of a plurality of first sample images obtained in advance; combining first feature data and second feature data to obtain a plurality of first data pairs corresponding to the first feature data; combining the first feature data and third feature data to obtain a plurality of second data pairs corresponding to the first feature data; and determining the loss corresponding to the first face recognition model according to the similarity distances of the plurality of first data pairs and the similarity distances of the plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images. The method and equipment can improve the accuracy of recognizing non-standard images.

Description

Image processing method and related equipment

Technical Field

The present application relates to the field of face recognition technologies, and in particular, to an image processing method and related device.

Background

In recent years, face recognition technology has developed rapidly, and many algorithms built on data, models, and loss functions have been proposed. Many of these algorithms achieve very high results on public data sets, approaching 99%. However, such algorithms distinguish face images of different persons across different angles and different scenes, so it is difficult to efficiently extract features common to all face images of the same person. As a result, in practical applications the face recognition accuracy falls far short of the theoretical figures because of large variations in pose (such as side faces), expression, lighting, and the like. For example, in a face recognition scenario in which only a standard image is registered (for example, a front face image), recognition using a non-standard image (for example, a side face or a distorted face) suffers from reduced recognition accuracy.

Disclosure of Invention

The embodiment of the application discloses an image processing method and related equipment, which can improve the accuracy of identification of non-standard images.

In a first aspect, an embodiment of the present application provides an image processing method, including:

iteratively executing a training process of a first face recognition model until the loss corresponding to the first face recognition model meets a first preset condition, wherein the first face recognition model with the loss meeting the first preset condition is used for face recognition;

wherein, the training process of the first face recognition model comprises the following steps:

adjusting the first face recognition model according to the loss corresponding to the first face recognition model;

extracting feature data corresponding to a plurality of first sample images from a plurality of first sample images selected in advance through the adjusted first face recognition model to obtain feature data of the plurality of first sample images, wherein the plurality of first sample images comprise preset standard images and preset non-standard images;

combining first feature data and second feature data to obtain a plurality of first data pairs corresponding to the first feature data, wherein the first feature data is feature data corresponding to a first target standard image in the plurality of first sample images, the second feature data is feature data corresponding to a first target non-standard image in the plurality of first sample images, the first target standard image and the first target non-standard image are sample images of the same person, and the first target standard image is any standard image in the plurality of first sample images;

combining the first feature data and third feature data to obtain a plurality of second data pairs corresponding to the first feature data, wherein the third feature data is feature data corresponding to a first target sample image in the plurality of first sample images, and the first target sample image and the first target standard image are sample images of different persons;

and determining the loss corresponding to the first face recognition model according to the similarity distance of a plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images and the similarity distance of a plurality of second data pairs.

In this method, model training reduces the similarity distance between the standard image and the non-standard images of the same person and increases the similarity distance between the standard image of that person and the images of other persons, so that the same person and different persons can be distinguished by the similarity distance. Because the model training of the application draws the similarity distance between one standard image and one non-standard image of the same person close, the features common to two images of the same person can be found effectively, the inter-class distance is effectively increased, and the intra-class distance is reduced. It is therefore easier to determine whether images show the same person, which further improves the accuracy of recognizing non-standard images. This also avoids the problems that all images of the same person share few common features, that the similarity distance between images of the same person is large, and that the similarity distance between images of different persons is small.

With reference to the first aspect, in one possible implementation manner, the first preset condition is: and the loss corresponding to the first face recognition model is less than the preset loss.
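The iterate-until-the-loss-is-small control flow described in the first aspect can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: `adjust`, `extract_features`, and `compute_loss` are hypothetical callables standing in for the parameter adjustment, feature extraction, and loss determination steps, and `max_iters` is a safety cap that does not appear in the application.

```python
from typing import Any, Callable, Optional, Tuple


def train_until(
    model: Any,
    adjust: Callable[[Any, Optional[float]], Any],
    extract_features: Callable[[Any], Any],
    compute_loss: Callable[[Any], float],
    preset_loss: float,
    max_iters: int = 1000,  # assumption: safety cap, not in the source
) -> Tuple[Any, float, int]:
    """Repeat adjust -> extract -> loss until loss < preset_loss."""
    loss: Optional[float] = None  # no loss exists before the first pass
    for iteration in range(1, max_iters + 1):
        model = adjust(model, loss)          # adjust using the previous loss
        features = extract_features(model)   # features from the adjusted model
        loss = compute_loss(features)        # loss for this training period
        if loss < preset_loss:               # first preset condition
            return model, loss, iteration
    raise RuntimeError("preset loss not reached within max_iters")


# Toy stand-ins (hypothetical): the "model" is a single number w,
# its "feature" is w itself, and the loss is w squared.
def toy_adjust(w: float, loss: Optional[float]) -> float:
    return w if loss is None else w * 0.5


final_model, final_loss, iterations = train_until(
    1.0, toy_adjust, lambda w: w, lambda f: f * f, preset_loss=0.01
)
print(final_model, final_loss, iterations)
```

In a real system the three callables would wrap a neural network, its optimizer step, and the pair-based loss; the toy functions only exercise the loop.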

With reference to the first aspect, in a possible implementation manner, the loss L corresponding to the first face recognition model is: L = L1 + (T - L2), where L1 is the average of the square losses of the plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, L2 is the average of the square losses of the plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, and T is a preset distance value.

In the embodiment of the application, the loss corresponding to the first face recognition model is calculated from the averages of the square losses of the first data pairs and the second data pairs, so the calculation of the loss is relatively simple and the computational overhead of model training is reduced. In addition, setting the preset distance value to a large value quantitatively controls the similarity distance of the second data pairs so that it is clearly separated from the similarity distance of the first data pairs, which makes the first data pairs and the second data pairs easy to distinguish.

With reference to the first aspect, in a possible implementation manner, the plurality of first sample images are selected from a first sample image set, and the number of types of the non-standard images in the first sample image set is greater than a preset number; before the iterative execution of the training process of the first face recognition model, the method further includes: iteratively executing a training process of a second face recognition model until the loss corresponding to the second face recognition model meets a second preset condition to obtain the first face recognition model;

wherein the training process of the second face recognition model comprises:

adjusting the second face recognition model according to the loss corresponding to the second face recognition model;

extracting feature data corresponding to a plurality of second sample images from a plurality of second sample images obtained in advance through the adjusted second face recognition model to obtain feature data of the plurality of second sample images, wherein the plurality of second sample images comprise the standard images and the non-standard images, the plurality of second sample images are selected from a second sample image set, and the number of the second sample images in the second sample image set is greater than a preset number;

combining fourth feature data and fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, wherein the fourth feature data is feature data corresponding to a second target standard image in the plurality of second sample images, the fifth feature data is feature data corresponding to a second target non-standard image in the plurality of second sample images, the second target standard image and the second target non-standard image are sample images of the same person, and the second target standard image is any standard image in the plurality of second sample images;

combining the fourth feature data with sixth feature data to obtain a plurality of fourth data pairs corresponding to the fourth feature data, wherein the sixth feature data is feature data corresponding to a second target sample image in the plurality of second sample images, and the second target sample image and the second target standard image are sample images of different persons;

and determining the loss corresponding to the second face recognition model according to the similarity distance of a plurality of third data pairs corresponding to each fourth feature data in the feature data of the plurality of second sample images and the similarity distance of a plurality of fourth data pairs.

In the embodiment of the application, the second face recognition model is subjected to model training based on the second sample image set, a first face recognition model is obtained by utilizing the characteristic of large number of samples of the second sample image set, then the first face recognition model is subjected to model training based on the first sample image set, and the first face recognition model for face recognition is obtained by utilizing the characteristic of more types of non-standard images in the first sample image set.

With reference to the first aspect, in one possible implementation, the standard image is a front face image; before the combining the fourth feature data and the fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, the method further includes: calculating the face angle of each second sample image corresponding to a target person, wherein the target person is any one of the persons represented by the plurality of second sample images; and selecting, from the second sample images corresponding to the target person, the second sample image with the smallest face angle as the front face image corresponding to the target person.

In the embodiment of the application, the front face image of any person can be found from the second sample image set more easily by calculating the face angle.
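The selection of a front face image by smallest face angle can be illustrated with a short sketch. The application does not state how the face angle is calculated, so the sketch assumes each sample already carries a single signed angle in degrees (for example, a yaw estimate) and compares magnitudes; the tuple layout and the function name `select_front_faces` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def select_front_faces(
    samples: Iterable[Tuple[str, str, float]]
) -> Dict[str, str]:
    """Map each person to the image whose face angle is smallest.

    Each sample is (person_id, image_id, face_angle_degrees).
    Assumption: the angle is signed, so its absolute value is compared.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for person_id, image_id, angle in samples:
        magnitude = abs(angle)
        if person_id not in best or magnitude < best[person_id][1]:
            best[person_id] = (image_id, magnitude)
    return {person: image for person, (image, _) in best.items()}


example_samples = [
    ("person_a", "a1", 30.0),
    ("person_a", "a2", -5.0),
    ("person_b", "b1", 12.0),
    ("person_a", "a3", 8.0),
]
front_faces = select_front_faces(example_samples)
print(front_faces)
```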

With reference to the first aspect, in a possible implementation manner, after the iteratively performing the training procedure of the first face recognition model, the method further includes:

extracting, through the first face recognition model, feature data for verification from each of a plurality of pre-selected verification images to obtain feature data of the plurality of verification images, wherein the plurality of verification images comprise the standard images and the non-standard images;

combining seventh feature data and eighth feature data to obtain a plurality of fifth data pairs corresponding to the seventh feature data, wherein the seventh feature data is feature data corresponding to a third target standard image in the plurality of verification images, the eighth feature data is feature data corresponding to a third target non-standard image in the plurality of verification images, the third target standard image and the third target non-standard image are sample images of the same person, and the third target standard image is any standard image in the plurality of verification images;

combining the seventh feature data with ninth feature data to obtain a plurality of sixth data pairs corresponding to the seventh feature data, wherein the ninth feature data is feature data corresponding to a third target sample image in the plurality of verification images, and the third target sample image and the third target standard image are sample images of different persons;

and determining, by cross-validation, a similarity distance threshold according to the similarity distances of the plurality of fifth data pairs and the similarity distances of the plurality of sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, wherein the similarity distance threshold is used for calculating a target proportion; the target proportion is the ratio of a target number to the total number of the fifth data pairs and sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images; and the target number is the sum of the number of those fifth data pairs whose similarity distance is smaller than the similarity distance threshold and the number of those sixth data pairs whose similarity distance is larger than the similarity distance threshold.

In the embodiment of the application, an optimal critical value can be found through a cross validation mode, the optimal critical value is used as a similarity distance threshold value, so that the accuracy rate that the similarity distances of a plurality of fifth data pairs are smaller than the similarity distance threshold value and the similarity distances of a plurality of sixth data pairs are larger than the similarity distance threshold value is higher, and then the model training effect can be better verified according to the accuracy rate.
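The target proportion and the search for a critical value can be sketched as below. This is a plain sweep over candidate thresholds, offered as an assumption about how the optimal value might be located; the cross-validation split itself (running the sweep on some folds and checking the proportion on the held-out fold) is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence


def target_proportion(
    fifth_dists: Sequence[float],
    sixth_dists: Sequence[float],
    threshold: float,
) -> float:
    """Share of pairs on the correct side of the threshold.

    fifth_dists: distances of same-person pairs (should be below).
    sixth_dists: distances of different-person pairs (should be above).
    """
    target_number = sum(d < threshold for d in fifth_dists) + sum(
        d > threshold for d in sixth_dists
    )
    return target_number / (len(fifth_dists) + len(sixth_dists))


def best_threshold(
    fifth_dists: Sequence[float], sixth_dists: Sequence[float]
) -> float:
    """Pick the candidate threshold with the highest target proportion."""
    values: List[float] = sorted(set(fifth_dists) | set(sixth_dists))
    # Candidates are midpoints between neighbouring observed distances.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    return max(
        candidates,
        key=lambda t: target_proportion(fifth_dists, sixth_dists, t),
    )


same_person = [0.2, 0.3, 0.5]
different_person = [0.6, 0.9, 1.1]
threshold = best_threshold(same_person, different_person)
print(threshold, target_proportion(same_person, different_person, threshold))
```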

With reference to the first aspect, in one possible implementation, the standard image is a front face image, and the non-standard image includes at least one of a left face image, a right face image, a head-up image, a head-down image, a mouth-opening image, a partially-blocked image, an eye-closing image, a frown image, a glasses-wearing image, and a wig-wearing image.

With reference to the first aspect, in a possible implementation manner, after the iteratively performing the training procedure of the first face recognition model, the method further includes: extracting feature data from an image to be recognized through the first face recognition model; if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is smaller than the similarity distance threshold, marking the image to be recognized as the image of the person represented by the preset standard image.

In the embodiment of the application, feature data are extracted from an image to be recognized by using a trained first face recognition model, and if the similarity distance between the feature data of the image to be recognized and the feature data of a preset standard image is smaller than a similarity distance threshold value, the image to be recognized and the preset standard image represent the same person; and if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is greater than the similarity distance threshold, the image to be recognized and the preset standard image represent different people.
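The recognition decision described above can be sketched as follows, assuming feature data are plain numeric vectors and the similarity distance is the Euclidean distance. Returning the closest registered person when several standard images fall under the threshold is an added assumption; the application only describes the comparison against one preset standard image. The names `recognize` and `registered` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recognize(
    query_feature: Sequence[float],
    registered: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the person whose standard-image feature is closest to the
    query and closer than the threshold, or None if nobody qualifies."""
    best_person: Optional[str] = None
    best_distance = threshold
    for person, standard_feature in registered.items():
        distance = euclidean_distance(query_feature, standard_feature)
        if distance < best_distance:
            best_person, best_distance = person, distance
    return best_person


registered_features = {"alice": (0.0, 0.0), "bob": (3.0, 4.0)}
print(recognize((0.1, 0.0), registered_features, threshold=0.5))
print(recognize((10.0, 10.0), registered_features, threshold=0.5))
```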

In a second aspect, an embodiment of the present application provides an image processing apparatus, which includes at least one processor and a memory, where the memory and the at least one processor are interconnected by a line, and a computer program is stored in the memory; the computer program, when executed by the at least one processor, implements the method described in the first aspect, or any possible implementation manner of the first aspect.

In a third aspect, an embodiment of the present application provides an image processing apparatus, which includes all or part of the functional modules for implementing the method described in the first aspect, or any possible implementation manner of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a processor, the method described in the first aspect or any possible implementation manner of the first aspect is implemented.

By implementing the embodiments of the application, model training reduces the similarity distance between the standard image and the non-standard images of the same person and increases the similarity distance between the standard image of that person and the images of other persons, so that the same person and different persons can be distinguished by the similarity distance. Because the model training of the application draws the similarity distance between one standard image and one non-standard image of the same person close, the features common to two images of the same person can be found effectively, the inter-class distance is effectively increased, and the intra-class distance is reduced. It is therefore easier to determine whether images show the same person, which further improves the accuracy of recognizing non-standard images. This also avoids the problems that all images of the same person share few common features, that the similarity distance between images of the same person is large, and that the similarity distance between images of different persons is small.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments of the present application or the background art will be briefly described below.

Fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application;

Fig. 2A is a schematic flowchart of an image processing method according to an embodiment of the present application;

Fig. 2B is a schematic flowchart of another image processing method provided in the embodiment of the present application;

Fig. 3 is a schematic view of an application scenario of an image processing apparatus according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.

Referring to fig. 1, fig. 1 is a schematic diagram of an architecture of an image processing system provided in an embodiment of the present application, where the system includes a device 101 and a terminal 102, where:

The device 101 is an entity with computing power. For example, the device 101 may be a single server or a server cluster composed of a plurality of servers; in Fig. 1, the device 101 is shown as a server cluster. In this embodiment of the present application, the device 101 executes the image processing method described in this application, which includes training to obtain a first face recognition model. After the first face recognition model is trained, the device 101 may perform face recognition through the first face recognition model, or may send the first face recognition model to the terminal 102 so that the terminal 102 performs face recognition.

The terminal 102 may include a handheld device (e.g., a mobile phone, a tablet computer, a palmtop computer, etc.) with a wireless communication function, a vehicle-mounted device (e.g., an automobile, a bicycle, an electric vehicle, an airplane, a ship, etc.), a wearable device (e.g., a smart watch (such as iWatch), a smart bracelet, a pedometer, etc.), a smart home device (e.g., a refrigerator, a television, an air conditioner, an electric meter, etc.), a smart robot, and so on.

Referring to fig. 2A, fig. 2A is an image processing method provided by an embodiment of the present application, which may be implemented based on the system architecture diagram shown in fig. 1, and the method includes, but is not limited to, the following steps:

Step S201: the device iteratively performs the training process of the first face recognition model.

Specifically, face recognition is usually implemented by extracting face features from an image through a face recognition model, which is obtained through model training. Therefore, before performing face recognition, the device needs to iteratively execute the training process of the first face recognition model to obtain the first face recognition model used for face recognition. In the embodiment of the present application, the face recognition model is obtained by training a convolutional neural network model, which includes but is not limited to ResNet, VGGNet, FaceNet, etc. Optionally, referring to Fig. 2B, the training process of the first face recognition model executed by the device may be implemented by steps 2011-2015.

2011: The device adjusts the first face recognition model according to the loss corresponding to the first face recognition model.

Specifically, when the device starts training the first face recognition model, it is usually provided with an initial face recognition model; for example, the device may use a commonly used classification model such as VGG, ResNet50, or ResNet101 as the initial face recognition model. The model parameters of the initial first face recognition model are usually randomly generated by the device, but may also be manually preconfigured. During the training process of the first face recognition model, the device may adjust the first face recognition model multiple times; in the embodiment of the application, adjusting the first face recognition model means adjusting the model parameters of the first face recognition model. In the current training period, the device adjusts the model parameters of the first face recognition model according to the loss corresponding to the first face recognition model in the previous training period. The loss corresponding to the first face recognition model can be measured by a similarity distance, such as the Euclidean distance or the absolute distance, used for determining the similarity between two images.
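The two similarity distances named above can be written down directly for feature vectors. The sketch assumes that "absolute distance" means the sum of absolute coordinate differences (the L1 or Manhattan distance), which the application does not spell out; the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def absolute_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed reading: sum of absolute coordinate differences (L1)."""
    return sum(abs(a - b) for a, b in zip(u, v))


feature_a = (0.0, 0.0)
feature_b = (3.0, 4.0)
print(euclidean_distance(feature_a, feature_b))
print(absolute_distance(feature_a, feature_b))
```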

2012: The device extracts, through the adjusted first face recognition model, feature data corresponding to each of a plurality of pre-selected first sample images, so as to obtain feature data of the plurality of first sample images.

Specifically, the device usually performs model training based on a sample image set, where the sample image set is composed of a large number of sample images, and the device extracts feature data from the sample images through a face recognition model to perform model training. Optionally, all sample images used for extracting feature data are face images of the same size. The device can crop the face from the original image by a face detection method and then scale it to a fixed size (such as 256 × 256) to obtain a sample image. This reduces feature differences caused by differences in size and makes the sample images more comparable, so that the model training effect is better.

In the model training process, the device can pre-select a plurality of first sample images from the sample image set, and the first face recognition model then extracts feature data from the pre-selected first sample images. One piece of feature data is extracted from each first sample image, so by extracting the feature data corresponding to each of the plurality of first sample images the device obtains a plurality of pieces of feature data. In the embodiment of the present application, the sample image set includes preset standard images and preset non-standard images, so the plurality of first sample images selected from the sample image set also include standard images and non-standard images. The standard image may be a face image at a certain preset angle and in a certain preset scene, and a non-standard image may be any face image that is not at the preset angle and scene. For example, when the standard image is a front face image, the non-standard images may include at least one of a left face image, a right face image, a head-up image, a head-down image, a mouth-open image, a partial occlusion image, an eye-closing image, a frown image, a glasses-worn image, and a wig-worn image.

In each training process, the device extracts feature data from a plurality of first sample images selected from the sample image set to serve as the basis for adjusting the first face recognition model. Depending on how the first sample images are selected, the device can perform model training by batch gradient descent, stochastic gradient descent, or mini-batch gradient descent.
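How the choice of first sample images maps onto the three gradient descent variants can be sketched as below. The mode names, the default batch size of 32, and the function `select_first_samples` are illustrative assumptions; the application does not say how single-sample selection interacts with the pair construction in the later steps.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first_samples(
    sample_set: Sequence[T],
    mode: str,
    batch_size: int = 32,  # assumption: illustrative default
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Choose the samples used in one training pass."""
    rng = rng or random.Random()
    if mode == "batch":          # every sample in the set
        return list(sample_set)
    if mode == "stochastic":     # one randomly chosen sample
        return [rng.choice(sample_set)]
    if mode == "mini-batch":     # a random subset without repeats
        size = min(batch_size, len(sample_set))
        return rng.sample(list(sample_set), size)
    raise ValueError(f"unknown mode: {mode}")


images = [f"img_{i}" for i in range(10)]
seeded = random.Random(0)
print(len(select_first_samples(images, "batch")))
print(select_first_samples(images, "mini-batch", batch_size=4, rng=seeded))
```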

2013: The device combines the first feature data and the second feature data to obtain a plurality of first data pairs corresponding to the first feature data.

2014: The device combines the first feature data and the third feature data to obtain a plurality of second data pairs corresponding to the first feature data.

Specifically, the first feature data is feature data corresponding to a first target standard image in the multiple first sample images, the second feature data is feature data corresponding to a first target non-standard image in the multiple first sample images, the first target standard image and the first target non-standard image are sample images of the same person, the first target standard image is any one standard image in the multiple first sample images, the third feature data is feature data corresponding to a first target sample image in the multiple first sample images, and the first target sample image and the first target standard image are sample images of different persons.

Optionally, in each training process, the device extracts, through the first face recognition model, feature data from the standard image and the non-standard images of the same person to obtain a plurality of first data pairs, and extracts feature data from the standard image of that person and sample images of other persons to obtain a plurality of second data pairs.
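The construction of first and second data pairs in steps 2013 and 2014 can be sketched as follows. The `Sample` record and the function `build_pairs` are hypothetical, and the sketch assumes feature data have already been extracted and labeled with a person identifier and a standard/non-standard flag.

```python
from typing import List, NamedTuple, Sequence, Tuple

Feature = Tuple[float, ...]
Pair = Tuple[Feature, Feature]


class Sample(NamedTuple):
    person: str        # identity label of the sample image
    is_standard: bool  # True for a standard (e.g. front face) image
    feature: Feature   # feature data extracted by the model


def build_pairs(samples: Sequence[Sample]) -> Tuple[List[Pair], List[Pair]]:
    """Return (first_pairs, second_pairs) anchored on standard images."""
    first_pairs: List[Pair] = []
    second_pairs: List[Pair] = []
    for anchor in samples:
        if not anchor.is_standard:
            continue  # only standard images supply first feature data
        for other in samples:
            if other.person == anchor.person and not other.is_standard:
                # same person: standard image with non-standard image
                first_pairs.append((anchor.feature, other.feature))
            elif other.person != anchor.person:
                # different persons: standard image with any image
                second_pairs.append((anchor.feature, other.feature))
    return first_pairs, second_pairs


batch = [
    Sample("a", True, (0.0,)),
    Sample("a", False, (0.1,)),
    Sample("a", False, (0.2,)),
    Sample("b", True, (1.0,)),
    Sample("b", False, (1.1,)),
]
first_pairs, second_pairs = build_pairs(batch)
print(len(first_pairs), len(second_pairs))
```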

2015: The device determines the loss corresponding to the first face recognition model according to the similarity distances of the plurality of first data pairs and the similarity distances of the plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images.

Specifically, after obtaining the plurality of first data pairs and the plurality of second data pairs, the device determines the loss corresponding to the first face recognition model according to the similarity distances of the plurality of first data pairs and the similarity distances of the plurality of second data pairs, so as to determine the training effect. As introduced above, the loss corresponding to the first face recognition model can be measured by a similarity distance, such as the Euclidean distance or the absolute distance, used to determine the similarity between two images. When the loss is measured by the Euclidean distance, the device determines the loss according to the Euclidean distances of the plurality of first data pairs and the Euclidean distances of the plurality of second data pairs; when the loss is measured by the absolute distance, the device determines the loss according to the absolute distances of the plurality of first data pairs and the absolute distances of the plurality of second data pairs. Optionally, in this embodiment of the application, the loss L corresponding to the first face recognition model is: L = L1 + (T - L2), where L1 is the average of the square losses of the plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, L2 is the average of the square losses of the plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, and T is a preset distance value.
Therefore, the loss corresponding to the first face recognition model is calculated according to the average value of the square losses of the first data pair and the second data pair, the calculation method of the loss corresponding to the first face recognition model is relatively simple, and the calculation cost of model training is reduced. In addition, the similarity distance of the second data pair can be quantitatively controlled by setting the preset distance value to be large so as to obviously distinguish the similarity distance of the first data pair, and further, the effect of easily distinguishing the first data pair from the second data pair is achieved.
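The optional loss L = L1 + (T - L2) can be illustrated with a small sketch. It assumes that the "square loss" of a data pair is the squared Euclidean distance between its two feature vectors, which the application does not define explicitly. Read literally, the expression has no clamp, so L becomes negative once L2 exceeds T; the sketch keeps that behaviour rather than adding a hinge. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Feature = Sequence[float]
Pair = Tuple[Feature, Feature]


def squared_distance(u: Feature, v: Feature) -> float:
    # Assumed "square loss" of one pair: squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_square_loss(pairs: Sequence[Pair]) -> float:
    return sum(squared_distance(u, v) for u, v in pairs) / len(pairs)


def pair_loss(
    first_pairs: Sequence[Pair], second_pairs: Sequence[Pair], T: float
) -> float:
    """L = L1 + (T - L2) as written in the text, without clamping."""
    L1 = mean_square_loss(first_pairs)   # same-person pairs
    L2 = mean_square_loss(second_pairs)  # different-person pairs
    return L1 + (T - L2)


same_person_pairs = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (0.0, 2.0))]
different_person_pairs = [((0.0, 0.0), (3.0, 0.0))]
loss = pair_loss(same_person_pairs, different_person_pairs, T=10.0)
print(loss)
```

With these toy values, L1 is the mean of 1 and 4, L2 is 9, and T is 10, so the loss falls as same-person pairs move closer together or different-person pairs move further apart.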

The device may determine the training effect based on the similarity distances of the plurality of first data pairs and the similarity distances of the plurality of second data pairs. If the expected training effect is not achieved, the model training continues; if the expected training effect is achieved, the model training ends. The expected training effect means that the first data pairs and the second data pairs can be distinguished through the trained first face recognition model. For the device to distinguish the first data pairs from the second data pairs by their similarity distances, the distribution of those similarity distances must satisfy a certain condition; that is, the loss corresponding to the first face recognition model, which is determined from those similarity distances, must satisfy a certain condition. Therefore, the device can determine whether the expected training effect is achieved, and hence whether to end the training process of the first face recognition model, according to whether the loss corresponding to the first face recognition model meets the first preset condition. The similarity distances of the plurality of first data pairs and of the plurality of second data pairs may be calculated by an external apparatus, which then sends the calculation result to the device; they may also be calculated by the device itself.

In the embodiment of the application, the device determines whether the expected training effect is achieved, and hence whether to end the training process of the first face recognition model, according to whether the loss corresponding to the first face recognition model meets a first preset condition. The first preset condition may be that a target proportion is greater than a first preset proportion, where the target proportion is the ratio of a target number to the total number of the plurality of first data pairs and the plurality of second data pairs, and the target number is the sum of the number of first data pairs whose similarity distance is smaller than a similarity distance threshold and the number of second data pairs whose similarity distance is greater than the similarity distance threshold. The similarity distance threshold can be determined by cross-validation according to the similarity distances of the plurality of first data pairs and the similarity distances of the plurality of second data pairs. Alternatively, the first preset condition may be that the loss corresponding to the first face recognition model is smaller than a first preset loss in the case where the target proportion obtained in each of a preset number of consecutive training iterations is smaller than the first preset proportion but greater than a second preset proportion. The first preset condition may also be that the number of training iterations of the first face recognition model is greater than a preset number of times, or that the loss corresponding to the first face recognition model is smaller than the first preset loss. Optionally, the loss L corresponding to the first face recognition model is shown in formula 1-1.

In formula 1-1, m is the total number of the plurality of first data pairs and the plurality of second data pairs, and l₂ⁱ is the Euclidean distance of the i-th data pair; when the i-th data pair is a first data pair,

when the ith data pair is the second data pair,

a is a first preset value, and b is a second preset value. If the loss corresponding to the first face recognition model is smaller than the first preset loss, model training ends. When any two images are subsequently processed by the first face recognition model, it can then be determined whether their features match the characteristics of a first data pair or of a second data pair: if they match those of a first data pair, the persons in the two images are the same person; if they match those of a second data pair, the persons in the two images are different persons. In addition, setting the second preset value to a large value quantitatively controls the similarity distance of the second data pairs so that it is clearly separated from the similarity distance of the first data pairs, which makes the first data pairs easy to distinguish from the second data pairs.

While iteratively executing the training process of the first face recognition model, the device may perform steps 2011-2015 in a loop. In each cycle, the device obtains a plurality of first data pairs and a plurality of second data pairs through the adjusted first face recognition model, and then determines whether to continue model training according to the similarity distances of the newly obtained first data pairs and second data pairs. Once the loss corresponding to the first face recognition model meets the first preset condition, the device ends model training and obtains the first face recognition model used for face recognition.
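The end-of-training check applied in each cycle can be sketched as follows. The sketch computes the target proportion (target ratio) from the two lists of similarity distances and then tests three of the alternative forms of the first preset condition. Combining them with a logical OR is an illustrative choice made here, since the application presents them as alternatives; the names `target_ratio` and `should_stop` and all keyword parameters are hypothetical.

```python
def target_ratio(first_distances, second_distances, threshold):
    """Share of data pairs that fall on the expected side of the threshold.

    First (same-person) pairs count when their distance is below the
    threshold; second (different-person) pairs count when it is above.
    """
    hits = sum(d < threshold for d in first_distances)
    hits += sum(d > threshold for d in second_distances)
    return hits / (len(first_distances) + len(second_distances))


def should_stop(loss, iteration, ratio, *, preset_loss, preset_ratio,
                max_iterations):
    """Return True when any of the sketched stopping criteria holds.

    Assumption: the criteria are OR-ed together for illustration only.
    """
    if ratio > preset_ratio:        # enough pairs are correctly separated
        return True
    if loss < preset_loss:          # loss is below the first preset loss
        return True
    if iteration > max_iterations:  # iteration budget is exhausted
        return True
    return False
```

A training loop would call `should_stop` once per cycle, after the new data pairs and their similarity distances have been obtained.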

The trained first face recognition model is intended to improve the accuracy of recognizing non-standard images, and non-standard images come in many types. For the trained model to improve recognition accuracy on more types of non-standard images, the sample image set must meet certain requirements on both the number of sample images and the number of types of non-standard images it contains. The device usually performs model training on a public sample image set. Although a public sample image set can meet the requirement on the number of sample images, it cannot be guaranteed that its types of non-standard images are complete enough, and preparing a sample image set of that size whose types of non-standard images are sufficiently complete is relatively difficult. Therefore, in the embodiment of the present application, the plurality of first sample images are selected from a first sample image set in which the number of types of non-standard images is greater than a preset number, and before the device iteratively executes the training process of the first face recognition model, the device iteratively executes the training process of a second face recognition model based on a second sample image set until the loss corresponding to the second face recognition model meets a second preset condition, so as to obtain the first face recognition model.

Specifically, the number of second sample images in the second sample image set is greater than a preset number, where the preset number is set in advance according to the number of sample images that model training requires. The second sample image set may be a public sample image set, for example, a public large-scale face data set such as VGG2 or MS1M. The device performs model training on the second face recognition model based on the second sample image set to obtain the first face recognition model. The device may train the second face recognition model in the same way as it trains the first face recognition model. That is, the training process of the second face recognition model is as follows: the device adjusts the second face recognition model according to the loss corresponding to the second face recognition model; the device extracts, through the adjusted second face recognition model, feature data corresponding to each of a plurality of pre-selected second sample images to obtain feature data of the plurality of second sample images, where the plurality of second sample images include standard images and non-standard images, the plurality of second sample images are selected from the second sample image set, and the number of second sample images in the second sample image set is greater than the preset number; the device combines fourth feature data with fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, where the fourth feature data is feature data corresponding to a second target standard image in the plurality of second sample images, the fifth feature data is feature data corresponding to a second target non-standard image in the plurality of second sample images, the second target standard image and the second target non-standard image are sample images of the same person, and the second target standard image is any standard image in the plurality of second sample images; the device combines the fourth feature data with sixth feature data to obtain a plurality of fourth data pairs corresponding to the fourth feature data, where the sixth feature data is feature data corresponding to a second target sample image in the plurality of second sample images, and the second target sample image and the second target standard image are sample images of different persons; and the device determines the loss corresponding to the second face recognition model according to the similarity distances of the plurality of third data pairs and the similarity distances of the plurality of fourth data pairs corresponding to each fourth feature data in the feature data of the plurality of second sample images.
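The pairing rule described above can be sketched as follows: each standard image acts as an anchor, is paired with every non-standard image of the same person (third data pairs), and is paired with every sample image of a different person (fourth data pairs). The record layout with the keys `person`, `is_standard` and `feature`, and the function name `build_pairs`, are assumptions made for illustration.

```python
def build_pairs(samples):
    """Build same-person and different-person feature pairs.

    samples: list of dicts with hypothetical keys
        'person'      - identifier of the person shown
        'is_standard' - True for a standard (e.g. front face) image
        'feature'     - feature data extracted by the model
    Returns (same_person_pairs, different_person_pairs).
    """
    same_person_pairs, different_person_pairs = [], []
    for anchor in samples:
        if not anchor["is_standard"]:
            continue  # only standard images act as anchors
        for other in samples:
            if other is anchor:
                continue
            pair = (anchor["feature"], other["feature"])
            if other["person"] == anchor["person"]:
                # same person: pair the standard image with non-standard ones
                if not other["is_standard"]:
                    same_person_pairs.append(pair)
            else:
                # different person: any sample image qualifies
                different_person_pairs.append(pair)
    return same_person_pairs, different_person_pairs
```

The same routine would apply to the first and second data pairs of the first face recognition model, since the application states that both models may be trained in the same way.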

Optionally, in this embodiment of the application, when the standard image is a front face image, before the device combines the fourth feature data with the fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, the device needs to determine the front face image from the sample image set, and the device may determine the front face image from the sample image set by the following steps.

First, the device calculates the face angle of each second sample image corresponding to the target person.

Then, from the second sample images corresponding to the target person, the device selects the one with the smallest face angle as the front face image corresponding to the target person.

Specifically, the target person is any one of all persons represented by the plurality of second sample images. Optionally, the device may use the dlib face recognition algorithm to detect face key points in the second sample image and then calculate the face angle with a face pose estimation model (many mature face pose estimation algorithms exist that calculate the three-dimensional face angle from the key points). The device applies this method to the sample images of each person in the second sample image set, takes the sample image with the smallest face angle for each person as that person's front face image, and records the name and location of that sample image. In this way, the front face image of each person can easily be found in the second sample image set by calculating face angles.
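The selection step can be sketched as follows, assuming the three-dimensional face angles (yaw, pitch, roll, in degrees) have already been estimated for every sample image, for example from detected key points. Reducing the three angles to a single face angle by taking the largest absolute value is an assumption made for this sketch, as the application does not specify the reduction; the function names are hypothetical.

```python
def face_angle(yaw, pitch, roll):
    """Reduce three pose angles to one scalar (assumed: largest deviation)."""
    return max(abs(yaw), abs(pitch), abs(roll))


def select_front_faces(records):
    """Pick, for each person, the image with the smallest face angle.

    records: iterable of (person, image_name, (yaw, pitch, roll)).
    Returns a dict mapping person -> name of the selected front face image.
    """
    best = {}
    for person, image_name, angles in records:
        angle = face_angle(*angles)
        if person not in best or angle < best[person][1]:
            best[person] = (image_name, angle)
    return {person: name for person, (name, _) in best.items()}
```

Ties keep the first image encountered; a different tie-break would serve equally well for this purpose.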

When the loss corresponding to the second face recognition model meets a second preset condition, the device ends the model training of the second face recognition model to obtain the first face recognition model. The second preset condition may be set in the same manner as the first preset condition described above. For example, the second preset condition may be that the target proportion is greater than the first preset proportion, where the target proportion is the ratio of the target number to the total number of the plurality of third data pairs and the plurality of fourth data pairs, and the target number is the sum of the number of third data pairs whose similarity distance is smaller than the similarity distance threshold and the number of fourth data pairs whose similarity distance is greater than the similarity distance threshold. For another example, the second preset condition may be that the loss corresponding to the second face recognition model is smaller than the first preset loss in the case where the target proportion obtained in each of a preset number of consecutive training iterations is smaller than the first preset proportion but greater than the second preset proportion. The second preset condition may also be that the number of training iterations of the second face recognition model is greater than the preset number of times, or that the loss corresponding to the second face recognition model is smaller than the first preset loss.

After obtaining the first face recognition model, the device trains the first face recognition model based on the first sample image set. The number of types of non-standard images in the first sample image set is greater than the preset number, while the number of sample images in the first sample image set is relatively small, so the first sample image set is easy to collect. For example, a plurality of first sample images are prepared in advance, where the standard image is a front face image, that is, a face image with no occlusion, eyes open, mouth closed, and no glasses. Each person has one front face image and one image of each non-standard type (at least 20 non-standard images), and sample images of 9200 persons are needed to obtain the first sample image set. The non-standard images may include images of the face turned 15, 30 and 45 degrees to the left, images of the face turned 15, 30 and 45 degrees to the right, images of the head raised by 15 and 30 degrees, images of the head lowered by 15 and 30 degrees, a mouth-open image, 5 partial occlusion images (the occluded part does not exceed 20 percent of the face), an eyes-closed image, a frowning image, a glasses-wearing image, and a wig-wearing image (wigs of different shapes may be used).

Optionally, in this embodiment of the application, after the device iteratively performs the training process of the first face recognition model, the device may further verify the effect of model training by the following steps.

First, the device extracts, through the first face recognition model, feature data for verification corresponding to each of a plurality of pre-selected verification images, so as to obtain the feature data of the plurality of verification images.

Then, the device combines the seventh feature data with the eighth feature data to obtain a plurality of fifth data pairs corresponding to the seventh feature data, and combines the seventh feature data with the ninth feature data to obtain a plurality of sixth data pairs corresponding to the seventh feature data.

Finally, the device determines a similarity distance threshold by cross-validation according to the similarity distances of the plurality of fifth data pairs and the similarity distances of the plurality of sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images.

Specifically, the plurality of verification images include standard images and non-standard images, and the plurality of verification images are selected by the device from a verification image set. The verification image set may be collected in the same manner as the first sample image set. For example, sample images of 10000 persons are collected in the sample image collection manner described above, of which the sample images of 9200 persons are used as the first sample image set and the sample images of 800 persons are used as the verification image set. The seventh feature data is feature data corresponding to a third target standard image in the plurality of verification images, the eighth feature data is feature data corresponding to a third target non-standard image in the plurality of verification images, the third target standard image and the third target non-standard image are sample images of the same person, and the third target standard image is any standard image in the plurality of verification images. The ninth feature data is feature data corresponding to a third target sample image in the plurality of verification images, and the third target sample image and the third target standard image are sample images of different persons.

The similarity distance threshold is used to calculate a target proportion. The target proportion is the ratio of a target number to the total number of the fifth data pairs and sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images. The target number is the sum of the number of those fifth data pairs whose similarity distance is smaller than the similarity distance threshold and the number of those sixth data pairs whose similarity distance is greater than the similarity distance threshold. The device can calculate the target proportion according to the similarity distance threshold. If the target proportion is greater than the first preset proportion, this indicates that the first face recognition model brings the standard image and the non-standard images of the same person close together while still distinguishing different persons, and the device determines that the first face recognition model has been trained well. In this way, an optimal critical value can be found by cross-validation and used as the similarity distance threshold, so that the fifth data pairs fall below the threshold and the sixth data pairs fall above it with the highest accuracy, and the model training effect can then be verified according to that accuracy.
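One possible way to find such a critical value is sketched below. It assumes a k-fold scheme in which the best threshold from a list of candidates is chosen on the training folds and scored on the held-out fold, and the per-fold thresholds are then averaged. The application only states that cross-validation is used, so the fold layout, the candidate list, the averaging and all function names are assumptions.

```python
def accuracy(same_d, diff_d, threshold):
    """Target proportion: share of pairs on the expected side of the threshold."""
    hits = sum(d < threshold for d in same_d)
    hits += sum(d > threshold for d in diff_d)
    return hits / (len(same_d) + len(diff_d))


def best_threshold(same_d, diff_d, candidates):
    """Candidate with the highest accuracy (first one wins on ties)."""
    return max(candidates, key=lambda t: accuracy(same_d, diff_d, t))


def split(values, fold, folds):
    """Interleaved split into (training part, held-out part)."""
    train = [v for i, v in enumerate(values) if i % folds != fold]
    held_out = [v for i, v in enumerate(values) if i % folds == fold]
    return train, held_out


def cross_validated_threshold(same_d, diff_d, candidates, folds=5):
    """Return (mean threshold, mean held-out accuracy) over the folds."""
    if folds < 2 or min(len(same_d), len(diff_d)) < folds:
        raise ValueError("need folds >= 2 and at least `folds` distances per list")
    thresholds, scores = [], []
    for fold in range(folds):
        train_same, test_same = split(same_d, fold, folds)
        train_diff, test_diff = split(diff_d, fold, folds)
        t = best_threshold(train_same, train_diff, candidates)
        thresholds.append(t)
        scores.append(accuracy(test_same, test_diff, t))
    return sum(thresholds) / folds, sum(scores) / folds
```

Here `same_d` would hold the similarity distances of the fifth data pairs and `diff_d` those of the sixth data pairs; the returned accuracy plays the role of the target proportion that is compared with the first preset proportion.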

Step S202: the equipment extracts feature data from the image to be recognized through the first face recognition model.

Step S203: if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is smaller than a similarity distance threshold value, the device marks the image to be recognized as the image of the person represented by the preset standard image.

Specifically, the image to be recognized may be acquired by an external device and then transmitted to the device, or it may be acquired by the device itself. The preset standard image may be stored in the device in advance, in which case the device extracts feature data from the preset standard image when it uses that image; alternatively, the device may extract and store the feature data when it acquires the preset standard image.

If the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is smaller than the similarity distance threshold, the device marks the image to be recognized as the image of the person represented by the preset standard image, namely the image to be recognized and the preset standard image represent the same person. If the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is greater than the similarity distance threshold, the image to be recognized and the preset standard image represent different people, and the device can mark the image to be recognized as an image of a person not represented by the preset standard image; the device may also not perform any action; if there are a plurality of preset standard images, the device may further perform a step of calculating a similarity distance between the feature data of the image to be recognized and the feature data of the next preset standard image. If the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is equal to the similarity distance threshold, the recognition result of the equipment can be set according to actual requirements, and in one case, the equipment determines that the image to be recognized and the preset standard image represent the same person when the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is equal to the similarity distance threshold; another case may be that when the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is equal to the similarity distance threshold, the apparatus determines that the image to be recognized and the preset standard image represent different persons.
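The decision logic of step S203 can be sketched as follows, assuming the Euclidean distance as the similarity distance and a dictionary of pre-extracted features for the preset standard images. The flag `equal_is_match` models the configurable behaviour when the distance equals the threshold; the function names, the flag and the data layout are hypothetical.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recognize(query_feature, registered, threshold, equal_is_match=False):
    """Return the first registered person matching the query, or None.

    registered: dict mapping person -> feature data of that person's
        preset standard image.
    equal_is_match: whether a distance exactly equal to the threshold
        counts as the same person (left to actual requirements).
    """
    for person, feature in registered.items():
        distance = euclidean(query_feature, feature)
        if distance < threshold or (equal_is_match and distance == threshold):
            return person
        # otherwise move on to the next preset standard image
    return None
```

Returning `None` corresponds to the case in which the image to be recognized matches none of the preset standard images.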

Optionally, step S202 and step S203 may be performed by another device. For example, the device is a server and the entity that actually performs face recognition is a recognition terminal with a face recognition function; after the first face recognition model is trained, the server sends the trained first face recognition model to the recognition terminal, and the recognition terminal performs face recognition with it.

The device can be used in access authorization scenarios, video surveillance scenarios, face-brushing payment scenarios, and the like. For example, referring to fig. 3, when the device is used in a door access authorization scenario of a building, the device may include a gate 301, a camera 302 and a console 303, and the console 303 is electrically connected to the gate 301 and the camera 302. The gate 301 is arranged at an entrance passage of the building, and when the gate 301 is opened, a person to be identified can enter the building; when the gate 301 is closed, the person to be identified cannot enter the building. The camera 302 is also arranged at an entrance passage of the building, and the camera 302 is used for capturing a face image of the person to be recognized and sending the face image of the person to be recognized to the console 303 so as to generate the image to be recognized. The control console 303 is arranged in a building or is integrally arranged with the gate 301, the control console 303 is provided with a first face recognition model, and feature data in a face image can be extracted through the first face recognition model. The console 303 has stored therein in advance a frontal image of a person authorized to enter the building through the gate 301 as a preset standard image. The gate 301 is closed in a normal state, and when a person to be recognized wants to enter the building through the gate 301, the camera 302 can capture a face image of the person to be recognized at any angle and send the face image of the person to be recognized to the console 303 to generate an image to be recognized. 
The control console 303 extracts feature data from the image to be recognized through the first face recognition model, calculates a similarity distance between the feature data of the image to be recognized and feature data of a preset standard image, and if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is smaller than a similarity distance threshold value, the control console 303 determines that the person to be recognized is a person authorized to enter the building, and then the control console 303 controls the gate 301 to be opened; if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is larger than the similarity distance threshold value, the control console 303 determines that the person to be recognized is not a person authorized to enter the building, and then the control console 303 does not control the gate 301 to be opened.

In the method described in fig. 2A and 2B, model training decreases the similarity distance between the standard image and the non-standard image of the same person and increases the similarity distance between the standard image of that person and the images of other persons, so that the same person and different persons can be distinguished by similarity distance. Because the model training of the application brings one standard image and one non-standard image of the same person close together, the common features between two images of the same person can be found effectively, the inter-class distance is effectively increased, and the intra-class distance is reduced. It is therefore easier to decide whether two images show the same person, which further improves the accuracy of recognizing non-standard images. This also avoids the problems that arise when all images of the same person share few common features, namely that the similarity distance between images of the same person is large while the similarity distance between images of different persons is small.

Referring to fig. 4, fig. 4 shows an image processing apparatus 40 according to an embodiment of the present disclosure, where the image processing apparatus 40 may be the above-mentioned device or a part of the above-mentioned device. The image processing apparatus 40 includes a first training module 401, and the first training module 401 includes a first adjusting unit 4011, a first extracting unit 4012, a first combining unit 4013, a second combining unit 4014, and a first determining unit 4015. The modules and units of the image processing apparatus 40 are described in detail as follows.

A first training module 401, configured to iteratively execute a training process of a first face recognition model until a loss corresponding to the first face recognition model meets a first preset condition, where the first face recognition model when the loss meets the first preset condition is used to perform face recognition;

wherein the first training module 401 comprises:

a first adjusting unit 4011, configured to adjust the first face recognition model according to a loss corresponding to the first face recognition model;

the first extraction unit 4012 is configured to extract, through the adjusted first face recognition model, feature data corresponding to each of a plurality of first sample images from the plurality of first sample images selected in advance to obtain feature data of the plurality of first sample images, where the plurality of first sample images include a preset standard image and a preset non-standard image;

a first combining unit 4013, configured to combine first feature data and second feature data to obtain a plurality of first data pairs corresponding to the first feature data, where the first feature data is feature data corresponding to a first target standard image in the plurality of first sample images, the second feature data is feature data corresponding to a first target non-standard image in the plurality of first sample images, the first target standard image and the first target non-standard image are sample images of the same person, and the first target standard image is any standard image in the plurality of first sample images;

a second combining unit 4014, configured to combine the first feature data with third feature data to obtain a plurality of second data pairs corresponding to the first feature data, where the third feature data is feature data corresponding to a first target sample image in the plurality of first sample images, and the first target sample image and the first target standard image are sample images of different people;

a first determining unit 4015, configured to determine a loss corresponding to the first face recognition model according to similarity distances of a plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images and similarity distances of a plurality of second data pairs.

In an alternative, the first preset condition is that the loss corresponding to the first face recognition model is less than a preset loss.

In an alternative, the loss L corresponding to the first face recognition model is: L = L1 + (T - L2), where L1 is an average of square losses of a plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, L2 is an average of square losses of a plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, and T is a preset distance value.

In an optional scheme, the apparatus further includes a second training module, where the plurality of first sample images are selected from a first sample image set, and the number of types of non-standard images in the first sample image set is greater than a preset number; the second training module is configured to iteratively execute a training process of a second face recognition model before the training process of the first face recognition model is iteratively executed, until a loss corresponding to the second face recognition model meets a second preset condition, so as to obtain the first face recognition model;

wherein the second training module comprises:

the second adjusting unit is used for adjusting the second face recognition model according to the loss corresponding to the second face recognition model;

a second extracting unit, configured to extract, through the adjusted second face recognition model, feature data corresponding to each of a plurality of second sample images from a plurality of second sample images obtained by pre-selection, so as to obtain feature data of the plurality of second sample images, where the plurality of second sample images include the standard image and the non-standard image, the plurality of second sample images are selected from a second sample image set, and the number of second sample images in the second sample image set is greater than a preset number;

a third combining unit, configured to combine fourth feature data and fifth feature data to obtain multiple third data pairs corresponding to the fourth feature data, where the fourth feature data is feature data corresponding to a second target standard image in the multiple second sample images, the fifth feature data is feature data corresponding to a second target non-standard image in the multiple second sample images, the second target standard image and the second target non-standard image are sample images of the same person, and the second target standard image is any standard image in the multiple second sample images;

a fourth combining unit, configured to combine the fourth feature data with sixth feature data to obtain a plurality of fourth data pairs corresponding to the fourth feature data, where the sixth feature data is feature data corresponding to a second target sample image in the plurality of second sample images, and the second target sample image and the second target standard image are sample images of different persons;

and a second determining unit, configured to determine a loss corresponding to the second face recognition model according to similarity distances of a plurality of third data pairs corresponding to each fourth feature data in the feature data of the plurality of second sample images and similarity distances of the plurality of fourth data pairs.

In an optional scheme, the second training module further includes a selecting unit, where the selecting unit is configured to: before the fourth feature data and the fifth feature data are combined to obtain a plurality of third data pairs corresponding to the fourth feature data, calculate the face angle of each second sample image corresponding to a target person, where the target person is any one of all persons represented by the plurality of second sample images; and select, from the second sample images corresponding to the target person, the one with the smallest face angle as the front face image corresponding to the target person.

In an optional scheme, the apparatus further includes a verification module, where the verification module is configured to, after the iterative execution of the training process of the first face recognition model, extract, through the first face recognition model, feature data for verification corresponding to each of a plurality of verification images obtained in advance, so as to obtain feature data of the plurality of verification images, where the plurality of verification images include the standard image and the non-standard image; combine seventh feature data and eighth feature data to obtain a plurality of fifth data pairs corresponding to the seventh feature data, where the seventh feature data is feature data corresponding to a third target standard image in the plurality of verification images, the eighth feature data is feature data corresponding to a third target non-standard image in the plurality of verification images, the third target standard image and the third target non-standard image are sample images of the same person, and the third target standard image is any standard image in the plurality of verification images; combine the seventh feature data with ninth feature data to obtain a plurality of sixth data pairs corresponding to the seventh feature data, where the ninth feature data is feature data corresponding to a third target sample image in the plurality of verification images, and the third target sample image and the third target standard image are sample images of different persons; and determine, in a cross-validation mode, a similarity distance threshold according to the similarity distances of the plurality of fifth data pairs and the similarity distances of the plurality of sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, where the similarity distance threshold is used for calculating a target proportion, the target proportion is the ratio of a target number to the total number of the fifth data pairs and the sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, and the target number is the sum of the number of fifth data pairs whose similarity distance is smaller than the similarity distance threshold and the number of sixth data pairs whose similarity distance is larger than the similarity distance threshold.
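The target proportion is effectively a verification accuracy at a given threshold. The following sketch computes it and picks the best of several candidate thresholds. It is a simplification under stated assumptions: the fold splitting of cross-validation is omitted, the candidate list and the distance values are made up for illustration, and the names `target_proportion` and `pick_threshold` are hypothetical.

```python
def target_proportion(same_dists, diff_dists, threshold):
    """Share of pairs on the correct side of the threshold.

    same_dists: similarity distances of same-person pairs (fifth data pairs)
    diff_dists: similarity distances of different-person pairs (sixth data pairs)
    """
    target_number = (sum(d < threshold for d in same_dists)
                     + sum(d > threshold for d in diff_dists))
    return target_number / (len(same_dists) + len(diff_dists))


def pick_threshold(same_dists, diff_dists, candidates):
    """Return the candidate with the highest target proportion (first one on ties)."""
    return max(candidates, key=lambda t: target_proportion(same_dists, diff_dists, t))


fifth_pair_dists = [0.2, 0.3, 0.5, 0.9]   # illustrative values only
sixth_pair_dists = [0.8, 1.1, 1.3, 1.6]   # illustrative values only
candidates = [0.4, 0.6, 0.85, 1.0, 1.2]

best = pick_threshold(fifth_pair_dists, sixth_pair_dists, candidates)
print(best, target_proportion(fifth_pair_dists, sixth_pair_dists, best))
```

In a full cross-validation setup, the same scan would be run on each training fold and the chosen threshold evaluated on the held-out fold.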

In an optional scheme, the standard image is a front face image, and the non-standard image includes at least one of a left side face image, a right side face image, a head-up image, a head-down image, a mouth-opening image, a partial occlusion image, an eye-closing image, a frown image, a glasses-wearing image, and a wig-wearing image.

In an optional scheme, the loss corresponding to the first face recognition model is the Euclidean distance or the absolute distance.
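For reference, the two distance measures can be written as below. The sketch assumes that "absolute distance" means the sum of absolute coordinate differences (the L1 distance), which the text does not state explicitly. The function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def absolute_distance(a, b):
    """Sum of absolute coordinate differences (assumed meaning of 'absolute distance')."""
    return sum(abs(x - y) for x, y in zip(a, b))


f1 = [0.0, 3.0]
f2 = [4.0, 0.0]
print(euclidean_distance(f1, f2), absolute_distance(f1, f2))
```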

In an optional scheme, the apparatus further includes a recognition module, where the recognition module is configured to, after the iterative execution of the training process of the first face recognition model, extract feature data from an image to be recognized through the first face recognition model; and, if the similarity distance between the feature data of the image to be recognized and the feature data of a preset standard image is smaller than the similarity distance threshold, mark the image to be recognized as an image of the person represented by the preset standard image.
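The recognition decision reduces to a single comparison. A minimal sketch, assuming Euclidean distance as the similarity distance and using made-up three-dimensional feature vectors and a made-up threshold; `is_same_person` is a hypothetical name, and feature extraction by the model is not shown.

```python
import math


def is_same_person(query_feat, registered_feat, threshold):
    """True if the query feature is closer to the registered feature than the threshold."""
    # math.dist (Python 3.8+) returns the Euclidean distance between two points
    return math.dist(query_feat, registered_feat) < threshold


registered = [0.1, 0.9, 0.3]          # feature of the preset standard image (illustrative)
query_side_face = [0.2, 0.8, 0.35]    # feature of a non-standard image of the same person
query_stranger = [0.9, 0.1, 0.7]      # feature of an image of another person
threshold = 0.6                       # illustrative similarity distance threshold

match_side = is_same_person(query_side_face, registered, threshold)
match_stranger = is_same_person(query_stranger, registered, threshold)
print(match_side, match_stranger)
```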

The specific implementation and beneficial effects of each module and unit in the image processing apparatus shown in fig. 4 may also correspond to the corresponding descriptions of the method embodiments shown in fig. 2A and fig. 2B, and are not described again here.

Referring to fig. 5, fig. 5 shows an image processing apparatus 50 according to an embodiment of the present application, where the image processing apparatus 50 may be the above-mentioned device or a part of the above-mentioned device. The image processing apparatus 50 comprises a processor 501 and a memory 502, said processor 501 and memory 502 being interconnected by a bus 503.

The memory 502 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 502 is used to store related computer programs and data.

The processor 501 may be one or more Central Processing Units (CPUs), and in the case that the processor 501 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.

The processor 501 in the image processing apparatus 50 is configured to read the computer program code stored in the memory 502, and perform the following operations:

the training process of the first face recognition model specifically comprises the following steps:

In a possible embodiment, the first preset condition is that the loss corresponding to the first face recognition model is less than a preset loss.

In a possible implementation, the loss L corresponding to the first face recognition model is: L = L1 + (T - L2), where L1 is an average of square losses of a plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, L2 is an average of square losses of a plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, and T is a preset distance value.
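A worked example of this loss is sketched below. It assumes that the square loss of a data pair is the squared Euclidean distance between its two feature vectors, which the text does not spell out, and it follows the formula literally without clamping (T - L2) at zero. The function names and all numeric values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed 'square loss')."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_square_loss(pairs):
    """Average square loss over a list of (feature, feature) pairs."""
    return sum(squared_distance(a, b) for a, b in pairs) / len(pairs)


def model_loss(first_pairs, second_pairs, T):
    """L = L1 + (T - L2), with L1 over same-person pairs and L2 over different-person pairs."""
    L1 = mean_square_loss(first_pairs)
    L2 = mean_square_loss(second_pairs)
    return L1 + (T - L2)


# First data pairs: standard and non-standard image of the same person
first_pairs = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 0.0], [0.0, 1.0])]
# Second data pairs: standard image and an image of a different person
second_pairs = [([0.0, 0.0], [2.0, 0.0]), ([0.0, 0.0], [0.0, 2.0])]
T = 5.0  # preset distance value (illustrative)

loss = model_loss(first_pairs, second_pairs, T)
print(loss)
```

Here L1 is 1.0 and L2 is 4.0, so the loss is 1.0 + (5.0 - 4.0) = 2.0. Lowering the loss pulls same-person pairs together and pushes different-person pairs apart.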

In a possible implementation manner, the plurality of first sample images are selected from a first sample image set, and the number of the types of the non-standard images in the first sample image set is greater than a preset number; before the iterative execution of the training process of the first face recognition model, the method further includes: iteratively executing a training process of a second face recognition model until the loss corresponding to the second face recognition model meets a second preset condition to obtain the first face recognition model;

the training process of the second face recognition model specifically comprises the following steps: adjusting the second face recognition model according to the loss corresponding to the second face recognition model; extracting feature data corresponding to a plurality of second sample images from a plurality of second sample images obtained in advance through the adjusted second face recognition model to obtain feature data of the plurality of second sample images, wherein the plurality of second sample images comprise the standard images and the non-standard images, the plurality of second sample images are selected from a second sample image set, and the number of the second sample images in the second sample image set is greater than a preset number; combining fourth feature data and fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, where the fourth feature data is feature data corresponding to a second target standard image in the plurality of second sample images, the fifth feature data is feature data corresponding to a second target non-standard image in the plurality of second sample images, the second target standard image and the second target non-standard image are sample images of the same person, and the second target standard image is any standard image in the plurality of second sample images; combining the fourth feature data with sixth feature data to obtain a plurality of fourth data pairs corresponding to the fourth feature data, where the sixth feature data is feature data corresponding to a second target sample image in the plurality of second sample images, and the second target sample image and the second target standard image are sample images of different persons; and determining the loss corresponding to the second face recognition model according to the similarity distance of a plurality of third data pairs corresponding to each fourth feature data in the feature data of the plurality of second sample images and the similarity distance of a plurality of fourth data pairs.
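The pairing rule in this training process can be sketched as follows. The sketch assumes each sample is a tuple of person id, a flag marking the standard image, and a feature vector; this layout and the name `build_pairs` are hypothetical. Each standard image is paired with every non-standard image of the same person (third data pairs) and with every image of every other person (fourth data pairs).

```python
def build_pairs(samples):
    """Return (same_person_pairs, different_person_pairs) anchored on standard images."""
    same_pairs, diff_pairs = [], []
    for person, is_standard, anchor_feat in samples:
        if not is_standard:
            continue  # only standard images act as anchors
        for other_person, other_is_standard, other_feat in samples:
            if other_person == person and not other_is_standard:
                same_pairs.append((anchor_feat, other_feat))
            elif other_person != person:
                diff_pairs.append((anchor_feat, other_feat))
    return same_pairs, diff_pairs


samples = [
    ("A", True,  [0.1, 0.9]),   # standard image of person A
    ("A", False, [0.2, 0.8]),   # non-standard images of person A
    ("A", False, [0.0, 1.0]),
    ("B", True,  [0.9, 0.1]),   # standard image of person B
    ("B", False, [0.8, 0.3]),   # non-standard image of person B
]
third_pairs, fourth_pairs = build_pairs(samples)
print(len(third_pairs), len(fourth_pairs))
```

With these five samples, the standard image of A yields 2 same-person and 2 different-person pairs, and the standard image of B yields 1 and 3, for 3 third data pairs and 5 fourth data pairs in total.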

In one possible embodiment, the standard image is a front face image; before the combining the fourth feature data and the fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, further performing: calculating the face angle of each second sample image corresponding to a target person, wherein the target person is any one of all persons represented by the plurality of second sample images; and selecting, from the second sample images corresponding to the target person, the image with the smallest face angle as the front face image corresponding to the target person.

In a possible implementation manner, after the iteratively executing the training procedure of the first face recognition model, further executing: extracting, through the first face recognition model, feature data for verification corresponding to each of a plurality of verification images obtained in advance, to obtain feature data of the plurality of verification images, wherein the plurality of verification images comprise the standard images and the non-standard images; combining seventh feature data and eighth feature data to obtain a plurality of fifth data pairs corresponding to the seventh feature data, where the seventh feature data is feature data corresponding to a third target standard image in the plurality of verification images, the eighth feature data is feature data corresponding to a third target non-standard image in the plurality of verification images, the third target standard image and the third target non-standard image are sample images of the same person, and the third target standard image is any standard image in the plurality of verification images; combining the seventh feature data with ninth feature data to obtain a plurality of sixth data pairs corresponding to the seventh feature data, where the ninth feature data is feature data corresponding to a third target sample image in the plurality of verification images, and the third target sample image and the third target standard image are sample images of different persons; and determining, in a cross-validation mode, a similarity distance threshold according to the similarity distances of the plurality of fifth data pairs and the similarity distances of the plurality of sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, where the similarity distance threshold is used for calculating a target proportion, the target proportion is the ratio of a target number to the total number of the fifth data pairs and the sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, and the target number is the sum of the number of fifth data pairs whose similarity distance is smaller than the similarity distance threshold and the number of sixth data pairs whose similarity distance is larger than the similarity distance threshold.

In one possible embodiment, the standard image is a front face image, and the non-standard image includes at least one of a left side face image, a right side face image, a head-up image, a head-down image, a mouth-opening image, a partial occlusion image, an eye-closing image, a frown image, a glasses-wearing image, and a wig-wearing image.

In a possible implementation manner, after the iteratively executing the training procedure of the first face recognition model, further executing: extracting feature data from an image to be recognized through the first face recognition model; if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is smaller than the similarity distance threshold, marking the image to be recognized as the image of the person represented by the preset standard image.

In a possible embodiment, the loss corresponding to the first face recognition model is the Euclidean distance or the absolute distance.

The specific implementation and beneficial effects of each module in the image processing apparatus shown in fig. 5 may also correspond to the corresponding descriptions of the method embodiments shown in fig. 2A and fig. 2B, and are not described again here.

An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on an image processing apparatus, the method shown in fig. 2A and 2B is implemented.

In summary, through model training, the similarity distance between the standard image and the non-standard image of the same person is reduced, and the similarity distance between the standard image of the person and the images of other persons is increased, so that the same person and different persons can be distinguished through the similarity distance. Because the model training of the application brings one standard image and one non-standard image of the same person close in similarity distance, the common features between two images of the same person can be effectively found, the inter-class distance is effectively increased, and the intra-class distance is reduced. It is therefore easier to distinguish whether images show the same person, which further improves the accuracy of recognizing non-standard images. This also avoids the problems that arise when few features are common to all images of the same person, namely that the similarity distance between images of the same person is large and the similarity distance between images of different persons is small.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

Claims

1. An image processing method, comprising:

2. The method according to claim 1, wherein the first preset condition is that the loss corresponding to the first face recognition model is less than a preset loss.

3. The method according to claim 2, wherein the first face recognition model corresponds to a loss L of:

L = L1 + (T - L2)

wherein L1 is an average of square losses of a plurality of first data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, L2 is an average of square losses of a plurality of second data pairs corresponding to each first feature data in the feature data of the plurality of first sample images, and T is a preset distance value.

4. The method according to claim 1, wherein the plurality of first sample images are selected from a first sample image set, and the number of types of non-standard images in the first sample image set is greater than a preset number; before the iterative execution of the training process of the first face recognition model, the method further includes:

iteratively executing a training process of a second face recognition model until the loss corresponding to the second face recognition model meets a second preset condition to obtain the first face recognition model;

wherein the training process of the second face recognition model comprises:

adjusting the second face recognition model according to the loss corresponding to the second face recognition model;

extracting feature data corresponding to a plurality of second sample images from a plurality of second sample images obtained in advance through the adjusted second face recognition model to obtain feature data of the plurality of second sample images, wherein the plurality of second sample images comprise the standard images and the non-standard images, the plurality of second sample images are selected from a second sample image set, and the number of the second sample images in the second sample image set is greater than a preset number;

combining fourth feature data and fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, where the fourth feature data is feature data corresponding to a second target standard image in the plurality of second sample images, the fifth feature data is feature data corresponding to a second target non-standard image in the plurality of second sample images, the second target standard image and the second target non-standard image are sample images of the same person, and the second target standard image is any standard image in the plurality of second sample images;

combining the fourth feature data with sixth feature data to obtain a plurality of fourth data pairs corresponding to the fourth feature data, where the sixth feature data is feature data corresponding to a second target sample image in the plurality of second sample images, and the second target sample image and the second target standard image are sample images of different persons;

and determining the loss corresponding to the second face recognition model according to the similarity distance of a plurality of third data pairs corresponding to each fourth feature data in the feature data of the plurality of second sample images and the similarity distance of a plurality of fourth data pairs.

5. The method according to claim 4, wherein the standard image is a front face image; before the combining the fourth feature data and the fifth feature data to obtain a plurality of third data pairs corresponding to the fourth feature data, the method further includes:

calculating the face angle of a second sample image corresponding to a target person, wherein the target person is any one of all persons represented by the second sample images;

and selecting, from the second sample images corresponding to the target person, the image with the smallest face angle as the front face image corresponding to the target person.

6. The method according to claim 1, wherein after the iteratively executing the training procedure of the first face recognition model, further comprising:

extracting feature data for verification, which correspond to a plurality of verification images respectively, from a plurality of verification images obtained through pre-selection through the first face recognition model to obtain feature data of the plurality of verification images, wherein the plurality of verification images comprise the standard images and the non-standard images;

combining seventh feature data and eighth feature data to obtain a plurality of fifth data pairs corresponding to the seventh feature data, where the seventh feature data is feature data corresponding to a third target standard image in the plurality of verification images, the eighth feature data is feature data corresponding to a third target non-standard image in the plurality of verification images, the third target standard image and the third target non-standard image are sample images of the same person, and the third target standard image is any standard image in the plurality of verification images;

combining the seventh feature data with ninth feature data to obtain a plurality of sixth data pairs corresponding to the seventh feature data, where the ninth feature data is feature data corresponding to a third target sample image in the plurality of verification images, and the third target sample image and the third target standard image are sample images of different persons;

determining, in a cross-validation mode, a similarity distance threshold according to the similarity distances of the plurality of fifth data pairs and the similarity distances of the plurality of sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, wherein the similarity distance threshold is used for calculating a target proportion, the target proportion is the ratio of a target number to the total number of the fifth data pairs and the sixth data pairs corresponding to each seventh feature data in the feature data of the plurality of verification images, and the target number is the sum of the number of fifth data pairs whose similarity distance is smaller than the similarity distance threshold and the number of sixth data pairs whose similarity distance is larger than the similarity distance threshold.

7. The method of claim 1, wherein the standard image is a front face image and the non-standard image comprises at least one of a left face image, a right face image, a head-up image, a head-down image, a mouth-opening image, a partial occlusion image, an eye-closing image, a frown image, a glasses-worn image, and a wig-worn image.

8. The method according to claim 1, wherein after the iteratively executing the training procedure of the first face recognition model, further comprising:

extracting feature data from an image to be recognized through the first face recognition model;

if the similarity distance between the feature data of the image to be recognized and the feature data of the preset standard image is smaller than the similarity distance threshold, marking the image to be recognized as the image of the person represented by the preset standard image.

9. An image processing apparatus comprising at least one processor and a memory, said memory and said at least one processor being interconnected by a line, said memory having a computer program stored therein; the computer program, when executed by the at least one processor, implements the method of any one of claims 1 to 8.

10. A computer-readable storage medium, in which a computer program is stored which, when run on a processor, carries out the method of any one of claims 1 to 8.