CN114708625A - Face recognition method and device - Google Patents

Face recognition method and device

Info

Publication number
CN114708625A
Authority
CN
China
Prior art keywords
sample
feature
class
neural network
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111360868.9A
Other languages
Chinese (zh)
Inventor
黄泽元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Jizhi Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jizhi Digital Technology Co Ltd
Priority to CN202111360868.9A
Publication of CN114708625A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence, and provides a face recognition method and device. The method includes the following steps: extracting a first feature, a second feature and a third feature of each sample in an input image through a neural network model, performing feature fusion on the three features to obtain a fusion feature of each sample, and then calculating a first quality score for each sample and a second quality score for the class center corresponding to each class; calculating, through the neural network model, the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs; training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center; and performing face recognition with the trained neural network model.

Description

Face recognition method and device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a face recognition method and apparatus.
Background
Existing face recognition algorithms generally treat all training samples in the same way; the training problems caused by samples of different quality and by class centers of different quality are not considered when training the face recognition model. Handling samples and class centers of different quality actually involves both image feature extraction and image feature quality estimation. In existing face recognition technology, the feature extraction task for a face image and the quality estimation task for its features are independent of each other and are not, or cannot be, related, so image feature extraction and image feature quality estimation cannot promote each other during the training of a face recognition model.
In the course of implementing the disclosed concept, the inventors found that the related art has at least the following technical problem: image feature extraction and image feature quality estimation cannot promote each other in the training of a face recognition model.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a face recognition method and apparatus to solve the problem in the prior art that image feature extraction and image feature quality estimation cannot promote each other when training a face recognition model.
In a first aspect of the embodiments of the present disclosure, a face recognition method is provided, including: obtaining an input image, extracting a first feature, a second feature and a third feature of each sample in the input image through a neural network model, and calculating, through the neural network model, the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs, wherein the input image contains a plurality of classes, each class corresponds to one class center, and each class has a plurality of samples; performing feature fusion on the first feature, the second feature and the third feature of each sample to obtain a fusion feature of each sample; performing interactive calculation on at least one of the following features of each sample: the first feature, the third feature and the fusion feature, to obtain a first quality score of each sample; calculating a second quality score of the class center corresponding to each class according to the quality scores of the samples in that class; training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center; and performing face recognition with the trained neural network model.
In a second aspect of the embodiments of the present disclosure, a face recognition apparatus is provided, including: a first calculation module configured to obtain an input image, extract a first feature, a second feature and a third feature of each sample in the input image through a neural network model, and calculate, through the neural network model, the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs, wherein the input image contains a plurality of classes, each class corresponds to one class center, and each class has a plurality of samples; a feature fusion module configured to perform feature fusion on the first feature, the second feature and the third feature of each sample to obtain a fusion feature of each sample; a second calculation module configured to perform interactive calculation on at least one of the following features of each sample: the first feature, the third feature and the fusion feature, to obtain a first quality score of each sample; a third calculation module configured to calculate a second quality score of the class center corresponding to each class according to the quality scores of the samples in that class; a model training module configured to train the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center; and a face recognition module configured to perform face recognition with the trained neural network model.
In a third aspect of the disclosed embodiments, an electronic device is provided, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: the first feature, the second feature and the third feature of each sample in the input image are extracted through the neural network model, feature fusion is performed on the three features to obtain the fusion feature of each sample, and the first quality score of each sample and the second quality score of the class center corresponding to each class are then calculated; the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs is calculated through the neural network model; and the neural network model is trained through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center. By adopting these technical means, the problem in the prior art that image feature extraction and image feature quality estimation cannot promote each other in the training of a face recognition model can be solved, and the accuracy of the face recognition model is thereby improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive work.
FIG. 1 is a scenario diagram of an application scenario of an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a face recognition method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram of a method for face recognition using a trained neural network model according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a face recognition module provided in the embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A face recognition method and apparatus according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a scene schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include terminal devices 1, 2, and 3, server 4, and network 5.
The terminal devices 1, 2, and 3 may be hardware or software. When the terminal devices 1, 2 and 3 are hardware, they may be various electronic devices having a display screen and supporting communication with the server 4, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 1, 2, and 3 are software, they may be installed in the electronic devices as above. The terminal devices 1, 2 and 3 may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited by the embodiments of the present disclosure. Further, the terminal devices 1, 2, and 3 may have various applications installed thereon, such as a data processing application, an instant communication tool, social platform software, a search-type application, a shopping-type application, and the like.
The server 4 may be a server providing various services, for example, a background server receiving a request sent by a terminal device establishing a communication connection with the server, and the background server may receive and analyze the request sent by the terminal device and generate a processing result. The server 4 may be one server, may also be a server cluster composed of several servers, or may also be a cloud computing service center, which is not limited in this disclosure.
The server 4 may be hardware or software. When the server 4 is hardware, it may be various electronic devices that provide various services to the terminal devices 1, 2, and 3. When the server 4 is software, it may be a plurality of software or software modules providing various services for the terminal devices 1, 2, and 3, or may be a single software or software module providing various services for the terminal devices 1, 2, and 3, which is not limited by the embodiment of the present disclosure.
The network 5 may be a wired network connected by coaxial cable, twisted pair or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC) or Infrared, which is not limited in this disclosure.
A user can establish a communication connection with the server 4 via the network 5 through the terminal devices 1, 2, and 3 to receive or transmit information or the like. It should be noted that the specific types, numbers and combinations of the terminal devices 1, 2 and 3, the server 4 and the network 5 may be adjusted according to the actual requirements of the application scenarios, and the embodiment of the present disclosure is not limited thereto.
Fig. 2 is a schematic flow chart of a face recognition method according to an embodiment of the present disclosure. The face recognition method of fig. 2 may be performed by the terminal device or the server of fig. 1. As shown in fig. 2, the face recognition method includes:
s201, acquiring an input image, extracting a first feature, a second feature and a third feature of each sample in the input image through a neural network model, and calculating a cosine value of a deviation angle between each sample and a class center corresponding to a class to which each sample belongs through the neural network model, wherein the input image has a plurality of classes, each class corresponds to one class center, and each class has a plurality of samples;
s202, performing feature fusion processing on the first feature, the second feature and the third feature of each sample to obtain fusion features of each sample;
s203, performing interactive calculation on at least one of the following characteristics of each sample: the first feature, the third feature and the fused feature to obtain a first mass fraction of each sample;
s204, calculating a second mass score of the class center corresponding to each class according to the mass scores of the samples in each class in the input image;
s205, training a neural network model through a loss function according to the first mass score of each sample, the second mass score of the class center corresponding to the class to which each sample belongs and the cosine value of the deviation angle of the class center corresponding to the class to which each sample and each sample belong;
and S206, performing face recognition by using the trained neural network model.
It should be noted that the input image obtained in the embodiments of the present disclosure is used to train the neural network model; the input image contains a plurality of classes, each class corresponds to one positive class center and/or one negative class center, and each class has a plurality of samples. In the field of face recognition, a class may be a person and a sample may be a picture of that person. The class center corresponding to a class can be understood as the average of the features of all pictures of one person, the positive class center as the average of the features of that person's positive sample pictures, and the negative class center as the average of the features of that person's negative sample pictures. A sample whose features score above the preset face recognition threshold is a positive sample, and a sample whose features score below that threshold is a negative sample.
According to the technical solution provided by the embodiments of the present disclosure, the first feature, the second feature and the third feature of each sample in the input image are extracted through the neural network model, feature fusion is performed on the three features to obtain the fusion feature of each sample, and the first quality score of each sample and the second quality score of the class center corresponding to each class are then calculated; the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs is calculated through the neural network model; and the neural network model is trained through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center. By adopting these technical means, the problem in the prior art that image feature extraction and image feature quality estimation cannot promote each other in the training of the face recognition model can be solved, and the accuracy of the face recognition model is thereby improved. A high-level sketch of this training flow is given below.
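The following is a minimal PyTorch-style sketch of one training iteration following steps S201-S206. All module names (backbone, fuse_features, quality_scores, class_center_scores, loss_fn) are hypothetical placeholders for the components detailed later in this description, not names used in the patent.

```python
import torch

def train_step(model, images, labels, optimizer):
    """One training iteration following steps S201-S206 (illustrative sketch)."""
    # S201: extract the first/second/third features of every sample and the
    #       cosine of the deviation angle between each sample and its class center
    feat1, feat2, feat3, cos_theta = model.backbone(images, labels)

    # S202: fuse the three features into one fusion feature per sample
    fused = model.fuse_features(feat1, feat2, feat3)

    # S203: interact the first/third/fusion features to get per-sample quality scores
    shallow_q, middle_q, high_q = model.quality_scores(feat1, feat3, fused)

    # S204: aggregate per-sample quality scores into a score for each class center
    center_q = model.class_center_scores(shallow_q, middle_q, high_q, labels)

    # S205: train with a loss combining sample scores, center scores and cos(theta)
    loss = model.loss_fn(shallow_q, middle_q, high_q, center_q, cos_theta, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```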
In step S202, performing feature fusion on the first feature, the second feature and the third feature of each sample to obtain the fusion feature of each sample includes: inputting the input image into the neural network model, and outputting the first feature, the second feature and the third feature of each sample through the second stage, the third stage and the fourth stage of the neural network model, respectively, wherein the neural network model has four stages; inputting the first feature of each sample into a first preset convolution layer and outputting a fourth feature of each sample; inputting the second feature of each sample into a second preset convolution layer and outputting a fifth feature of each sample; inputting the third feature of each sample into a third preset convolution layer and outputting a sixth feature of each sample; performing feature splicing on the fourth feature, the fifth feature and the sixth feature of each sample to obtain a splicing feature of each sample, wherein the feature fusion processing includes the feature splicing processing; and inputting the splicing feature of each sample into a fourth preset convolution layer and outputting the fusion feature of each sample.
The four stages of the neural network model are well known to those skilled in the art and are not described here.
Specifically, the first preset convolution layer applies to the input first feature a convolution with a kernel of a first preset size, a downsampling stride of a first preset stride and a first preset number of channels, obtaining a fourth feature whose matrix has a first preset dimension; for example, the first feature is convolved with a 3 × 3 kernel, a downsampling stride of 2 and 128 channels, and the resulting fourth feature (or its corresponding matrix) has dimensions (14, 14, 128). The second preset convolution layer applies to the input second feature a convolution with a kernel of a second preset size and a second preset number of channels, obtaining a fifth feature whose matrix has a second preset dimension; for example, the second feature is convolved with a 3 × 3 kernel and 128 channels, and the resulting fifth feature has dimensions (14, 14, 128). The third preset convolution layer applies to the input third feature a convolution with a kernel of a third preset size and a third preset number of channels, and then upsamples the convolution result by linear interpolation with a preset factor, obtaining a sixth feature whose matrix has a third preset dimension; for example, the third feature is convolved with a 3 × 3 kernel and 128 channels, the convolution result is upsampled by 2× linear interpolation, and the resulting sixth feature has dimensions (14, 14, 128). The fourth preset convolution layer applies to the input splicing feature a convolution with a kernel of a fourth preset size and a fourth preset number of channels; for example, a convolution with a 1 × 1 kernel and 128 channels is performed on the splicing feature, and the convolution result is used as the fusion feature. It should be noted that the fourth preset convolution layer may further normalize the convolution result, and the normalized result is then used as the fusion feature.
According to the technical solution provided by the embodiments of the present disclosure, the first feature, the second feature and the third feature are processed by different preset convolution layers to obtain the fourth feature, the fifth feature and the sixth feature, and the fusion feature is then obtained from the fourth, fifth and sixth features, which improves the similarity of the fusion feature to the first, second and third features.
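The fusion branch described above can be sketched as follows in PyTorch. The stage output channel counts c2, c3 and c4 are placeholders, the padding values and the choice of bilinear interpolation and batch normalization are assumptions made only so the sketch runs with the example spatial sizes (28 × 28, 14 × 14 and 7 × 7); the layer names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Fuses the stage-2/3/4 features into one (128, 14, 14) fusion feature (sketch)."""

    def __init__(self, c2: int, c3: int, c4: int):
        super().__init__()
        # first preset conv: 3x3, stride 2 (28x28 -> 14x14), 128 channels
        self.conv1 = nn.Conv2d(c2, 128, kernel_size=3, stride=2, padding=1)
        # second preset conv: 3x3, 128 channels, spatial size kept at 14x14
        self.conv2 = nn.Conv2d(c3, 128, kernel_size=3, stride=1, padding=1)
        # third preset conv: 3x3, 128 channels, followed by 2x upsampling (7x7 -> 14x14)
        self.conv3 = nn.Conv2d(c4, 128, kernel_size=3, stride=1, padding=1)
        # fourth preset conv: 1x1, 128 channels, applied to the spliced feature
        self.conv4 = nn.Conv2d(3 * 128, 128, kernel_size=1)
        self.norm = nn.BatchNorm2d(128)  # assumed form of the optional normalization

    def forward(self, f1, f2, f3):
        f4 = self.conv1(f1)                                       # fourth feature
        f5 = self.conv2(f2)                                       # fifth feature
        f6 = F.interpolate(self.conv3(f3), scale_factor=2,
                           mode="bilinear", align_corners=False)  # sixth feature
        spliced = torch.cat([f4, f5, f6], dim=1)                  # splicing feature
        return self.norm(self.conv4(spliced))                     # fusion feature
```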
In step S203, performing interactive calculation on at least one of the following features of each sample: the first feature, the third feature and the fusion feature, to obtain the first quality score of each sample, includes performing interactive calculation on the first feature and the fusion feature of each sample to obtain a shallow quality score of each sample, wherein the first quality score includes the shallow quality score, by: inputting the first feature into a fifth preset convolution layer and outputting a seventh feature; flattening the dimensions of the first matrix corresponding to the seventh feature and of the second matrix corresponding to the fusion feature, and multiplying the flattened first matrix by the transpose of the flattened second matrix to obtain a first product; applying a normalized exponential function to the first product to obtain a first calculation result, and multiplying the first calculation result by the second matrix to obtain a second product; multiplying the second product by a first parameter matrix and then by a second parameter matrix to obtain a third product; and applying a sigmoid function to the third product to obtain the shallow quality score.
The features of a sample in the embodiments of the present disclosure may be feature maps; since a feature map corresponds to a matrix, each feature in the embodiments of the present disclosure may, for ease of understanding, be regarded directly as a matrix.
Specifically, the fifth preset convolution layer applies to the input first feature a convolution with a kernel of a fifth preset size and a fifth preset number of channels to obtain the seventh feature; for example, a convolution with a 1 × 1 kernel and 128 channels is performed on the first feature to obtain the seventh feature. The seventh feature is then flattened into a matrix A of shape (28x28, 128), and the fusion feature is flattened into a matrix B of shape (14x14, 128), where 14x14 is the spatial size of the feature map and 128 is the number of channels. A is multiplied by the transpose of B; the normalized exponential function softmax is applied to the product, i.e. the first product; the result is multiplied by B; this result, i.e. the second product, is then multiplied by the external parameter matrix W11 of shape (128, 64) and by the matrix W12 of shape (64, 1) to obtain the third product; and the sigmoid function is applied to the third product to obtain the shallow quality score. It should be noted that, after the sigmoid function is applied to the third product, the result may be averaged and the average used as the shallow quality score. The external parameter matrix W11 is the first parameter matrix, the external parameter matrix W12 is the second parameter matrix, and both may be preset.
The above steps can be expressed by the following formula:

shallow quality score = mean(sigmoid(softmax(A · B^T) · B · W11 · W12))

where T denotes the transpose operation, W11 is the first parameter matrix, W12 is the second parameter matrix, and mean is the averaging function.
Because the shallow quality score is computed from the feature output by the second stage of the neural network model, it can be understood as a weight reflecting the influence of the sample on the class center corresponding to the class to which the sample belongs.
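The interaction above can be sketched per sample as follows in PyTorch (batch handling and the softmax axis are assumptions of this sketch). The same pattern yields the middle-layer score when the third feature replaces the first feature, and the high-layer score when the fusion feature is used twice.

```python
import torch
import torch.nn.functional as F

def shallow_quality_score(first_feat, fused_feat, conv1x1, W11, W12):
    """Shallow quality score of one sample (illustrative sketch).

    first_feat: (C, 28, 28) first feature; fused_feat: (128, 14, 14) fusion feature;
    conv1x1: the fifth preset convolution layer (1x1 conv, 128 output channels);
    W11: (128, 64) and W12: (64, 1) external parameter matrices.
    """
    seventh = conv1x1(first_feat.unsqueeze(0)).squeeze(0)  # (128, 28, 28) seventh feature
    A = seventh.flatten(1).t()                             # (28*28, 128)
    B = fused_feat.flatten(1).t()                          # (14*14, 128)
    attn = F.softmax(A @ B.t(), dim=-1)                    # softmax of the first product
    third_product = attn @ B @ W11 @ W12                   # (28*28, 1)
    return torch.sigmoid(third_product).mean()             # averaged shallow quality score
```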
In step S203, performing interactive calculation on at least one of the following features of each sample: the first feature, the third feature and the fusion feature, to obtain the first quality score of each sample, includes performing interactive calculation on the third feature and the fusion feature of each sample to obtain a middle-layer quality score of each sample, wherein the first quality score includes the middle-layer quality score, by: inputting the third feature into the fifth preset convolution layer and outputting an eighth feature; flattening the dimensions of the third matrix corresponding to the eighth feature and of the second matrix corresponding to the fusion feature, and multiplying the flattened third matrix by the transpose of the flattened second matrix to obtain a fourth product; applying the normalized exponential function to the fourth product to obtain a second calculation result, and multiplying the second calculation result by the second matrix to obtain a fifth product; multiplying the fifth product by a third parameter matrix and then by a fourth parameter matrix to obtain a sixth product; and applying the sigmoid function to the sixth product to obtain the middle-layer quality score.
For example, a convolution with a 1 × 1 kernel and 128 channels is performed on the third feature to obtain the eighth feature. The eighth feature is then flattened into a matrix C of shape (7x7, 128), and the fusion feature is flattened into a matrix B of shape (14x14, 128), where 14x14 is the spatial size of the feature map and 128 is the number of channels. C is multiplied by the transpose of B; the normalized exponential function softmax is applied to the product, i.e. the fourth product; the result is multiplied by B; this result, i.e. the fifth product, is then multiplied by the external parameter matrix W21 of shape (128, 64) and by the matrix W22 of shape (64, 1) to obtain the sixth product; and the sigmoid function is applied to the sixth product to obtain the middle-layer quality score. It should be noted that, after the sigmoid function is applied to the sixth product, the result may be averaged and the average used as the middle-layer quality score. The external parameter matrix W21 is the third parameter matrix, the external parameter matrix W22 is the fourth parameter matrix, and both may be preset.
The above steps can be expressed by the following formula:

middle-layer quality score = mean(sigmoid(softmax(C · B^T) · B · W21 · W22))

where W21 is the third parameter matrix and W22 is the fourth parameter matrix.
In step S203, performing interactive calculation on at least one of the following features of each sample: the first feature, the third feature and the fusion feature, to obtain the first quality score of each sample, includes performing interactive calculation on the fusion feature of each sample to obtain a high-layer quality score of each sample, wherein the first quality score includes the high-layer quality score, by: inputting the fusion feature into the fifth preset convolution layer and outputting a ninth feature; flattening the dimensions of the fourth matrix corresponding to the ninth feature and of the second matrix corresponding to the fusion feature, and multiplying the flattened fourth matrix by the transpose of the flattened second matrix to obtain a seventh product; applying the normalized exponential function to the seventh product to obtain a third calculation result, and multiplying the third calculation result by the second matrix to obtain an eighth product; multiplying the eighth product by a fifth parameter matrix and then by a sixth parameter matrix to obtain a ninth product; and applying the sigmoid function to the ninth product to obtain the high-layer quality score.
For example, a convolution with a 1 × 1 kernel and 128 channels is performed on the fusion feature to obtain the ninth feature. The ninth feature is then flattened into a matrix D of shape (14x14, 128), and the fusion feature is flattened into a matrix B of shape (14x14, 128), where 14x14 is the spatial size of the feature map and 128 is the number of channels. D is multiplied by the transpose of B; the normalized exponential function softmax is applied to the product, i.e. the seventh product; the result is multiplied by B; this result, i.e. the eighth product, is then multiplied by the external parameter matrix W31 of shape (128, 64) and by the matrix W32 of shape (64, 1) to obtain the ninth product; and the sigmoid function is applied to the ninth product to obtain the high-layer quality score. It should be noted that, after the sigmoid function is applied to the ninth product, the result may be averaged and the average used as the high-layer quality score. The external parameter matrix W31 is the fifth parameter matrix, the external parameter matrix W32 is the sixth parameter matrix, and both may be preset.
The above steps can be expressed by the following formula:

high-layer quality score = mean(sigmoid(softmax(D · B^T) · B · W31 · W32))

where W31 is the fifth parameter matrix and W32 is the sixth parameter matrix.
In step S204, calculating the second quality score of the class center corresponding to each class according to the quality scores of the samples in that class includes: calculating a first queue score for each sample according to the middle-layer quality score and the high-layer quality score of the sample, wherein the first quality score includes the shallow quality score, the middle-layer quality score and the high-layer quality score; calculating a second queue score for each sample according to the shallow quality score of the sample; and calculating the second quality score of the class center corresponding to each class according to the first queue scores and the second queue scores of the samples in that class.
Specifically, the first queue score q_{i,1} of sample i is calculated from its middle-layer and high-layer quality scores, the second queue score q_{i,2} is calculated from its shallow quality score, and the second quality score γ of the class center corresponding to each class is then calculated from the first and second queue scores of the samples in that class (the three formulas are given only as images in the original publication). Here i is the index of a sample within a class, R is the number of samples in the class, α is a tuning parameter of the neural network model, which can be set to 0.2, and max(R) denotes the number of samples in the class that has the most samples among all classes.
In step S205, training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center includes: training the neural network model through a cross-entropy loss function according to the first quality score of each sample and the cosine of the deviation angle between the sample and the class center corresponding to the class to which it belongs, wherein the loss function includes the cross-entropy loss function, the first quality score includes the shallow quality score, the middle-layer quality score and the high-layer quality score, and the class center includes a positive class center and a negative class center.
Specifically, the cross-entropy loss function L1 is given by a formula that appears only as an image in the original publication, in which s is an amplification factor, which can be set to 64; θ_yi denotes the angle between a sample and its positive class center; θ_j denotes the angle between a sample and a negative class center; i is the index of a positive sample in a class and j is the index of a negative sample; m0 can be set to 0.35 and m1 to 0.25; N is the total number of samples and n is the total number of negative samples.
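Since the exact formula is not recoverable here, the following is only a rough sketch assuming a conventional margin-based softmax cross entropy with the stated parameters (s = 64, m0 = 0.35, m1 = 0.25); the patent's actual loss additionally incorporates the quality scores in a way not shown in this sketch.

```python
import torch
import torch.nn.functional as F

def margin_cross_entropy(cos_theta, labels, s=64.0, m0=0.35, m1=0.25):
    """Margin-based cross entropy over cosine similarities (illustrative sketch only).

    cos_theta: (N, num_classes) cosines between each sample and every class center.
    labels: (N,) index of the positive class of each sample.
    The combined additive-angular (m0) and additive-cosine (m1) margin below is an
    assumption; it is not the patent's exact formulation.
    """
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    one_hot = F.one_hot(labels, num_classes=cos_theta.size(1)).float()
    # apply the margins only to the positive-class logit
    logits = torch.cos(theta + m0 * one_hot) - m1 * one_hot
    return F.cross_entropy(s * logits, labels)
```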
In step S205, training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center also includes training the neural network model with a neighbor optimization loss function, wherein the loss function includes the neighbor optimization loss function, by: calculating a first sum of the reciprocals of the shallow, middle-layer and high-layer quality scores of each sample, wherein the first quality score includes the shallow quality score, the middle-layer quality score and the high-layer quality score; calculating a neighbor optimization result with a neighbor optimization function according to the second quality score of the class center corresponding to the class to which the sample belongs and the cosine of the deviation angle between the sample and the negative class center corresponding to that class, wherein the class center includes a positive class center and a negative class center; calculating a second sum of the shallow, middle-layer and high-layer quality scores of each sample, and multiplying the second sum of each sample by its neighbor optimization result to obtain a tenth product; adding the first sum of each sample to its tenth product to obtain a third sum; and training the neural network model according to the third sum of each sample.
Specifically, the neighbor optimization loss function L2 is given by a formula that appears only as an image in the original publication, in which t may be set to 0.2, Σ_top10 cos(θ_j − γt) is the neighbor optimization result calculated by the neighbor optimization function, and z is a preset optimization number; assuming z is 10, Σ_top10 cos(θ_j − γt) means that, within a class, the neural network model is optimized using the top 10 negative samples with the highest similarity to the negative class center corresponding to that class.
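A per-sample sketch of the neighbor optimization term Σ_top10 cos(θ_j − γt) is shown below; how the θ_j are collected (over negative samples or negative class centers) is interpreted from the formula, and the function name is illustrative.

```python
import torch

def neighbor_optimization_result(cos_theta_neg, gamma, t=0.2, z=10):
    """Sum of cos(theta_j - gamma*t) over the top-z most similar negatives (sketch).

    cos_theta_neg: (num_neg,) cosines between a sample and its negative directions.
    gamma: second quality score of the class center of the sample's class.
    """
    theta = torch.acos(cos_theta_neg.clamp(-1 + 1e-7, 1 - 1e-7))
    shifted = torch.cos(theta - gamma * t)
    topz = torch.topk(shifted, k=min(z, shifted.numel())).values  # top-z negatives
    return topz.sum()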
In an alternative embodiment, when the neural network model is trained, the cosine cos θ_p of the deviation angle between each sample and the positive class center corresponding to the class to which the sample belongs needs to be updated or modified. This step may update cos θ_p by one of four alternative formulas, which appear only as images in the original publication; m2, like m1 and m0, is a parameter that may be preset.
The cosine value cos θ_p is related to whether the neural network model determines the sample to be a positive or a negative sample; when a sample is input into the neural network model, the model can judge whether the sample is positive or negative.
Fig. 3 is a flowchart illustrating a method for performing face recognition using a trained neural network model according to an embodiment of the present disclosure. The face recognition method of fig. 3 may be performed by the terminal device or the server of fig. 1.
As shown in fig. 3, the face recognition method includes:
s301, when a target user is detected to enter a preset area, acquiring a face image of the target user through image acquisition equipment, and acquiring a face prototype graph corresponding to the face image from a face detection database;
s302, extracting a tenth feature and an eleventh feature corresponding to the face prototype graph and the face image respectively through a neural network model, and calculating a first score and a second score corresponding to the tenth feature and the eleventh feature respectively;
s303, calculating Euclidean distances between the tenth feature and the eleventh feature;
s304, calculating the Euclidean transformation distance according to the first fraction, the second fraction and the Euclidean distance;
s305, confirming that the face recognition is successful under the condition that the Euclidean transformation distance is larger than a preset threshold value.
According to the technical solution provided by the embodiments of the present disclosure, the tenth feature and the eleventh feature corresponding to the face prototype image and the face image can be extracted through the neural network model, the first score and the second score corresponding to the tenth feature and the eleventh feature can be calculated, and the Euclidean transformation distance can then be computed; face recognition is confirmed to be successful when the Euclidean transformation distance is greater than the preset threshold. By adopting these technical means, the problem in the prior art that the face recognition threshold is fixed can be solved, providing a face recognition scheme based on a dynamic threshold.
Specifically, the Euclidean transformation distance is:

D' = D + β · f(s1, s2)

where f1 and f2 denote the tenth feature and the eleventh feature, s1 and s2 denote the first score and the second score, D is the Euclidean distance between f1 and f2, D' is the Euclidean transformation distance, β is an adjustable coefficient, generally 1.2, and f() can be a min() function.
As can be seen from the above formula, the Euclidean transformation distance changes with the face image of the target user acquired or detected by the image acquisition device, so the face recognition threshold in the embodiment of the present disclosure is effectively dynamic.
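Under the reconstruction above, the recognition steps S301-S305 reduce to the following sketch; the form D' = D + β · min(s1, s2) is an assumption (the original formula is given only as an image), and the helper names are illustrative.

```python
import numpy as np

def recognize(f1, f2, s1, s2, threshold, beta=1.2):
    """Compare a face prototype feature f1 against a captured-image feature f2 (sketch).

    s1 and s2 are the quality scores of the two features; the transformed distance
    D' = D + beta * min(s1, s2) is a reconstruction, so treat its exact form as
    an assumption rather than the patent's definitive formula.
    """
    d = np.linalg.norm(f1 - f2)             # Euclidean distance between the features
    d_transformed = d + beta * min(s1, s2)  # Euclidean transformation distance
    return d_transformed > threshold        # per S305: success above the threshold
```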
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described in detail herein.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 4 is a schematic diagram of a face recognition apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the face recognition apparatus includes:
a first calculation module 401, configured to acquire an input image, extract a first feature, a second feature and a third feature of each sample in the input image through a neural network model, and calculate, through the neural network model, the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs, wherein the input image contains a plurality of classes, each class corresponds to one class center, and each class has a plurality of samples;
a feature fusion module 402, configured to perform feature fusion on the first feature, the second feature and the third feature of each sample to obtain the fusion feature of each sample;
a second calculation module 403, configured to perform interactive calculation on at least one of the following features of each sample: the first feature, the third feature and the fusion feature, to obtain a first quality score of each sample;
a third calculation module 404, configured to calculate a second quality score of the class center corresponding to each class according to the quality scores of the samples in that class;
a model training module 405, configured to train the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center;
and a face recognition module 406, configured to perform face recognition with the trained neural network model.
It should be noted that the input image obtained in the embodiments of the present disclosure is used to train the neural network model; the input image contains a plurality of classes, each class corresponds to one positive class center and/or one negative class center, and each class has a plurality of samples. In the field of face recognition, a class may be a person and a sample may be a picture of that person. The class center corresponding to a class can be understood as the average of the features of all pictures of one person, the positive class center as the average of the features of that person's positive sample pictures, and the negative class center as the average of the features of that person's negative sample pictures. A sample whose features score above the preset face recognition threshold is a positive sample, and a sample whose features score below that threshold is a negative sample.
According to the technical solution provided by the embodiments of the present disclosure, the first feature, the second feature and the third feature of each sample in the input image are extracted through the neural network model, feature fusion is performed on the three features to obtain the fusion feature of each sample, and the first quality score of each sample and the second quality score of the class center corresponding to each class are then calculated; the cosine of the deviation angle between each sample and the class center corresponding to the class to which the sample belongs is calculated through the neural network model; and the neural network model is trained through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which the sample belongs, and the cosine of the deviation angle between the sample and that class center. By adopting these technical means, the problem in the prior art that image feature extraction and image feature quality estimation cannot promote each other in the training of the face recognition model can be solved, and the accuracy of the face recognition model is thereby improved.
Optionally, the feature fusion module 402 is further configured to: input the input image into the neural network model and output the first feature, the second feature and the third feature of each sample in the input image through the second stage, the third stage and the fourth stage of the neural network model, respectively, wherein the neural network model has four stages; input the first feature of each sample into a first preset convolution layer and output a fourth feature of each sample; input the second feature of each sample into a second preset convolution layer and output a fifth feature of each sample; input the third feature of each sample into a third preset convolution layer and output a sixth feature of each sample; perform feature splicing on the fourth feature, the fifth feature and the sixth feature of each sample to obtain a splicing feature of each sample, wherein the feature fusion processing includes the feature splicing processing; and input the splicing feature of each sample into a fourth preset convolution layer and output the fusion feature of each sample.
The four stages of the neural network model are well known to those skilled in the art and are not described here.
Specifically, the first preset convolution layer applies to the input first feature a convolution with a kernel of a first preset size, a downsampling stride of a first preset stride and a first preset number of channels, obtaining a fourth feature whose matrix has a first preset dimension; for example, the first feature is convolved with a 3 × 3 kernel, a downsampling stride of 2 and 128 channels, and the resulting fourth feature (or its corresponding matrix) has dimensions (14, 14, 128). The second preset convolution layer applies to the input second feature a convolution with a kernel of a second preset size and a second preset number of channels, obtaining a fifth feature whose matrix has a second preset dimension; for example, the second feature is convolved with a 3 × 3 kernel and 128 channels, and the resulting fifth feature has dimensions (14, 14, 128). The third preset convolution layer applies to the input third feature a convolution with a kernel of a third preset size and a third preset number of channels, and then upsamples the convolution result by linear interpolation with a preset factor, obtaining a sixth feature whose matrix has a third preset dimension; for example, the third feature is convolved with a 3 × 3 kernel and 128 channels, the convolution result is upsampled by 2× linear interpolation, and the resulting sixth feature has dimensions (14, 14, 128). The fourth preset convolution layer applies to the input splicing feature a convolution with a kernel of a fourth preset size and a fourth preset number of channels; for example, a convolution with a 1 × 1 kernel and 128 channels is performed on the splicing feature, and the convolution result is used as the fusion feature. It should be noted that the fourth preset convolution layer may further normalize the convolution result, and the normalized result is then used as the fusion feature.
According to the technical solution provided by the embodiments of the present disclosure, the first feature, the second feature and the third feature are processed by different preset convolution layers to obtain the fourth feature, the fifth feature and the sixth feature, and the fusion feature is then obtained from the fourth, fifth and sixth features, which improves the similarity of the fusion feature to the first, second and third features.
Optionally, the second calculation module 403 is further configured to perform interactive calculation on the first feature and the fusion feature of each sample to obtain a shallow quality score of each sample, wherein the first quality score includes the shallow quality score, by: inputting the first feature into a fifth preset convolution layer and outputting a seventh feature; flattening the dimensions of the first matrix corresponding to the seventh feature and of the second matrix corresponding to the fusion feature, and multiplying the flattened first matrix by the transpose of the flattened second matrix to obtain a first product; applying a normalized exponential function to the first product to obtain a first calculation result, and multiplying the first calculation result by the second matrix to obtain a second product; multiplying the second product by a first parameter matrix and then by a second parameter matrix to obtain a third product; and applying a sigmoid function to the third product to obtain the shallow quality score.
The features of a sample in the embodiments of the present disclosure may be feature maps; since a feature map corresponds to a matrix, each feature in the embodiments of the present disclosure may, for ease of understanding, be regarded directly as a matrix.
Specifically, the fifth preset convolution layer applies to the input first feature a convolution with a kernel of a fifth preset size and a fifth preset number of channels to obtain the seventh feature; for example, a convolution with a 1 × 1 kernel and 128 channels is performed on the first feature to obtain the seventh feature. The seventh feature is then flattened into a matrix A of shape (28x28, 128), and the fusion feature is flattened into a matrix B of shape (14x14, 128), where 14x14 is the spatial size of the feature map and 128 is the number of channels. A is multiplied by the transpose of B; the normalized exponential function softmax is applied to the product, i.e. the first product; the result is multiplied by B; this result, i.e. the second product, is then multiplied by the external parameter matrix W11 of shape (128, 64) and by the matrix W12 of shape (64, 1) to obtain the third product; and the sigmoid function is applied to the third product to obtain the shallow quality score. It should be noted that, after the sigmoid function is applied to the third product, the result may be averaged and the average used as the shallow quality score.
The above steps can be expressed by the following formula:

shallow quality score = mean(sigmoid(softmax(A · B^T) · B · W11 · W12))

where T denotes the transpose operation, W11 is the first parameter matrix, W12 is the second parameter matrix, and mean is the averaging function.
Because the shallow quality score is computed from the feature output by the second stage of the neural network model, it can be understood as a weight reflecting the influence of the sample on the class center corresponding to the class to which the sample belongs.
Optionally, the second calculation module 403 is further configured to perform interactive calculation on the third feature and the fusion feature of each sample to obtain a middle-layer quality score of each sample, wherein the first quality score includes the middle-layer quality score, by: inputting the third feature into the fifth preset convolution layer and outputting an eighth feature; flattening the dimensions of the third matrix corresponding to the eighth feature and of the second matrix corresponding to the fusion feature, and multiplying the flattened third matrix by the transpose of the flattened second matrix to obtain a fourth product; applying the normalized exponential function to the fourth product to obtain a second calculation result, and multiplying the second calculation result by the second matrix to obtain a fifth product; multiplying the fifth product by a third parameter matrix and then by a fourth parameter matrix to obtain a sixth product; and applying the sigmoid function to the sixth product to obtain the middle-layer quality score.
For example, a convolution with a 1 × 1 kernel and 128 channels is performed on the third feature to obtain the eighth feature. The eighth feature is then flattened into a matrix C of shape (7x7, 128), and the fusion feature is flattened into the matrix B of shape (14x14, 128), where 14x14 is the spatial size of the fusion feature and 128 is the number of channels. C is multiplied by the transpose of B; a normalized exponential function softmax operation is applied to the product result, i.e. the fourth product; the operation result is multiplied by B; the result of this multiplication, i.e. the fifth product, is then multiplied by the third parameter matrix W21 of shape (128, 64) and then by the fourth parameter matrix W22 of shape (64, 1) to obtain the sixth product; and the sixth product is calculated with a sigmoid function to obtain the middle-layer quality score. It should be noted that, after the sigmoid function is used to calculate the sixth product, the calculation result may be averaged, and the average value is used as the middle-layer quality score.
The above steps can be understood as the following formula:
φ2 = mean(sigmoid(softmax(C·B^T)·B·W21·W22))
φ2 is the middle-layer quality score, W21 is the third parameter matrix, W22 is the fourth parameter matrix.
Optionally, the second calculating module 403 is further configured to perform an interactive calculation on the fusion feature of each sample to obtain a high-layer quality score of each sample, where the first quality score includes the high-layer quality score, and the calculation includes: inputting the fusion feature into a fifth preset convolution layer and outputting a ninth feature; respectively carrying out dimensionality flattening processing on a fourth matrix corresponding to the ninth feature and the second matrix corresponding to the fusion feature, and multiplying the flattened fourth matrix by the transpose of the flattened second matrix to obtain a seventh product; calculating the seventh product by using the normalized exponential function to obtain a third calculation result, and multiplying the third calculation result by the second matrix to obtain an eighth product; multiplying the eighth product by the fifth parameter matrix and then by the sixth parameter matrix to obtain a ninth product; and calculating the ninth product by using a sigmoid function to obtain the high-layer quality score.
For example, a convolution with a 1 × 1 kernel and 128 channels is performed on the fusion feature to obtain the ninth feature. The ninth feature is then flattened into a matrix D of shape (14x14, 128), and the fusion feature is flattened into the matrix B of shape (14x14, 128), where 14x14 is the spatial size of the fusion feature and 128 is the number of channels. D is multiplied by the transpose of B; a normalized exponential function softmax operation is applied to the product result, i.e. the seventh product; the operation result is multiplied by B; the result of this multiplication, i.e. the eighth product, is then multiplied by the fifth parameter matrix W31 of shape (128, 64) and then by the sixth parameter matrix W32 of shape (64, 1) to obtain the ninth product; and the ninth product is calculated with a sigmoid function to obtain the high-layer quality score. It should be noted that, after the sigmoid function is used to calculate the ninth product, the calculation result may be averaged, and the average value is used as the high-layer quality score.
The above steps can be understood as the following formula:
φ3 = mean(sigmoid(softmax(D·B^T)·B·W31·W32))
φ3 is the high-layer quality score, W31 is the fifth parameter matrix, W32 is the sixth parameter matrix.
Optionally, the third calculating module 404 is further configured to calculate a first queue score corresponding to each sample according to the middle-layer quality score and the high-layer quality score of each sample, where the first quality score includes: the shallow quality score, the middle-layer quality score and the high-layer quality score; calculate a second queue score corresponding to each sample according to the shallow quality score of each sample; and calculate a second quality score of the class center corresponding to each class according to the first queue scores and the second queue scores corresponding to the plurality of samples in each class.
Specifically, the first queue score qi,1 is calculated from the middle-layer quality score and the high-layer quality score of each sample (the formula is given in the original as an image).
The second queue score qi,2 is calculated according to the following formula:
qi,2 = norm(φ1)
The second quality score γ of the class center corresponding to each class is then calculated from the first queue scores and the second queue scores of the samples in the class (the formula is given in the original as an image). In that formula, i indexes the samples in a class, R is the number of samples in the class, α is a tuning parameter of the neural network model, which can be set to 0.2, and max(R) indicates the number of samples in the class with the most samples among all classes.
Optionally, the model training module 405 is further configured to train the neural network model through a cross entropy loss function according to the first quality score of each sample and a cosine value of a deviation angle of a class center corresponding to the class to which each sample belongs, wherein the loss function includes the cross entropy loss function; wherein, first quality divides, includes: superficial layer mass fraction, middle layer mass fraction and high layer mass fraction; wherein, class center includes: a positive class center and a negative class center.
In particular, the cross entropy loss function L1 is given in the original as an image. In it, s is an amplification factor, which can be set to 64, θyi represents the angle between the sample and the positive class center, θj represents the angle between the sample and a negative class center, i indexes the positive samples in a class, j indexes the negative samples in a class, m0 can be set to 0.35, m1 can be set to 0.25, N is the total number of samples, and n is the total number of negative samples.
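Since the loss formula itself is only available as an image, the following PyTorch sketch shows one plausible reading of a scaled cosine cross-entropy with the stated constants (s = 64, m0 = 0.35, m1 = 0.25). How exactly the two margins are applied and how the first quality score enters the loss are assumptions here, not the patent's definition.

```python
import torch
import torch.nn.functional as F

def margin_cross_entropy(cos_pos, cos_neg, quality, s=64.0, m0=0.35, m1=0.25):
    """Hedged sketch of a two-margin, scaled cosine cross-entropy.

    cos_pos: (N,) cosine between each sample and its positive class center
    cos_neg: (N, K) cosines between each sample and the negative class centers
    quality: (N,) first quality score of each sample (assumed to weight the loss)
    """
    # Assumption: both margins are applied as cosine margins on the positive logit.
    pos_logit = s * (cos_pos - m0 - m1)
    neg_logits = s * cos_neg
    logits = torch.cat([pos_logit.unsqueeze(1), neg_logits], dim=1)
    log_prob = F.log_softmax(logits, dim=1)[:, 0]   # log-probability of the true class
    # Assumption: the per-sample quality score weights the cross-entropy term.
    return -(quality * log_prob).mean()
```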
Optionally, the model training module 405 is further configured to calculate a first sum of the inverse of the shallow quality score, the inverse of the middle-layer quality score and the inverse of the high-layer quality score of each sample, wherein the first quality score comprises: the shallow quality score, the middle-layer quality score and the high-layer quality score; calculate a neighbor optimization result by using a neighbor optimization function according to the second quality score of the class center corresponding to the class to which each sample belongs and the cosine value of the deviation angle between each sample and the negative class centers corresponding to the class to which each sample belongs, wherein the class center comprises: a positive class center and a negative class center; calculate a second sum of the shallow quality score, the middle-layer quality score and the high-layer quality score of each sample, and multiply the second sum corresponding to each sample by the neighbor optimization result corresponding to each sample to obtain a tenth product; add the first sum corresponding to each sample to the tenth product corresponding to each sample to obtain a third sum; and train the neural network model according to the third sum corresponding to each sample.
In particular, following the steps above, the neighbor optimization loss function L2 can be written, per sample, as:
L2 = (1/φ1 + 1/φ2 + 1/φ3) + (φ1 + φ2 + φ3)·Σtop-z cos(θj − γ·t)
t may be set to 0.2, Σtop-z cos(θj − γ·t) is the neighbor optimization result calculated by the neighbor optimization function, and z is a preset optimization number; assuming z is 10, Σtop-10 cos(θj − γ·t) indicates that, in a class, the neural network model is optimized using the top 10 negative samples with the highest similarity to the negative class center corresponding to the class.
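Because the assembly of this loss from the three quality scores is spelled out above (first sum of inverses, second sum times the neighbor optimization result, added to give the third sum), it can be sketched directly. The batch reduction (a mean here), the clamping before arccos, and the exact selection of the top-z terms are assumptions of the sketch.

```python
import torch

def neighbor_optimization_loss(phi1, phi2, phi3, cos_neg, gamma, t=0.2, z=10):
    """Sketch of the neighbor optimization loss (the per-sample "third sum").

    phi1, phi2, phi3: (N,) shallow / middle-layer / high-layer quality scores
    cos_neg: (N, K) cosines between each sample and the negative class centers (K >= z)
    gamma:   (N,) second quality score of the class center of each sample's class
    """
    eps = 1e-8
    first_sum = 1.0 / (phi1 + eps) + 1.0 / (phi2 + eps) + 1.0 / (phi3 + eps)
    second_sum = phi1 + phi2 + phi3
    # Neighbor optimization result: sum of cos(theta_j - gamma * t) over the top-z
    # negative terms with the highest shifted similarity (one reading of "top 10").
    theta = torch.arccos(cos_neg.clamp(-1 + 1e-6, 1 - 1e-6))
    shifted = torch.cos(theta - (gamma * t).unsqueeze(1))
    neighbor = shifted.topk(z, dim=1).values.sum(dim=1)
    third_sum = first_sum + second_sum * neighbor
    return third_sum.mean()   # assumption: average the per-sample third sums over the batch
```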
Optionally, the model training module 405 is further configured to update or modify, when the neural network model is trained, the cosine value cos θp of the deviation angle between each sample and the positive class center corresponding to the class to which each sample belongs. This step may update cos θp by one of four alternative formulas, which are given in the original as images. m2, like m1 and m0, is a parameter that may be preset.
The cosine value cos θp is related to whether the neural network model decides that the sample is a positive sample or a negative sample. When a sample is input into the neural network model, the model can automatically judge whether the sample is a positive sample or a negative sample.
Fig. 5 is a schematic structural diagram of a face recognition module according to an embodiment of the present disclosure. As shown in fig. 5, the face recognition module includes:
the detection unit 501 is configured to, when it is detected that a target user enters a preset area, acquire a face image of the target user through an image acquisition device, and acquire a face prototype diagram corresponding to the face image from a face detection database;
a first calculating unit 502 configured to extract a tenth feature and an eleventh feature corresponding to the face prototype graph and the face image, respectively, through a neural network model, and calculate a first score and a second score corresponding to the tenth feature and the eleventh feature, respectively;
a second calculation unit 503 configured to calculate the Euclidean distance between the tenth feature and the eleventh feature;
a third calculation unit 504 configured to calculate the Euclidean transformation distance from the first score, the second score, and the Euclidean distance;
a confirming unit 505 configured to confirm that the face recognition is successful in a case where the euclidean transformation distance is greater than a preset threshold.
According to the technical scheme provided by the embodiment of the disclosure, the tenth feature and the eleventh feature corresponding to the face prototype graph and the face image can be extracted through the neural network model, the first score and the second score corresponding to the tenth feature and the eleventh feature can be calculated, the Euclidean transformation distance can then be calculated, and face recognition is confirmed to be successful when the Euclidean transformation distance is greater than the preset threshold. By adopting this technical means, the problem in the prior art that the threshold used for face recognition is fixed can be solved, and a face recognition scheme based on a dynamic threshold is provided.
Specifically, the Euclidean transformation distance is:
D' = D + β·f(s1, s2)
f1 and f2 represent the tenth feature and the eleventh feature, respectively, s1 and s2 represent the first score and the second score, respectively, D is the Euclidean distance between f1 and f2, D' is the Euclidean transformation distance, β is an adjustable coefficient, generally 1.2, and f can be a min() function.
As can be seen from the above formula, the Euclidean transformation distance varies according to the face image of the target user acquired or detected by the image acquisition device, and therefore the threshold behavior of face recognition in the embodiment of the present disclosure is, in effect, dynamic.
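As an illustration of this dynamic-threshold comparison, the following sketch assumes the reconstructed form D' = D + β·f(s1, s2) with f = min and β = 1.2; the feature extraction and score computation are taken as given, and the greater-than comparison follows the text above.

```python
import numpy as np

def euclidean_transformation_distance(f1, f2, s1, s2, beta=1.2, f=min):
    """Sketch of the Euclidean transformation distance described above.

    f1, f2: feature vectors of the face prototype graph and the captured face image
    s1, s2: their first and second quality scores
    The combination D' = D + beta * f(s1, s2) is a reconstruction: the original only
    states that beta is adjustable (about 1.2) and that f can be a min() function.
    """
    d = float(np.linalg.norm(np.asarray(f1) - np.asarray(f2)))  # Euclidean distance D
    return d + beta * f(s1, s2)                                 # transformation distance D'


def is_match(f1, f2, s1, s2, threshold):
    # The text states recognition succeeds when the transformed distance
    # exceeds the preset threshold.
    return euclidean_transformation_distance(f1, f2, s1, s2) > threshold
```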
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 6 is a schematic diagram of an electronic device 6 provided by an embodiment of the present disclosure. As shown in fig. 6, the electronic apparatus 6 of this embodiment includes: a processor 601, a memory 602, and a computer program 603 stored in the memory 602 and executable on the processor 601. The steps in the various method embodiments described above are implemented when the computer program 603 is executed by the processor 601. Alternatively, the processor 601, when executing the computer program 603, implements the functions of the modules/units in the above-described apparatus embodiments.
Illustratively, the computer program 603 may be partitioned into one or more modules/units, which are stored in the memory 602 and executed by the processor 601 to accomplish the present disclosure. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 603 in the electronic device 6.
The electronic device 6 may be a desktop computer, a notebook, a palm computer, a cloud server, or other electronic devices. The electronic device 6 may include, but is not limited to, a processor 601 and a memory 602. Those skilled in the art will appreciate that fig. 6 is merely an example of an electronic device 6, and does not constitute a limitation of the electronic device 6, and may include more or less components than those shown, or combine certain components, or different components, for example, the electronic device may also include input-output devices, network access devices, buses, and the like.
The Processor 601 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 602 may be an internal storage unit of the electronic device 6, for example, a hard disk or a memory of the electronic device 6. The memory 602 may also be an external storage device of the electronic device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the electronic device 6. Further, the memory 602 may also include both an internal storage unit of the electronic device 6 and an external storage device. The memory 602 is used for storing computer programs and other programs and data required by the electronic device. The memory 602 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned functional units and modules are illustrated as examples, and in practical applications, the above-mentioned functions may be distributed as required to different functional units and modules, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described or recited in detail in a certain embodiment, reference may be made to the descriptions of other embodiments.
Those of ordinary skill in the art would appreciate that the elements and algorithm steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the above-described apparatus/electronic device embodiments are merely illustrative, and for example, a division of modules or units is merely a logical division, and there may be other divisions in an actual implementation, multiple units or components may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method in the above embodiments, and may also be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of the above methods and embodiments. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A face recognition method, comprising:
acquiring an input image, extracting a first feature, a second feature and a third feature of each sample in the input image through a neural network model, and calculating a cosine value of a deviation angle between each sample and a class center corresponding to a class to which each sample belongs through the neural network model, wherein the input image has a plurality of classes, each class corresponds to one class center, and each class has a plurality of samples;
performing feature fusion processing on the first feature, the second feature and the third feature of each sample to obtain a fusion feature of each sample;
performing interactive calculation on at least one of the following features of each sample: the first feature, the third feature, and the fused feature, to obtain a first quality score of each sample;
calculating a second quality score of the class center corresponding to each class according to the first quality scores of the samples in each class in the input image;
training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which each sample belongs, and the cosine value of the deviation angle between each sample and the class center corresponding to the class to which each sample belongs;
and performing face recognition by using the trained neural network model.
2. The method according to claim 1, wherein the performing a feature fusion process on the first feature, the second feature and the third feature of each sample to obtain a fused feature of each sample comprises:
inputting the input image into the neural network model, and outputting the first feature, the second feature and the third feature of each sample in the input image through a second stage, a third stage and a fourth stage of the neural network model respectively, wherein the neural network model has four stages;
inputting the first characteristic of each sample into a first preset convolution layer, and outputting a fourth characteristic of each sample;
inputting the second characteristic of each sample into a second preset convolution layer, and outputting a fifth characteristic of each sample;
inputting the third feature of each sample into a third preset convolution layer, and outputting a sixth feature of each sample;
performing feature splicing processing on the fourth feature, the fifth feature and the sixth feature of each sample to obtain a spliced feature of each sample, wherein the feature fusion processing comprises the feature splicing processing;
inputting the splicing feature of each sample into a fourth preset convolution layer, and outputting the fusion feature of each sample.
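A minimal sketch of the fusion described in claim 2, assuming 1 × 1 preset convolution layers, a common 128-channel width and bilinear resizing to a 14 × 14 grid (none of which are fixed by the claim), is as follows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Sketch of the feature fusion: project the stage-2/3/4 features, splice, fuse."""

    def __init__(self, c1, c2, c3, channels=128, out_size=(14, 14)):
        super().__init__()
        self.out_size = out_size
        self.conv1 = nn.Conv2d(c1, channels, 1)   # first preset conv  -> fourth feature
        self.conv2 = nn.Conv2d(c2, channels, 1)   # second preset conv -> fifth feature
        self.conv3 = nn.Conv2d(c3, channels, 1)   # third preset conv  -> sixth feature
        self.conv4 = nn.Conv2d(3 * channels, channels, 1)  # fourth preset conv -> fusion feature

    def forward(self, f1, f2, f3):
        xs = [self.conv1(f1), self.conv2(f2), self.conv3(f3)]
        # Assumption: resize to a common spatial grid before splicing along channels.
        xs = [F.interpolate(x, size=self.out_size, mode="bilinear", align_corners=False) for x in xs]
        spliced = torch.cat(xs, dim=1)            # feature splicing
        return self.conv4(spliced)                # fusion feature
```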
3. The method of claim 1, wherein the interactive calculation is performed on at least one of the following features of each sample: the first feature, the third feature, and the fused feature, to obtain the first quality score of each sample, comprising:
performing the interactive calculation on the first feature and the fused feature of each sample to obtain a shallow quality score of each sample, wherein the first quality score includes the shallow quality score, comprising:
inputting the first feature into a fifth preset convolution layer and outputting a seventh feature;
respectively carrying out dimensionality flattening processing on a first matrix corresponding to the seventh feature and a second matrix corresponding to the fusion feature, and multiplying the flattened first matrix by the transpose of the flattened second matrix to obtain a first product;
calculating the first product by using a normalized exponential function to obtain a first calculation result, and multiplying the first calculation result by the second matrix to obtain a second product;
multiplying the second product by the first parameter matrix and then by the second parameter matrix to obtain a third product;
and calculating the third product by using a sigmoid function to obtain the shallow quality score.
4. The method of claim 1, wherein the interactive calculation is performed on at least one of the following features of each sample: the first feature, the third feature, and the fused feature, to obtain the first quality score of each sample, comprising:
performing the interactive calculation on the third feature and the fusion feature of each sample to obtain a middle-layer quality score of each sample, wherein the first quality score includes the middle-layer quality score, comprising:
inputting the third feature into a fifth preset convolution layer and outputting an eighth feature;
respectively carrying out dimensionality flattening processing on a third matrix corresponding to the eighth feature and a second matrix corresponding to the fusion feature, and multiplying the flattened third matrix by the transpose of the flattened second matrix to obtain a fourth product;
calculating the fourth product by using a normalized exponential function to obtain a second calculation result, and multiplying the second calculation result by the second matrix to obtain a fifth product;
multiplying the fifth product by a third parameter matrix and then by a fourth parameter matrix to obtain a sixth product;
and calculating the sixth product by using a sigmoid function to obtain the middle-layer quality score.
5. The method of claim 1, wherein the interactive calculation is performed on at least one of the following features of each sample: the first feature, the third feature, and the fused feature, to obtain the first quality score of each sample, comprising:
performing the interactive calculation on the fusion feature of each sample to obtain a high-layer quality score of each sample, wherein the first quality score includes the high-layer quality score, comprising:
inputting the fusion feature into a fifth preset convolution layer and outputting a ninth feature;
respectively carrying out dimensionality flattening processing on a fourth matrix corresponding to the ninth feature and a second matrix corresponding to the fusion feature, and multiplying the flattened fourth matrix by the transpose of the flattened second matrix to obtain a seventh product;
calculating the seventh product by using a normalized exponential function to obtain a third calculation result, and multiplying the third calculation result by the second matrix to obtain an eighth product;
multiplying the eighth product by the fifth parameter matrix and then by the sixth parameter matrix to obtain a ninth product;
and calculating the ninth product by using a sigmoid function to obtain the high-layer quality score.
6. The method of claim 1, wherein calculating a second quality score for the class center corresponding to each class from first quality scores of a plurality of samples in the input image in the each class comprises:
calculating a first queue score corresponding to each sample according to the middle-layer quality score and the high-layer quality score of each sample, wherein the first quality score comprises: a shallow quality score, the middle-layer quality score and the high-layer quality score;
calculating a second queue score corresponding to each sample according to the shallow quality score of each sample;
calculating the second quality score of the class center corresponding to each class according to the first queue score and the second queue score corresponding to a plurality of samples in each class.
7. The method of claim 1, wherein the training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which each sample belongs, and the cosine value of the deviation angle between each sample and the class center corresponding to the class to which each sample belongs comprises:
training the neural network model through a cross entropy loss function according to the first quality score of each sample and the cosine value of the deviation angle between each sample and the class center corresponding to the class to which each sample belongs, wherein the loss function comprises the cross entropy loss function;
wherein the first quality score includes: a shallow quality score, a middle-layer quality score and a high-layer quality score;
wherein, the class center comprises: a positive class center and a negative class center.
8. The method of claim 1, wherein the training the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which each sample belongs, and the cosine value of the deviation angle between each sample and the class center corresponding to the class to which each sample belongs comprises:
training the neural network model with a neighbor optimization loss function, wherein the loss function comprises the neighbor optimization loss function, comprising:
calculating a first sum of the inverse of the shallow quality score, the inverse of the middle-layer quality score, and the inverse of the high-layer quality score of each sample, wherein the first quality score comprises: the shallow quality score, the middle-layer quality score and the high-layer quality score;
calculating a neighbor optimization result using a neighbor optimization function according to the second quality score of the class center corresponding to the class to which each sample belongs and the cosine value of the deviation angle between each sample and the negative class center corresponding to the class to which each sample belongs, wherein the class center comprises: a positive class center and a negative class center;
calculating a second sum of the shallow quality score, the middle-layer quality score and the high-layer quality score of each sample, and multiplying the second sum corresponding to each sample by the neighbor optimization result corresponding to each sample to obtain a tenth product;
adding the first sum corresponding to each sample to the tenth product corresponding to each sample to obtain a third sum;
and training the neural network model according to the third sum corresponding to each sample.
9. The method of claim 1, wherein the using the trained neural network model for face recognition comprises:
when a target user is detected to enter a preset area, acquiring a face image of the target user through image acquisition equipment, and acquiring a face prototype graph corresponding to the face image from a face detection database;
respectively extracting, through the neural network model, a tenth feature and an eleventh feature corresponding to the face prototype graph and the face image, and respectively calculating a first score and a second score corresponding to the tenth feature and the eleventh feature;
calculating the Euclidean distance between the tenth feature and the eleventh feature;
calculating the Euclidean transformation distance according to the first score, the second score and the Euclidean distance;
and confirming that the face recognition is successful under the condition that the Euclidean transformation distance is greater than a preset threshold value.
10. A face recognition apparatus, comprising:
the first calculation module is configured to acquire an input image, extract a first feature, a second feature and a third feature of each sample in the input image through a neural network model, and calculate a cosine value of a deviation angle between each sample and a class center corresponding to a class to which each sample belongs through the neural network model, wherein the input image has a plurality of classes, each class corresponds to one class center, and each class has a plurality of samples;
a feature fusion module configured to perform feature fusion processing on the first feature, the second feature and the third feature of each sample to obtain a fused feature of each sample;
a second computing module configured to perform interactive calculation on at least one of the following features of each sample: the first feature, the third feature, and the fused feature, to obtain a first quality score of each sample;
a third calculating module configured to calculate a second quality score of the class center corresponding to each class according to the first quality scores of the plurality of samples in each class in the input image;
a model training module configured to train the neural network model through a loss function according to the first quality score of each sample, the second quality score of the class center corresponding to the class to which each sample belongs, and the cosine value of the deviation angle between each sample and the class center corresponding to the class to which each sample belongs;
and the face recognition module is configured to perform face recognition by using the trained neural network model.
CN202111360868.9A 2021-11-17 2021-11-17 Face recognition method and device Pending CN114708625A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111360868.9A CN114708625A (en) 2021-11-17 2021-11-17 Face recognition method and device

Publications (1)

Publication Number Publication Date
CN114708625A true CN114708625A (en) 2022-07-05

Family

ID=82166401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111360868.9A Pending CN114708625A (en) 2021-11-17 2021-11-17 Face recognition method and device

Country Status (1)

Country Link
CN (1) CN114708625A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108508294A (en) * 2018-03-29 2018-09-07 深圳众厉电力科技有限公司 A kind of high ferro electric energy quality monitoring system
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
CN110796100A (en) * 2019-10-31 2020-02-14 浙江大华技术股份有限公司 Gait recognition method and device, terminal and storage device
CN113240092A (en) * 2021-05-31 2021-08-10 深圳市商汤科技有限公司 Neural network training and face recognition method, device, equipment and storage medium


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
Effective date of registration: 20230109
Address after: 518054 cable information transmission building 25f2504, no.3369 Binhai Avenue, Haizhu community, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province
Applicant after: Shenzhen Xumi yuntu Space Technology Co.,Ltd.
Address before: No.103, no.1003, Nanxin Road, Nanshan community, Nanshan street, Nanshan District, Shenzhen City, Guangdong Province
Applicant before: Shenzhen Jizhi Digital Technology Co.,Ltd.