CN113255575A - Neural network training method and device, computer equipment and storage medium - Google Patents

Neural network training method and device, computer equipment and storage medium

Info

Publication number
CN113255575A
Authority
CN
China
Prior art keywords
image
interference
neural network
target area
training
Prior art date
Legal status
Granted
Application number
CN202110670976.XA
Other languages
Chinese (zh)
Other versions
CN113255575B (en)
Inventor
胡琨
于志鹏
苗慕星
吴一超
梁鼎
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202110670976.XA priority Critical patent/CN113255575B/en
Publication of CN113255575A publication Critical patent/CN113255575A/en
Priority to PCT/CN2021/134719 priority patent/WO2022262209A1/en
Application granted granted Critical
Publication of CN113255575B publication Critical patent/CN113255575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a neural network training method, apparatus, computer device, and storage medium, wherein the method comprises: acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image; respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained, and determining a feature vector corresponding to each sample image; determining feature similarity between a reference image and an interference image in the same group of sample images based on the feature vector; and determining a loss value of the training based on the feature similarity corresponding to each group of sample images, and training the neural network to be trained based on the loss value.

Description

Neural network training method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a neural network training method, an apparatus, a computer device, and a storage medium.
Background
With the progress of technology, many attack methods against face recognition have appeared. The purpose of such attacks is to pass a face recognition system while posing as another identity, for example, to unlock another person's mobile phone or to pass access control and enter a campus under another person's identity. Attack methods include wearing a mask of a specific identity, wearing a realistic headgear, and the like; for these methods, living body detection technology can resist the attacks well.
However, when an existing living body detection method faces an attack in which part of the face is covered by a local face region of another person, the attacker is likely to be identified as that person and to pass living body detection, which increases the security risk of existing face recognition systems.
Disclosure of Invention
The embodiment of the disclosure at least provides a neural network training method, a neural network training device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a neural network training method, including:
acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image;
respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained, and determining a feature vector corresponding to each sample image;
determining feature similarity between a reference image and an interference image in the same group of sample images based on the feature vector;
and determining a loss value of the training based on the feature similarity corresponding to each group of sample images, and training the neural network to be trained based on the loss value.
In this way, images fused with partial features of the first object in the reference image are used as interference images and, together with the corresponding reference images, form sample image groups for training the neural network, so that the samples used in training better simulate the face images that may appear in an interference-glasses attack scenario; during training, the loss value of the neural network is determined based on the feature similarity between the reference image and the interference image, and the higher this feature similarity, the larger the loss value, so that the ability of the neural network to distinguish the reference image from the interference image is trained and its recognition accuracy under interference-glasses attacks is enhanced.
In one possible embodiment, the determining the loss value of the current training based on the feature similarity corresponding to each group of sample images includes:
and determining a target loss function corresponding to the feature similarity based on the numerical relationship between the feature similarity and a preset loss parameter, and obtaining a loss value of the training based on the target loss function, wherein the preset loss parameter is used for expressing the capability of the neural network for distinguishing the reference image from the interference image.
Here, different numerical relationships between the feature similarity and the preset loss parameter correspond to different loss functions, so that the accuracy of the neural network can be ensured while its convergence is accelerated.
In a possible implementation, in the case that the feature similarity is greater than the preset loss parameter, the loss value is proportional to the feature similarity.
Thus, when the feature similarity is greater than the preset loss parameter, the loss value is in direct proportion to the feature similarity: the higher the feature similarity, the poorer the ability of the neural network to distinguish the reference image from the interference image and the higher the loss value, so that this ability of the neural network can be trained in this way.
In a possible embodiment, the method further comprises obtaining the interference image according to the following method:
acquiring a first image and a second image which carry different identity marks;
identifying a first target region of the first image and a second target region of the second image, respectively;
and fusing the first image and the second image based on the first image, the first target area and the second target area to obtain an interference image corresponding to the first image.
In a possible embodiment, the method further comprises:
acquiring a third image with the same identity as the first image;
taking the third image as a reference image corresponding to the first image;
and the interference image corresponding to the first image and the reference image corresponding to the first image form a sample image group.
Therefore, the interference image and the reference image in the sample image group correspond to different users, but the users in the interference image are integrated with part of characteristics of the users in the reference image, so that the identification capability of the neural network can be improved and the accuracy of characteristic extraction can be improved when the constructed sample image group trains the neural network.
In one possible embodiment, the first target region and the second target region comprise T-shaped regions consisting of eyes and a nose.
In a possible implementation manner, the fusing the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image includes:
replacing the first target area in the first image by using a second target area in a second image to obtain the interference image; or,
superposing the image corresponding to the second target area on the first target area in the first image to obtain the interference image.
Therefore, the interference images are obtained through different fusion modes, the efficiency of forming the interference images and the diversity of the interference images can be improved, and the identification accuracy of the neural network under different attack scenes is improved.
In a possible implementation manner, before the fusing the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, the method further includes:
determining size information of the first target area and the second target area respectively;
when the size information of the first target area is different from the size information of the second target area, performing scaling processing on the image in the second target area based on the size information of the first target area;
the fusing the first image and the second image based on the first image, the first target area, and the second target area to obtain an interference image corresponding to the first image includes:
and fusing the first image and the second image based on the first image, the first target area and the second target area after scaling processing to obtain an interference image corresponding to the first image.
Therefore, by carrying out scaling processing, the obtained interference image has stronger interference, and correspondingly, when the neural network is trained, the trained neural network has higher network precision.
In a second aspect, an embodiment of the present disclosure further provides a neural network training apparatus, including:
the acquisition module is used for acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image;
the first determining module is used for respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained and determining a feature vector corresponding to each sample image;
the second determination module is used for determining the feature similarity between the reference image and the interference image in the same group of sample images based on the feature vector;
and the training module is used for determining a loss value of the training based on the characteristic similarity corresponding to each group of sample images and training the neural network to be trained based on the loss value.
In one possible embodiment, when determining the loss value of the current training based on the feature similarity corresponding to each group of sample images, the training module is configured to:
and determining a target loss function corresponding to the feature similarity based on the numerical relationship between the feature similarity and a preset loss parameter, and obtaining a loss value of the training based on the target loss function, wherein the preset loss parameter is used for expressing the capability of the neural network for distinguishing the reference image from the interference image.
In a possible implementation, in the case that the feature similarity is greater than the preset loss parameter, the loss value is proportional to the feature similarity.
In a possible implementation, the obtaining module is further configured to obtain the interference image according to the following method:
acquiring a first image and a second image which carry different identity marks;
identifying a first target region of the first image and a second target region of the second image, respectively;
and fusing the first image and the second image based on the first image, the first target area and the second target area to obtain an interference image corresponding to the first image.
In a possible implementation manner, the obtaining module is further configured to:
acquiring a third image with the same identity as the first image;
taking the third image as a reference image corresponding to the first image;
and the interference image corresponding to the first image and the reference image corresponding to the first image form a sample image group.
In one possible embodiment, the first target region and the second target region comprise T-shaped regions consisting of eyes and a nose.
In a possible implementation manner, when the obtaining module fuses the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, the obtaining module is configured to:
replacing the first target area in the first image by using a second target area in a second image to obtain the interference image; or,
superposing the image corresponding to the second target area on the first target area in the first image to obtain the interference image.
In a possible implementation manner, before the obtaining module fuses the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, the obtaining module is further configured to:
determining size information of the first target area and the second target area respectively;
when the size information of the first target area is different from the size information of the second target area, performing scaling processing on the image in the second target area based on the size information of the first target area;
the obtaining module, when fusing the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, is configured to:
and fusing the first image and the second image based on the first image, the first target area and the second target area after scaling processing to obtain an interference image corresponding to the first image.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, this disclosed embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
For the description of the effects of the neural network training device, the computer device, and the computer-readable storage medium, reference is made to the description of the neural network training method, and details are not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below. The drawings herein are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, since those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 illustrates a flow chart of a neural network training method provided by an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a specific method for obtaining an interference image in a sample image group in a neural network training method provided by an embodiment of the present disclosure;
fig. 3a is a schematic diagram illustrating a T-shaped region in a neural network training method provided by an embodiment of the present disclosure;
fig. 3b is a schematic diagram illustrating size information of a first target area in a neural network training method provided by an embodiment of the present disclosure;
fig. 3c is a schematic diagram illustrating an interference image after replacing the first target region in the neural network training method provided by the embodiment of the disclosure;
fig. 4 is a flowchart illustrating a specific method for scaling an image in a second target region in the neural network training method provided by the embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an architecture of a neural network training device provided in an embodiment of the present disclosure;
fig. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research shows that, with the progress of technology, an attack method known as the interference glasses attack has appeared. It works as follows: a face photo of user A is printed on paper, the T-shaped area of the eyes and nose is cut out, and this T-shaped area is pasted onto a pair of glasses to form interference glasses. Because the interference glasses cover only a small proportion of the face, user B has a certain probability of passing living body detection after wearing them, which increases the security risk of existing face recognition systems.
Based on this research, the present disclosure provides a neural network training method, apparatus, computer device, and storage medium, in which images fused with partial features of other users are used as interference images and, together with the corresponding reference images, form the sample images for training the neural network, so that the samples used in training better simulate the face images that may appear in an interference-glasses attack scenario; during training, the loss value of the neural network is determined based on the feature similarity between the reference image and the interference image, and the higher this feature similarity, the larger the loss value, so that the ability of the neural network to distinguish the reference image from the interference image is trained and its recognition accuracy under interference-glasses attacks is enhanced.
To facilitate understanding of the present embodiment, first, a neural network training method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the neural network training method provided in the embodiments of the present disclosure is generally a computer device with certain computing power, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the neural network training method may be implemented by a processor invoking computer readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a neural network training method provided in the embodiment of the present disclosure is shown, where the method includes steps S101 to S104, where:
s101: acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image.
S102: and respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained, and determining a feature vector corresponding to each sample image.
S103: and determining the feature similarity between the reference image and the interference image in the same group of sample images based on the feature vector.
S104: and determining a loss value of the training based on the feature similarity corresponding to each group of sample images, and training the neural network to be trained based on the loss value.
The following is a detailed description of the above steps.
For S101, the sample image is a face image used for training the neural network to be trained, the reference image includes a complete face image of a first object (a face image without fusing partial features of other users), and the interference image is an image fused with partial features of the first object in the reference image.
Here, the partial feature of the first object fused in the interference image may be a feature contained in a partial face image of the first object obtained by cropping the reference image, or a feature contained in a partial face image obtained by cropping another face image of the first object (an image that is not the reference image but depicts the same first object as the reference image); the image fusion process is described in detail below and is not expanded here.
For example, in a scene simulating an interference glasses attack, the reference image is the image A1 of the first object, user A. When the interference image is generated, a partial region image may be extracted from the image A2 of user A and fused onto the image A3 of the second object, user B, to obtain the interference image; the fused image A3 and the reference image A1 then form a group of sample images.
In practical application, when the image fusion processing is performed by using the partial features of the first object in the reference image and the interference image is generated, the interference image can be automatically generated after receiving an interference image generation instruction; or the interference image can be obtained by processing the sample image manually by a user. The steps in actual execution are similar whether the interference image is automatically generated after responding to the instruction or the interference image is gradually generated in response to the user operation, so that the method for generating the interference image provided by the embodiment of the present disclosure is described in detail below by taking the interference image generated automatically after receiving the interference image generation instruction as an example.
In a possible embodiment, as shown in fig. 2, the interference image may be obtained by:
s201: acquiring a first image and a second image carrying different identification marks.
Here, the different identities may be marks used only for distinguishing different identities; for example, a first image carrying identity A (that is, the first object) and a second image carrying identity B (that is, the second object) may be obtained. There may be multiple first images, for example with identification marks A0, A1, and A2 in sequence, where 0, 1, and 2 represent the serial numbers of the three first images and A indicates that the identity corresponding to the first images is A.
Exemplarily, if the identification marks in the two images are the same, the users in the images are the same; if the two images have the same identification but different serial numbers, the two images are different images of the same user, for example, photos taken by the same user at different times/places.
S202: a first target region of the first image and a second target region of the second image are identified, respectively.
Here, the first target region and the second target region include a T-shaped region composed of eyes and a nose.
In specific implementation, because the shooting angles and distances of the first image and the second image may differ, the face angles and sizes in the two images may differ greatly. Therefore, when identifying the T-shaped region, the first/second image may first be corrected so that, after the correction processing, the face angles and sizes in the two images are similar; then the eye regions and the nose region in the face are recognized through a face recognition model, and the T-shaped region is formed from the regions where the eyes and the nose are located.
For example, a schematic diagram of the T-shaped region may be as shown in fig. 3a: the first image after the correction processing is analyzed, the region where the eyes are located and the region where the nose is located (the regions within the solid-line frames) are identified, and the T-shaped region is generated by connecting these regions (indicated by the dotted lines).
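As an illustrative sketch of this step, assuming a 68-point, dlib-style facial landmark detector is available (the detect_landmarks helper below is hypothetical), the eyes-and-nose region can be approximated as follows; for brevity a single rectangle covering both eyes and the nose is used here in place of the exact T shape:

```python
import numpy as np

def t_region_box(landmarks: np.ndarray, margin: int = 5):
    """Approximate the eyes-and-nose ("T") target region as one bounding box.

    Assumes a 68-point, dlib-style landmark array of shape (68, 2):
    indices 36-47 cover the two eyes, indices 27-35 the nose.
    """
    pts = np.vstack([landmarks[36:48], landmarks[27:36]])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    return int(x0), int(y0), int(x1), int(y1)

# landmarks = detect_landmarks(first_image)   # hypothetical landmark detector
# first_box = t_region_box(landmarks)         # first target region of the first image
```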
In practical application, because different faces differ in, for example, interocular distance, a certain difference (that is, differently sized T-shaped regions) still exists between the first target region and the second target region even after correction processing. To improve the training effect as much as possible, the size of the image in the second target region may be adjusted so that the generated interference image is closer to a real, complete image, which increases the training difficulty for the neural network to be trained and correspondingly improves its recognition capability.
In one possible implementation, before fusing the second target region with the first target region, as shown in fig. 4, the image in the second target region may be scaled by:
s401: size information of the first target area and the second target area is determined, respectively.
For example, taking the determination of the size information of the first target area as an example, a schematic diagram of the size information of the first target area may be as shown in fig. 3b, the first length to the sixth length are marked in the figure (for convenience of viewing, the first length to the sixth length are respectively indicated by numerals 1 to 6), and the size information of the first target area can be accurately determined according to the lengths.
S402: and when the size information of the first target area is different from the size information of the second target area, performing scaling processing on the image in the second target area based on the size information of the first target area.
Here, a difference exists when at least one of the first to sixth lengths differs (or when the difference exceeds a preset value/ratio).
Specifically, when the image in the second target region is scaled based on the size information of the first target region, the image in the second target region may be scaled according to the first length to the sixth length in the size information of the first target region, so that the image in the second target region (second target region image) after the scaling process is the same as the size information of the first target region.
Illustratively, if the third length (i.e., the width across the two eyes) in the size information of the first target area is 35 mm and the third length in the size information of the second target area is 28 mm, then since 35 ÷ 28 = 1.25, the area between the two eyes (i.e., the area composed of the third length and the second length in fig. 3b) needs to be enlarged by 25% in the lateral direction to complete the scaling process for the third length.
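A minimal sketch of this scaling step, assuming both target regions have already been cropped and the first region's pixel size is taken as the target (the 35 mm / 28 mm example above corresponds to a lateral scale factor of 1.25):

```python
import cv2

def scale_to_first_region(second_region_img, first_region_shape):
    """Scale the cropped second target region so its size matches the
    first target region (first_region_shape is the crop's (height, width))."""
    target_h, target_w = first_region_shape[:2]
    return cv2.resize(second_region_img, (target_w, target_h),
                      interpolation=cv2.INTER_LINEAR)

# second_scaled = scale_to_first_region(second_region_img, first_region_img.shape)
```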
S203: and fusing the first image and the second image based on the first image, the first target area and the second target area to obtain an interference image corresponding to the first image.
Here, in the fusion, any one of the following methods may be used:
Method A: replace the first target area in the first image with the second target area.
Here, a schematic diagram of the interference image after the first target region is replaced may be as shown in fig. 3c. In fig. 3c, the faces in the original first image and the second image differ considerably (in face shape and other aspects); after the first target region is replaced by the scaled second target region, the generated interference image contains partial region images of both the first image and the second image, and the eyes and nose blend well with the rest of the original first image.
Further, after the replacement, processing algorithms such as a Poisson fusion algorithm can be applied to the replaced image so that the generated interference image is closer to a real, complete image; this increases the training difficulty for the neural network to be trained and correspondingly improves its recognition capability.
Taking the Poisson fusion algorithm as an example, after the replacement, the Poisson fusion equation may be used to blend the replaced edge so that no abrupt change remains along it and the image is closer to a real, complete image.
Correspondingly, after the first image and the second image are fused by the fusion method described in the method a, the interference image corresponding to the first image can be obtained.
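The following is a minimal sketch of Method A using OpenCV's seamless cloning as one possible realization of the Poisson fusion mentioned above; the region box and the scaled crop are assumed to come from the earlier sketches:

```python
import cv2
import numpy as np

def replace_with_poisson(first_img, second_region_scaled, first_box):
    """Replace the first target region with the scaled second target region
    and smooth the pasted edge by solving the Poisson (seamless-clone) equation."""
    x0, y0, x1, y1 = first_box
    mask = 255 * np.ones(second_region_scaled.shape[:2], dtype=np.uint8)
    center = ((x0 + x1) // 2, (y0 + y1) // 2)
    return cv2.seamlessClone(second_region_scaled, first_img, mask,
                             center, cv2.NORMAL_CLONE)
```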
Method B: superimpose the image corresponding to the second target area on the first target area in the first image.
For example, the display effect map after the overlay display may be as shown in fig. 3c, and in fig. 3c, the image corresponding to the second target area is displayed in the first target area in an overlay manner.
Correspondingly, after the first image and the second image are fused by the fusion method described in method B, the interference image corresponding to the first image can be obtained.
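A corresponding sketch of Method B, which simply superimposes the scaled second-target-region crop onto the first target region without any edge blending:

```python
def overlay_region(first_img, second_region_scaled, first_box):
    """Superimpose the scaled second target region on the first target region."""
    x0, y0, x1, y1 = first_box
    out = first_img.copy()
    out[y0:y1, x0:x1] = second_region_scaled   # plain overlay, no seam smoothing
    return out
```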
Method C: fuse the image corresponding to the first target area with the image corresponding to the second target area.
Here, during the fusion processing, an image fusion algorithm may be used for fusion to obtain a first fused image after the fusion processing, where the image fusion algorithm may be at least one of a color transformation fusion algorithm, a ratio fusion algorithm, and the like;
or, a preset layer processing method may be used to process the image corresponding to the first target region and the image corresponding to the second target region, so as to obtain a processed second fused image.
For example, the layers in which the first target region and the second target region are located may be processed and their transparencies adjusted, so that the obtained second fused image contains both the image corresponding to the first target region and the image corresponding to the second target region.
Correspondingly, after the first image and the second image are fused by the fusion method described in method C, a first fused image (or a second fused image) can be obtained, and at this time, the first fused image (or the second fused image) can be displayed in the first target area, so as to obtain an interference image corresponding to the first image; or, when there are a plurality of first images, the first fused image (or the second fused image) may be displayed on the first target region of another image, so as to obtain an interference image corresponding to the first image.
Illustratively, take the first images as A0, A1, and A2 and the second image as B0. The first target region of A0 and the second target region of B0 are fused to obtain a fused image; the fused image can then be superimposed on the first target region of A1 or A2, or used to replace the image in the first target region of A1 or A2, so as to obtain an interference image corresponding to the first image.
Therefore, through the multiple fusion modes, even under the condition that the training data are limited, multiple interference images which can be used for training the neural network can be generated, so that the utilization rate of the training data is improved, meanwhile, the trained neural network can adapt to the interference of different interference images, and the identification accuracy of the neural network is improved.
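As an illustration of the layer-transparency variant of Method C above, the two (equally sized) target-region crops can be blended with adjustable weights; the blended crop can then be superimposed on, or used to replace, the first target region of A1 or A2 as described above. This is only a sketch of one possible fusion, not the only algorithm covered by the disclosure:

```python
import cv2

def blend_regions(first_region_img, second_region_scaled, alpha=0.5):
    """Blend the two target-region crops; alpha is the weight (transparency)
    given to the second target region."""
    return cv2.addWeighted(second_region_scaled, alpha,
                           first_region_img, 1.0 - alpha, 0.0)
```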
Further, when a group of sample images is constructed, the third image with the same identity as the first image may be obtained, and the third image is used as a reference image corresponding to the first image; and then determining the interference image corresponding to the first image and the reference image corresponding to the first image as the same group of sample images.
The third image (i.e., the reference image corresponding to the first image) may be the same image as the first image, so that only two images are needed to generate the sample image group, the construction threshold of the sample image group is reduced, and the quantity and diversity of training data are improved; alternatively, the third image may be a different image from the first image, that is, a different image taken for the same user.
Further, the third image may be subjected to image processing to obtain the reference image; for example, the third image may be processed with a face-slimming beautification effect to simulate recognizing a user after weight loss, so as to improve the recognition accuracy of the neural network in different recognition scenes.
In addition, there may be a plurality of interference images corresponding to the first image, such as a plurality of interference images generated according to the interference image generation method.
In practical application, the interference images in the sample image group are fused with partial characteristics of other users, so that interference scenes such as interference glasses and the like can be well simulated by using the interference images, and after the neural network is trained by using the interference images, the identification accuracy of the neural network under the interference condition can be improved.
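One way to represent a sample image group in code is sketched below; the data structure is purely an illustrative assumption, not a format prescribed by the disclosure:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class SampleGroup:
    """One group of sample images: a reference image plus one or more
    interference images generated for the same first image."""
    reference: np.ndarray                        # third image, same identity as the first image
    interference: List[np.ndarray] = field(default_factory=list)
```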
S102: and respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained, and determining a feature vector corresponding to each sample image.
Here, in the feature vectors corresponding to each of the images, the feature vector corresponding to the reference image is a first feature vector, and the feature vector corresponding to the interference image is a second feature vector.
Specifically, after each sample image in the multiple sets of sample images is input into the neural network to be trained, the neural network processes the input sample image to generate a first feature vector corresponding to the reference image in the sample image and a second feature vector corresponding to the interference image.
It should be noted that the neural network to be trained in the embodiment of the present disclosure may be a neural network that is pre-trained (for example, trained by using a labeled sample), and has a certain face feature extraction capability, and the purpose of training by using the neural network training method provided in the embodiment of the present disclosure is to perform targeted enhancement training for attack modes such as glasses attack; or, the neural network to be trained may also be an untrained neural network, and compared with the neural network that has been pre-trained, the number of sample images required by the untrained neural network may be correspondingly greater, and the number of training times may also be set to be greater.
It should be noted that, a training process of the neural network, that is, a process of adjusting network parameters in the neural network, is aimed at distinguishing an interference image from a reference image through the trained neural network, that is, improving the accuracy of feature vectors extracted by the neural network.
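A minimal PyTorch sketch of this step is given below; the small convolutional backbone is only a placeholder for whatever (possibly pre-trained) face feature extractor is actually used, and the embedding dimension is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceEmbedder(nn.Module):
    """Placeholder feature extractor producing one feature vector per face image."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalized feature vector for each input face image
        return F.normalize(self.backbone(x), dim=-1)

# first_vec  = model(reference_batch)     # first feature vectors (reference images)
# second_vec = model(interference_batch)  # second feature vectors (interference images)
```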
S103: and determining the feature similarity between the reference image and the interference image in the same group of sample images based on the feature vector.
Here, the feature similarity may be represented by a feature distance such as a euclidean distance or a mahalanobis distance, and the feature similarity is inversely proportional to the feature distance, that is, the greater the feature distance, the lower the feature similarity.
For example, taking the feature similarity as a euclidean distance as an example, when a plurality of interference images are included in a group of the sample images, the euclidean distance dis between each second feature vector and the first feature vector may be calculated based on the second feature vector corresponding to each interference image and the first feature vector corresponding to the reference image.
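Continuing the sketch above, the Euclidean distance dis between each second feature vector and the corresponding first feature vector can be computed as follows:

```python
import torch

def euclidean_distance(first_vec: torch.Tensor, second_vec: torch.Tensor) -> torch.Tensor:
    """Euclidean distance dis per sample pair; a larger dis means a lower
    feature similarity between the reference and interference images."""
    return torch.norm(first_vec - second_vec, p=2, dim=-1)
```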
S104: and determining a loss value of the training based on the feature similarity corresponding to each group of sample images, and training the neural network to be trained based on the loss value.
Here, when determining the loss value of the current training, a preset loss function and the feature similarity may be used for calculation, where the loss function includes a preset loss parameter, and the preset loss parameter is used to indicate a capability of the neural network to distinguish the reference image from the interference image.
And under the condition that the feature similarity is larger than a preset loss parameter, the loss value is in direct proportion to the feature similarity.
It should be noted that, when the neural network distinguishes the reference image and the interference image, since the users corresponding to the reference image and the interference image are different (the user corresponding to the interference image should be the user in the image before feature fusion), the higher the calculated feature similarity is, the lower the capability of the neural network in distinguishing the reference image and the interference image is, the larger the corresponding loss value is, that is, the loss value is positively correlated with the feature similarity.
Further, when the feature similarity is sufficiently small, it indicates that the neural network can already distinguish the reference image from the interference image well, so no adjustment of the network parameters is required. A loss parameter therefore needs to be set as the condition (threshold) for judging whether the network parameters need to be adjusted; that is, the loss value is calculated only when the feature similarity is greater than the preset loss parameter.
In a possible implementation manner, when determining the loss value of the current training, a target loss function corresponding to the feature similarity may be determined based on a numerical relationship between the feature similarity and a preset loss parameter, and the loss value of the current training may be obtained based on the target loss function.
Here, still taking the feature similarity expressed by the Euclidean distance as an example, the loss function may be:
Loss = (dis - margin)²,  dis < margin
Loss = 0,  dis ≥ margin
where Loss represents the loss value, dis represents the Euclidean distance between the second feature vector and the first feature vector, and margin represents the preset loss parameter, whose value may be any value between 0 and 2. When dis is smaller than margin, the loss value is the square of the difference between dis and margin, namely (dis - margin)²; when dis is greater than or equal to margin, the loss value is 0.
It should be noted that, since the Euclidean distance is used to represent the feature similarity and is inversely related to it, the condition stated above in similarity terms (the loss value is proportional to the feature similarity when the feature similarity is greater than the preset loss parameter) becomes, in distance terms, that the loss value grows as the Euclidean distance shrinks below the preset loss parameter; if another parameter is used to characterize the feature similarity, the loss function can be adjusted according to the relationship (direct or inverse proportion) between that parameter and the feature similarity.
Specifically, when determining the target loss function corresponding to the feature similarity based on the numerical relationship between the feature similarity and the preset loss parameter, it may be determined, from the Euclidean distance between the first feature vector and the second feature vector and the preset loss parameter, that the target loss function is (dis - margin)² when the Euclidean distance is smaller than the preset loss parameter, and is 0 when the Euclidean distance is greater than or equal to the preset loss parameter.
In addition, the value of the preset loss parameter can be set and adjusted according to the application scene of the neural network: if the current scene requires high recognition accuracy, a larger margin is used, for example 1.5; if the current scene has a lower accuracy requirement, a smaller margin is used, for example 0.9.
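A sketch of this loss in PyTorch, using the Euclidean distance dis from the sketch above; the loss grows as the distance between the reference and interference features shrinks below the margin (that is, as the feature similarity rises above the preset loss parameter) and is 0 otherwise:

```python
import torch

def interference_loss(dis: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Loss = (dis - margin)^2 when dis < margin, else 0; averaged over the batch.
    margin is the preset loss parameter (e.g. 1.5 for stricter scenes, 0.9 for looser ones)."""
    penalty = torch.clamp(margin - dis, min=0.0)   # non-zero only when dis < margin
    return (penalty ** 2).mean()
```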
Further, after the loss value of the training is determined, the network parameters of the neural network to be trained can be adjusted based on the loss value of the training, and training is continued by using the sample image after adjustment until the training result meets the preset condition.
The training result meeting the preset condition may be that the number of training iterations reaches a preset maximum number of iterations, for example a preset maximum of 20; and/or that the accuracy of the trained network reaches a preset accuracy requirement, for example the loss value is 0 for 5 consecutive iterations.
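Tying the sketches together, a minimal training loop might look like the following; the data loader, optimizer, and hyper-parameters are assumptions, and FaceEmbedder, euclidean_distance, and interference_loss refer to the sketches above:

```python
import torch

def train(model, loader, margin=1.0, max_epochs=20, lr=1e-4):
    """loader is assumed to yield (reference_batch, interference_batch) tensor pairs
    built from the sample image groups; training stops after max_epochs
    (e.g. the preset maximum of 20 mentioned above)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for reference_batch, interference_batch in loader:
            dis = euclidean_distance(model(reference_batch), model(interference_batch))
            loss = interference_loss(dis, margin)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # an extra stop rule could also be used, e.g. loss == 0 for 5 consecutive epochs
```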
According to the neural network training method provided by the embodiment of the disclosure, images fused with partial features of other users are used as interference images and, together with the corresponding reference images, serve as the sample images for training the neural network, so that the samples used in training better simulate the face images that may appear in an interference-glasses attack scenario; during training, the loss value of the neural network is determined based on the feature similarity between the reference image and the interference image, and the higher this feature similarity, the larger the loss value, so that the ability of the neural network to distinguish the reference image from the interference image is trained and its recognition accuracy under interference-glasses attacks is enhanced.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or constitute any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a neural network training device corresponding to the neural network training method, and as the principle of solving the problem of the device in the embodiment of the present disclosure is similar to that of the neural network training method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 5, there is shown a schematic architecture diagram of a neural network training device according to an embodiment of the present disclosure, where the neural network training device includes: an acquisition module 501, a first determination module 502, a second determination module 503, and a training module 504; wherein:
an obtaining module 501, configured to obtain multiple sets of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image;
a first determining module 502, configured to input each sample image in the multiple sets of sample images into a neural network to be trained, and determine a feature vector corresponding to each sample image;
a second determining module 503, configured to determine feature similarity between the reference image and the interference image in the same group of sample images based on the feature vector;
the training module 504 is configured to determine a loss value of the current training based on the feature similarity corresponding to each group of sample images, and train the neural network to be trained based on the loss value.
In one possible implementation manner, when determining the loss value of the current training based on the feature similarity corresponding to each group of sample images, the training module 504 is configured to:
and determining a target loss function corresponding to the feature similarity based on the numerical relationship between the feature similarity and a preset loss parameter, and obtaining a loss value of the training based on the target loss function, wherein the preset loss parameter is used for expressing the capability of the neural network for distinguishing the reference image from the interference image.
In a possible implementation, in the case that the feature similarity is greater than the preset loss parameter, the loss value is proportional to the feature similarity.
In a possible implementation, the obtaining module 501 is further configured to obtain the interference image according to the following method:
acquiring a first image and a second image which carry different identity marks;
identifying a first target region of the first image and a second target region of the second image, respectively;
and fusing the first image and the second image based on the first image, the first target area and the second target area to obtain an interference image corresponding to the first image.
In a possible implementation manner, the obtaining module 501 is further configured to:
acquiring a third image with the same identity as the first image;
taking the third image as a reference image corresponding to the first image;
and the interference image corresponding to the first image and the reference image corresponding to the first image form a sample image group.
In one possible embodiment, the first target region and the second target region comprise T-shaped regions consisting of eyes and a nose.
In a possible implementation manner, the obtaining module 501, when fusing the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, is configured to:
replacing the first target area in the first image by using a second target area in a second image to obtain the interference image; or,
superposing the image corresponding to the second target area on the first target area in the first image to obtain the interference image.
In a possible implementation manner, before the obtaining module 501, based on the first image, the first target region, and the second target region, fuses the first image and the second image to obtain an interference image corresponding to the first image, the obtaining module is further configured to:
determining size information of the first target area and the second target area respectively;
when the size information of the first target area is different from the size information of the second target area, performing scaling processing on the image in the second target area based on the size information of the first target area;
the obtaining module 501, when fusing the first image and the second image based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, is configured to:
and fusing the first image and the second image based on the first image, the first target area and the second target area after scaling processing to obtain an interference image corresponding to the first image.
According to the neural network training device provided by the embodiment of the disclosure, images fused with partial features of other users are used as interference images and, together with the corresponding reference images, serve as the sample images for training the neural network, so that the samples used in training better simulate the face images that may appear in an interference-glasses attack scenario; during training, the loss value of the neural network is determined based on the feature similarity between the reference image and the interference image, and the higher this feature similarity, the larger the loss value, so that the ability of the neural network to distinguish the reference image from the interference image is trained and its recognition accuracy under interference-glasses attacks is enhanced.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to fig. 6, a schematic structural diagram of a computer device 600 provided in an embodiment of the present disclosure includes a processor 601, a memory 602, and a bus 603. The memory 602 is used for storing execution instructions and includes an internal memory 6021 and an external memory 6022. The internal memory 6021 temporarily stores operation data in the processor 601 and data exchanged with the external memory 6022, such as a hard disk; the processor 601 exchanges data with the external memory 6022 through the internal memory 6021. When the computer device 600 runs, the processor 601 communicates with the memory 602 through the bus 603, so that the processor 601 executes the following instructions:
acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image;
respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained, and determining a feature vector corresponding to each sample image;
determining feature similarity between a reference image and an interference image in the same group of sample images based on the feature vector;
and determining a loss value of the training based on the feature similarity corresponding to each group of sample images, and training the neural network to be trained based on the loss value.
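Purely as an illustrative sketch of the four instructions listed above, the following training step assumes a PyTorch feature-extraction network and a data loader that yields batched reference/interference pairs belonging to the same sample image groups; all names and the margin value are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
               reference_batch: torch.Tensor, interference_batch: torch.Tensor,
               margin: float = 0.3) -> float:
    # Determine the feature vector corresponding to each sample image.
    ref_feat = model(reference_batch)
    int_feat = model(interference_batch)

    # Feature similarity between the reference image and the interference
    # image of the same group (cosine similarity per row).
    sim = F.cosine_similarity(ref_feat, int_feat, dim=1)

    # Loss value of this iteration: larger when the network still assigns
    # similar features to the reference and interference images.
    loss = torch.clamp(sim - margin, min=0.0).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```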
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the neural network training method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product. The computer program product carries program code, and the instructions included in the program code may be used to execute the steps of the neural network training method described in the foregoing method embodiments; for details, refer to the foregoing method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments; details are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments or easily conceive of changes, or make equivalent replacements of some of their technical features; such modifications, changes, or replacements do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered within its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A neural network training method, comprising:
acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image;
respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained, and determining a feature vector corresponding to each sample image;
determining feature similarity between a reference image and an interference image in the same group of sample images based on the feature vector;
and determining a loss value of the training based on the feature similarity corresponding to each group of sample images, and training the neural network to be trained based on the loss value.
2. The method according to claim 1, wherein the determining the loss value of the training based on the feature similarity corresponding to each group of sample images comprises:
and determining a target loss function corresponding to the feature similarity based on the numerical relationship between the feature similarity and a preset loss parameter, and obtaining a loss value of the training based on the target loss function, wherein the preset loss parameter is used for expressing the capability of the neural network for distinguishing the reference image from the interference image.
3. The method according to claim 2, wherein the loss value is proportional to the feature similarity if the feature similarity is greater than the preset loss parameter.
4. A method according to any one of claims 1 to 3, further comprising obtaining the interference image according to the following method:
acquiring a first image and a second image which carry different identity marks;
identifying a first target region of the first image and a second target region of the second image, respectively;
and fusing the first image and the second image based on the first image, the first target area and the second target area to obtain an interference image corresponding to the first image.
5. The method of claim 4, further comprising:
acquiring a third image with the same identity as the first image;
taking the third image as a reference image corresponding to the first image;
and the interference image corresponding to the first image and the reference image corresponding to the first image form a sample image group.
6. The method of claim 4, wherein the first target region and the second target region comprise a T-shaped region consisting of eyes and a nose.
7. The method according to any one of claims 4 to 6, wherein the obtaining of the interference image corresponding to the first image by fusing the first image and the second image based on the first image, the first target region, and the second target region comprises:
replacing the first target area in the first image with the second target area in the second image to obtain the interference image; or,
superposing the image corresponding to the second target area on the first target area in the first image to obtain the interference image.
8. The method according to any one of claims 4 to 7, wherein before the first image and the second image are fused based on the first image, the first target region, and the second target region to obtain an interference image corresponding to the first image, the method further comprises:
determining size information of the first target area and the second target area respectively;
when the size information of the first target area is different from the size information of the second target area, performing scaling processing on the image in the second target area based on the size information of the first target area;
the fusing the first image and the second image based on the first image, the first target area, and the second target area to obtain an interference image corresponding to the first image includes:
and fusing the first image and the second image based on the first image, the first target area and the second target area after scaling processing to obtain an interference image corresponding to the first image.
9. A neural network training device, comprising:
the acquisition module is used for acquiring a plurality of groups of sample images; each group of sample images comprises a reference image corresponding to a first object and an interference image corresponding to a second object, wherein the interference image fuses partial characteristics of the first object in the reference image;
the first determining module is used for respectively inputting each sample image in the multiple groups of sample images into a neural network to be trained and determining a feature vector corresponding to each sample image;
the second determination module is used for determining the feature similarity between the reference image and the interference image in the same group of sample images based on the feature vector;
and the training module is used for determining a loss value of the training based on the characteristic similarity corresponding to each group of sample images and training the neural network to be trained based on the loss value.
10. A computer device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method of any one of claims 1 to 8 are performed.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the neural network training method according to any one of claims 1 to 8.
CN202110670976.XA 2021-06-17 2021-06-17 Neural network training method and device, computer equipment and storage medium Active CN113255575B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110670976.XA CN113255575B (en) 2021-06-17 2021-06-17 Neural network training method and device, computer equipment and storage medium
PCT/CN2021/134719 WO2022262209A1 (en) 2021-06-17 2021-12-01 Neural network training method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110670976.XA CN113255575B (en) 2021-06-17 2021-06-17 Neural network training method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113255575A true CN113255575A (en) 2021-08-13
CN113255575B CN113255575B (en) 2024-03-29

Family

ID=77188427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110670976.XA Active CN113255575B (en) 2021-06-17 2021-06-17 Neural network training method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113255575B (en)
WO (1) WO2022262209A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022262209A1 (en) * 2021-06-17 2022-12-22 深圳市商汤科技有限公司 Neural network training method and apparatus, computer device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116930884B (en) * 2023-09-15 2023-12-26 西安电子科技大学 SAR deception jamming template generation and jamming method based on optical SAR image conversion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182394A (en) * 2017-12-22 2018-06-19 浙江大华技术股份有限公司 Training method, face identification method and the device of convolutional neural networks
CN110569721A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Recognition model training method, image recognition method, device, equipment and medium
CN111291863A (en) * 2020-01-20 2020-06-16 腾讯科技(深圳)有限公司 Training method of face changing identification model, face changing identification method, device and equipment
CN112329826A (en) * 2020-10-24 2021-02-05 中国人民解放军空军军医大学 Training method of image recognition model, image recognition method and device
CN112561060A (en) * 2020-12-15 2021-03-26 北京百度网讯科技有限公司 Neural network training method and device, image recognition method and device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255575B (en) * 2021-06-17 2024-03-29 深圳市商汤科技有限公司 Neural network training method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113255575B (en) 2024-03-29
WO2022262209A1 (en) 2022-12-22

Similar Documents

Publication Publication Date Title
CN111340008B (en) Method and system for generation of counterpatch, training of detection model and defense of counterpatch
WO2019237846A1 (en) Image processing method and apparatus, face recognition method and apparatus, and computer device
CN111310705A (en) Image recognition method and device, computer equipment and storage medium
CN113255575B (en) Neural network training method and device, computer equipment and storage medium
CN107622243B (en) Unlocking control method and related product
CN112149732A (en) Image protection method and device, electronic equipment and storage medium
CN112733946B (en) Training sample generation method and device, electronic equipment and storage medium
CN112802081B (en) Depth detection method and device, electronic equipment and storage medium
CN110008943B (en) Image processing method and device, computing equipment and storage medium
CN112818995B (en) Image classification method, device, electronic equipment and storage medium
CN111680664B (en) Face image age identification method, device and equipment
CN115798056A (en) Face confrontation sample generation method, device and system and storage medium
CN111104878A (en) Image processing method, device, server and storage medium
CN114036553A (en) K-anonymity-combined pedestrian identity privacy protection method
CN115147705B (en) Face copying detection method and device, electronic equipment and storage medium
CN117975519A (en) Model training and image generating method and device, electronic equipment and storage medium
WO2023124869A1 (en) Liveness detection method, device and apparatus, and storage medium
Galiyawala et al. Dsa-pr: discrete soft biometric attribute-based person retrieval in surveillance videos
CN113920565A (en) Authenticity identification method, authenticity identification device, electronic device and storage medium
CN115690919A (en) Living body detection method, living body detection device, living body detection equipment, storage medium and program product
CN113095116A (en) Identity recognition method and related product
CN112883831A (en) Living body detection method and device, electronic equipment and storage medium
CN111222448A (en) Image conversion method and related product
CN111832364A (en) Face recognition method and device
CN114333078B (en) Living body detection method, living body detection device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40049971)
GR01 Patent grant