CN113902959A - Image recognition method and device, computer equipment and storage medium - Google Patents


Publication number
CN113902959A
Authority
CN
China
Prior art keywords
sample image
semantic segmentation
segmentation network
network model
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111261097.8A
Other languages
Chinese (zh)
Inventor
裴博润
陈永录
仇国龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image recognition method, an image recognition apparatus, a computer device and a storage medium, and belongs to the technical field of image recognition. The method comprises: adding a perturbation to an original sample image to obtain an initial adversarial sample image; inputting the initial adversarial sample image into a semantic segmentation network model, and attacking the semantic segmentation network model with a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image; performing optimization training on the semantic segmentation network model with a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which adversarial sample images have been added in a specified proportion; and recognizing an image to be recognized with the semantic segmentation network optimization model. The semantic segmentation network optimization model obtained by the method has strong anti-interference capability and a good recognition effect.

Description

Image recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to an image recognition method, an image recognition apparatus, a computer device, and a storage medium.
Background
With the development of deep learning, artificial intelligence technology based on deep learning has had a profound influence on many fields of human society, and image recognition is one of them.
In the prior art, pictures are generally recognized by a trained semantic segmentation network model.
However, research has found that even when a perturbation changes a picture in a way that is difficult for the human eye to perceive, a trained semantic segmentation network model of the prior art can still misrecognize the picture. For example, adding only a tiny perturbation to an image of a cat can cause such a model to recognize the cat as a dog or another animal. In other words, the recognition accuracy of prior-art trained semantic segmentation network models is low in the face of tiny perturbations.
Disclosure of Invention
In view of the above, it is necessary to provide an image recognition method, an apparatus, a computer device and a storage medium for solving the above technical problems.
In a first aspect, an image recognition method is provided, which includes:
adding a perturbation to an original sample image to obtain an initial adversarial sample image; inputting the initial adversarial sample image into a semantic segmentation network model, and attacking the semantic segmentation network model with a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image; performing optimization training on the semantic segmentation network model with a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which adversarial sample images have been added in a specified proportion; and recognizing an image to be recognized with the semantic segmentation network optimization model.
In one embodiment, the objective of the loss function in the semantic segmentation network model is to maximize the angle between a first vector and a second vector output by each layer of the semantic segmentation network model, wherein the first vector is the vector output by each layer for the original sample image, and the second vector is the vector output by each layer for the initial adversarial sample image.
In one embodiment, the loss function in the semantic segmentation network model is:
Figure BDA0003325536460000021
Figure BDA0003325536460000022
such that ‖δ‖ < ξ
where K is the range of attacked layers; θ is the angle, to be maximized, between the original output and the adversarial output of each layer; ξ is an infinitesimal quantity; l_i(x) is the output of the i-th layer of the semantic segmentation network model; l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the perturbation is added to the data matrix of the original sample image; ‖l_i(x + δ)‖_2 is the 2-norm of the matrix formed after the perturbation is added to the original sample image data matrix; ‖l_i(x)‖_2 is the 2-norm of the original sample image data matrix; and δ is the adversarial perturbation.
In one embodiment, inputting the initial adversarial sample image into the semantic segmentation network model and attacking the semantic segmentation network model with the feature spoofing algorithm based on cosine similarity to obtain the adversarial sample image comprises: inputting the initial adversarial sample image into the semantic segmentation network model, attacking the model with the feature spoofing algorithm based on cosine similarity, and calculating the saturation of the output features of each convolutional layer in the model and the saturation difference between the output features of each two adjacent convolutional layers; if the saturation of a specified convolutional layer is greater than a first preset threshold and the saturation difference between the output features of the specified convolutional layer and those of its adjacent convolutional layer is less than a second preset threshold, storing the candidate adversarial sample image output by the specified convolutional layer; compressing the perturbation in the candidate adversarial sample image by a preset proportion, inputting the compressed candidate adversarial sample image into the semantic segmentation network model, attacking the model with the feature spoofing algorithm based on cosine similarity, iterating until the number of iterations reaches a specified number, and storing the candidate adversarial sample image produced by each iteration; and selecting one of the candidate adversarial sample images as the adversarial sample image.
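The compress-and-iterate loop in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: it tracks the perturbation as a flat list of numbers rather than full images, and `attack_step`, `shrink`, and `toy_attack` are hypothetical names, with `attack_step` standing in for one run of the cosine-similarity-based attack, which is not implemented here.

```python
def generate_candidates(delta, attack_step, shrink, iterations):
    """Collect one candidate perturbation per iteration.

    attack_step: hypothetical callable standing in for one run of the
                 cosine-similarity-based feature spoofing attack.
    shrink:      the preset proportion used to compress the perturbation.
    """
    candidates = []
    for _ in range(iterations):
        compressed = [d * shrink for d in delta]  # compress the perturbation
        delta = attack_step(compressed)           # attack again from the compressed state
        candidates.append(list(delta))            # store this iteration's candidate
    return candidates


# Toy attack step (assumption): nudges every entry by a fixed amount.
def toy_attack(d):
    return [v + 0.01 for v in d]


candidates = generate_candidates([0.08, -0.04], toy_attack, shrink=0.5, iterations=3)
```

Every iteration contributes exactly one stored candidate, so the selection step that follows always has the full history to choose from.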
In one embodiment, selecting one of the candidate adversarial sample images as the adversarial sample image comprises: adding the perturbation in each candidate adversarial sample image to a test sample image in turn, inputting the test sample images with the different perturbations into the semantic segmentation network model in turn, calculating the spoofing rate of the perturbation in each candidate adversarial sample image, and selecting the candidate adversarial sample image whose perturbation has the highest spoofing rate as the adversarial sample image.
In one embodiment, selecting one of the candidate adversarial sample images as the adversarial sample image further comprises: after each iteration produces a candidate adversarial sample image, adding the perturbation in that candidate to the test sample image, inputting the perturbed test sample image into the semantic segmentation network model, and calculating the spoofing rate of the perturbation in the candidate; and if, for a specified number of consecutive iterations, the spoofing rate of the perturbation obtained in an iteration is lower than that obtained in the previous iteration, stopping the iteration and selecting the candidate adversarial sample image whose perturbation has the highest spoofing rate as the adversarial sample image.
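The early-stopping rule of this embodiment can be illustrated with a short sketch. It assumes one reading of the rule: the iteration stops once the spoofing (deception) rate has dropped below the previous iteration's rate for `patience` iterations in a row. The function name, the `patience` parameter, and the rate values are hypothetical.

```python
def run_with_early_stop(rates, patience):
    """Return (index of best candidate, number of iterations actually run).

    rates:    spoofing rate measured after each iteration (hypothetical values).
    patience: the specified number of consecutive declines that stops the loop.
    """
    best_index = 0
    declines = 0
    for i, rate in enumerate(rates):
        if i > 0 and rate < rates[i - 1]:
            declines += 1          # lower than the previous iteration
        else:
            declines = 0           # the run of declines is broken
        if rate > rates[best_index]:
            best_index = i         # remember the highest rate seen so far
        if declines >= patience:
            return best_index, i + 1
    return best_index, len(rates)


rates = [0.30, 0.42, 0.40, 0.38, 0.50]
stopped = run_with_early_stop(rates, patience=2)
not_stopped = run_with_early_stop(rates, patience=3)
```

With `patience=2` the loop stops after the fourth iteration and keeps the second candidate; with `patience=3` it runs to the end and keeps the last one, which shows the trade-off between saving computation and missing a later improvement.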
In one embodiment, calculating the spoofing rate of the perturbation in the candidate adversarial sample image comprises: calculating the spoofing rate using an evaluation index of semantic segmentation.
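As a minimal sketch of how a spoofing (deception) rate might be computed and used to pick a candidate, the code below counts the fraction of pixels whose predicted class changes once the perturbation is added. The text does not name the evaluation index, so this pixel-level measure is an assumption, and `predict` is a hypothetical threshold classifier standing in for the semantic segmentation network model.

```python
def predict(image):
    # Hypothetical stand-in for the segmentation model: per-pixel threshold.
    return [[1 if p > 0.5 else 0 for p in row] for row in image]


def spoofing_rate(test_images, delta):
    """Fraction of pixels whose predicted class changes after adding delta."""
    changed = 0
    total = 0
    for img in test_images:
        clean = predict(img)
        perturbed = predict(
            [[p + d for p, d in zip(row, drow)] for row, drow in zip(img, delta)]
        )
        for clean_row, pert_row in zip(clean, perturbed):
            for c, a in zip(clean_row, pert_row):
                changed += int(c != a)
                total += 1
    return changed / total


def select_best(test_images, candidate_deltas):
    """Pick the candidate perturbation with the highest spoofing rate."""
    return max(candidate_deltas, key=lambda d: spoofing_rate(test_images, d))


test_images = [[[0.45, 0.55], [0.2, 0.8]]]
delta_a = [[0.1, 0.0], [0.0, 0.0]]
delta_b = [[0.1, -0.1], [0.0, 0.0]]
best = select_best(test_images, [delta_a, delta_b])
```

Here `delta_b` flips two of the four pixels and is selected over `delta_a`, which flips only one.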
In a second aspect, an image recognition apparatus is provided, comprising: a first acquisition module, configured to add a perturbation to an original sample image to obtain an initial adversarial sample image; a second acquisition module, configured to input the initial adversarial sample image into a semantic segmentation network model and attack the semantic segmentation network model with a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image; a third acquisition module, configured to perform optimization training on the semantic segmentation network model with a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which adversarial sample images have been added in a specified proportion; and a recognition module, configured to recognize an image to be recognized with the semantic segmentation network optimization model.
In one embodiment, the objective of the loss function in the semantic segmentation network model is to maximize the angle between a first vector and a second vector output by each layer of the semantic segmentation network model, wherein the first vector is the vector output by each layer for the original sample image, and the second vector is the vector output by each layer for the initial adversarial sample image.
In one embodiment, the loss function in the semantic segmentation network model in the second acquisition module is:
Figure BDA0003325536460000031
Figure BDA0003325536460000032
such that ‖δ‖ < ξ
where K is the range of attacked layers; θ is the angle, to be maximized, between the original output and the adversarial output of each layer; ξ is an infinitesimal quantity; l_i(x) is the output of the i-th layer of the semantic segmentation network model; l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the perturbation is added to the data matrix of the original sample image; ‖l_i(x + δ)‖_2 is the 2-norm of the matrix formed after the perturbation is added to the original sample image data matrix; ‖l_i(x)‖_2 is the 2-norm of the original sample image data matrix; and δ is the adversarial perturbation.
In one embodiment, the second acquisition module is specifically configured to: input the initial adversarial sample image into the semantic segmentation network model, attack the model with the feature spoofing algorithm based on cosine similarity, and calculate the saturation of the output features of each convolutional layer in the model and the saturation difference between the output features of each two adjacent convolutional layers; if the saturation of a specified convolutional layer is greater than a first preset threshold and the saturation difference between the output features of the specified convolutional layer and those of its adjacent convolutional layer is less than a second preset threshold, store the candidate adversarial sample image output by the specified convolutional layer; compress the perturbation in the candidate adversarial sample image by a preset proportion, input the compressed candidate adversarial sample image into the semantic segmentation network model, attack the model with the feature spoofing algorithm based on cosine similarity, iterate until the number of iterations reaches a specified number, and store the candidate adversarial sample image produced by each iteration; and select one of the candidate adversarial sample images as the adversarial sample image.
In one embodiment, the second acquisition module is specifically configured to: add the perturbation in each candidate adversarial sample image to a test sample image in turn, input the test sample images with the different perturbations into the semantic segmentation network model in turn, calculate the spoofing rate of the perturbation in each candidate adversarial sample image, and select the candidate adversarial sample image whose perturbation has the highest spoofing rate as the adversarial sample image.
In one embodiment, the second acquisition module is specifically configured to: after each iteration produces a candidate adversarial sample image, add the perturbation in that candidate to the test sample image, input the perturbed test sample image into the semantic segmentation network model, and calculate the spoofing rate of the perturbation in the candidate; and if, for a specified number of consecutive iterations, the spoofing rate of the perturbation obtained in an iteration is lower than that obtained in the previous iteration, stop the iteration and select the candidate adversarial sample image whose perturbation has the highest spoofing rate as the adversarial sample image.
In one embodiment, the second acquisition module is specifically configured to: calculate the spoofing rate of the perturbation in the candidate adversarial sample image using an evaluation index of semantic segmentation.
In a third aspect, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any embodiment of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any embodiment of the first aspect.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
In the embodiment of the application, first, a perturbation is added to an original sample image to obtain an initial adversarial sample image; second, the initial adversarial sample image is input into a semantic segmentation network model, and the semantic segmentation network model is attacked with a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image; third, the semantic segmentation network model is optimized by training on a training sample set to which adversarial sample images have been added in a specified proportion, yielding a semantic segmentation network optimization model; and finally, the image to be recognized is recognized with the semantic segmentation network optimization model. Because the feature spoofing algorithm based on cosine similarity has a higher spoofing rate, adding the adversarial sample images it generates to the training sample set gives the semantic segmentation network optimization model trained on that set stronger anti-interference capability and a better recognition effect.
Drawings
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of an image recognition method according to an embodiment of the present application;
FIG. 3 is a flowchart of a technique for obtaining an adversarial sample image according to an embodiment of the present application;
FIG. 4 is a flowchart of another image recognition method provided in the embodiments of the present application;
FIG. 5 is a block diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 6 is an internal structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
With the development of deep learning, artificial intelligence technology based on deep learning has had a profound influence on many fields of human society, and image recognition is one of them.
In the prior art, pictures are generally recognized by a trained semantic segmentation network model.
However, research has found that even when a perturbation changes a picture in a way that is difficult for the human eye to perceive, a trained semantic segmentation network model of the prior art can still misrecognize the picture. For example, adding only a tiny perturbation to an image of a cat can cause such a model to recognize the cat as a dog or another animal. In other words, the image recognition accuracy of prior-art trained semantic segmentation network models is low in the face of tiny perturbations.
In view of this, embodiments of the present application provide an image recognition method, an image recognition apparatus, a computer device, and a storage medium, which improve the accuracy of image recognition of a semantic segmentation network model by optimizing the semantic segmentation network model in the prior art.
Please refer to fig. 1, which illustrates a schematic diagram of an implementation environment related to an image recognition method provided by an embodiment of the present application. As shown in fig. 1, an execution subject of the image recognition method provided in the embodiment of the present application may be one computer device, or may be a computer device cluster composed of a plurality of computer devices. Different computer devices can communicate with each other in a wired or wireless manner, and the wireless manner can be realized through WIFI, an operator network, NFC (near field communication) or other technologies.
Referring to fig. 2, a flowchart of an image recognition method provided by an embodiment of the present application is shown, where the image recognition method can be applied to the computer device shown in fig. 1. As shown in fig. 2, the image recognition method may include the steps of:
Step 201: the computer device adds a perturbation to the original sample image to obtain an initial adversarial sample image.
In this embodiment of the present application, the computer device may add the perturbation to the original sample image using any of various algorithms. The algorithm may be selected according to the actual service scenario and is not specifically limited in this embodiment. The original sample images can be understood as a certain number of original images selected as samples, according to the needs of the actual service scenario, from the transaction processing of that scenario. The samples may be used to train or test the semantic segmentation network model. Optionally, the number, size, and type of the original sample images may be chosen according to actual needs and are not specifically limited in this embodiment.
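The embodiment leaves the choice of perturbation algorithm open. Purely as an illustration, the sketch below assumes uniform random noise bounded by a hypothetical limit `xi` and pixel values normalized to [0, 1]; neither assumption comes from the text, and the function name is likewise hypothetical.

```python
import random


def add_perturbation(image, xi, seed=0):
    """Return (initial_adversarial_image, delta) for a 2-D list of pixel values.

    Each entry of delta is drawn uniformly from [-xi, xi] (an assumed choice),
    and the perturbed pixels are clipped back into the range [0, 1].
    """
    rng = random.Random(seed)
    delta = [[rng.uniform(-xi, xi) for _ in row] for row in image]
    perturbed = [
        [min(1.0, max(0.0, p + d)) for p, d in zip(row, drow)]
        for row, drow in zip(image, delta)
    ]
    return perturbed, delta


original = [[0.0, 0.5], [0.9, 1.0]]
initial_adv, delta = add_perturbation(original, xi=0.04)
```

Keeping `delta` separate from the perturbed image is convenient later, because the same perturbation can then be applied to other test images.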
Step 202: the computer device inputs the initial adversarial sample image into the semantic segmentation network model and attacks the semantic segmentation network model with a feature spoofing algorithm based on cosine similarity to obtain the adversarial sample image.
In an optional embodiment of the present application, the output of the semantic segmentation network is a feature vector in which each value represents the probability of the corresponding class, and the class corresponding to the maximum probability is the final recognition result. Suppose the feature vector output by the semantic segmentation network is x = (x_1, x_2, ..., x_n). If there is a vector y = (y_1, y_2, ..., y_n) of equal length that is orthogonal to this feature vector, the following formula holds:
x · y = 0
Specifically, x_1 y_1 + x_2 y_2 + ... + x_n y_n = 0. The all-zero vector is not admissible, because the values of a classification vector cannot all be zero and cannot be negative. In this case the equality clearly cannot be attained, so the formula above becomes:
x · y* > 0
where y* is a feature vector equal in length to x but not orthogonal to it, and the y* we need is the one that minimizes the left-hand side of the inequality above. According to the rearrangement inequality, for two arrays x and y the sum of pairwise products is maximal if and only if the largest number in x is paired (multiplied) with the largest number in y, the second largest with the second largest, and so on down to the smallest with the smallest. Conversely, if the pairing is done in reverse order, the resulting sum of products is minimal. For the feature vector output by the semantic segmentation network, this means that the class with the largest probability in x corresponds to the smallest probability in y*, which causes the semantic segmentation network to misclassify.
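The ordering argument above can be checked numerically on a small example. The sketch below tries every ordering of a second probability vector against a fixed x and confirms that the same-order pairing gives the largest inner product and the reverse-order pairing the smallest; the specific numbers are illustrative assumptions.

```python
from itertools import permutations


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


x = [0.7, 0.2, 0.1]            # class probabilities, largest first
values = [0.6, 0.3, 0.1]       # entries available for the second vector

# Inner product of x with every possible ordering of the values.
products = {perm: dot(x, perm) for perm in permutations(values)}
same_order = max(products, key=products.get)
reverse_order = min(products, key=products.get)

# Predicted class = index of the largest entry.
class_of_x = x.index(max(x))
class_of_y_star = reverse_order.index(max(reverse_order))
```

The minimizing vector puts its largest entry where x has its smallest, so the predicted class of the minimizer differs from that of x, which is exactly the misclassification the text describes.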
Based on the above principle, in the embodiment of the present application the objective of the loss function in the semantic segmentation network model is to maximize the angle between the first vector and the second vector output by each layer of the model. The first vector is the vector output by each layer for the original sample image, and the second vector is the vector output by each layer for the initial adversarial sample image. Optionally, the first vector is obtained by inputting the original sample image into the semantic segmentation network model and taking the model's output, and the second vector is obtained by inputting the initial adversarial sample image into the model and taking its output. The first vector and the second vector are then fed into the loss function to obtain a loss value, and the adversarial perturbation is adjusted according to the loss value until the loss function converges under the adjusted perturbation.
Based on this, semantic segmentation can be regarded as target recognition at the pixel level, and the loss function of the semantic segmentation network model is designed as follows:
Figure BDA0003325536460000081
Figure BDA0003325536460000082
such that ‖δ‖ < ξ
where K is the range of attacked layers; θ is the angle, to be maximized, between the original output and the adversarial output of each layer; ξ is an infinitesimal quantity; l_i(x) is the output of the i-th layer of the semantic segmentation network model; l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the perturbation is added to the data matrix of the original sample image; ‖l_i(x + δ)‖_2 is the 2-norm of the matrix formed after the perturbation is added to the original sample image data matrix; ‖l_i(x)‖_2 is the 2-norm of the original sample image data matrix; and δ is the adversarial perturbation.
Note that l_i(x) in the loss function can be understood as the first vector output by each layer of the semantic segmentation network model described above, and l_i(x + δ) can be understood as the second vector output by each layer.
The loss function above can be read as follows: for the adversarial perturbation δ, the angle θ between the original output and the adversarial output of each layer is maximized, which minimizes the cosine similarity and thus maximizes the difference in direction between the perturbed result and the original result. To prevent the loss value from becoming too small, the embodiment subtracts the cosine value from 1, which keeps the loss value from approaching 0 late in training; it then takes the logarithm of the result to prevent the loss value from becoming too large. This yields the cosine-similarity-based feature spoofing loss function of the embodiment.
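The exact formula is given only in the equation images, so the following is one possible reading of the verbal description rather than the patent's own expression: for each attacked layer, compute the cosine of the angle between the clean and perturbed outputs, subtract it from 1, take the logarithm, then sum over layers and negate so that a smaller loss means a larger angle. The sign convention, the sum over layers, and the small constant `eps` are all assumptions, as are the function names.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two flattened layer outputs."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def feature_spoofing_loss(clean_outputs, adv_outputs, eps=1e-12):
    """Assumed form: -sum_i log(1 - cos(theta_i) + eps) over the attacked layers.

    clean_outputs[i] plays the role of l_i(x), adv_outputs[i] of l_i(x + delta).
    eps only guards against log(0) when the two outputs are parallel.
    """
    loss = 0.0
    for l_x, l_adv in zip(clean_outputs, adv_outputs):
        loss -= math.log(1.0 - cosine(l_x, l_adv) + eps)
    return loss


clean = [[1.0, 0.0], [0.0, 2.0]]          # two layers, toy outputs
nearly_same = [[1.0, 0.1], [0.1, 2.0]]    # small angle to the clean outputs
orthogonal = [[0.0, 1.0], [3.0, 0.0]]     # 90-degree angle to the clean outputs

loss_small_angle = feature_spoofing_loss(clean, nearly_same)
loss_right_angle = feature_spoofing_loss(clean, orthogonal)
```

Under this reading the loss is large when the perturbed outputs still point in nearly the same direction as the clean ones and falls as the angle opens up, so adjusting δ to reduce the loss pushes the two sets of layer outputs apart in direction.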
Compared with the prior-art GD-UAP algorithm, the feature spoofing algorithm based on cosine similarity in the embodiment of the present application has a higher spoofing rate. GD-UAP is currently a mainstream attack algorithm internationally and covers three perturbation types: fast feature spoofing, statistics-based feature spoofing, and data-driven feature spoofing.
Step 203: the computer device performs optimization training on the semantic segmentation network model with the training sample set to obtain the semantic segmentation network optimization model.
The training sample set is a set to which adversarial sample images have been added in a specified proportion; it comprises adversarial sample images and ordinary sample images. The specified proportion is the proportion that makes the trained semantic segmentation network optimization model optimal.
In the embodiment of the present application, when the proportion of adversarial sample images in the training sample set changes, the recognition effect and anti-interference capability of the semantic segmentation network optimization model trained on that set change as well. To give the trained model the best recognition effect and anti-interference capability, the optimal proportion of adversarial sample images in the training sample set must therefore be determined; this optimal proportion is the specified proportion. Optionally, the specified proportion may be a single optimal value obtained from a large amount of experimental data, or an optimal range. In either case, it must be ensured that the semantic segmentation network optimization model trained on a training sample set containing adversarial sample images in the specified proportion is optimal, that is, has the best recognition effect and anti-interference capability.
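A training set with a given share of adversarial samples could be assembled as in the sketch below. The function name, the definition of `ratio` as the adversarial share of the final set, and the value 0.2 are illustrative assumptions; the text says only that the proportion is determined experimentally.

```python
import random


def build_training_set(clean_samples, adversarial_samples, ratio, seed=0):
    """Mix samples so adversarial ones make up about `ratio` of the result.

    ratio is the assumed share of adversarial samples in the final set and
    must lie in [0, 1). All clean samples are kept.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    # Solve n_adv / (n_clean + n_adv) = ratio for n_adv.
    n_adv = round(len(clean_samples) * ratio / (1.0 - ratio))
    n_adv = min(n_adv, len(adversarial_samples))
    rng = random.Random(seed)
    mixed = list(clean_samples) + rng.sample(list(adversarial_samples), n_adv)
    rng.shuffle(mixed)
    return mixed


clean = [("clean", i) for i in range(80)]
adversarial = [("adv", i) for i in range(50)]
training_set = build_training_set(clean, adversarial, ratio=0.2)
```

Sweeping `ratio` over several values and evaluating the model trained on each resulting set is one straightforward way to search for the proportion the text calls optimal.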
In the prior art, a GD-UAP algorithm is usually used to attack a semantic segmentation network model to obtain a countermeasure sample image, and a training sample set added with the countermeasure sample image according to a specified proportion is used to perform optimization training on the semantic segmentation network model, so as to obtain a semantic segmentation network optimization model. In the embodiment of the application, a feature spoofing algorithm based on cosine similarity attacks the semantic segmentation network model to obtain a countermeasure sample image, and the training sample set added with the countermeasure sample image according to a specified proportion is used for carrying out optimization training on the semantic segmentation network model, so that the semantic segmentation network optimization model is obtained. Compared with a GD-UAP algorithm in the prior art, the feature spoofing algorithm based on cosine similarity has a higher spoofing rate, so that the semantic segmentation network trained by the embodiment of the application has better anti-jamming capability and stronger recognition capability.
Step 204, the computer device identifies the image to be identified by using the semantic segmentation network optimization model.
In the embodiment of the application, first, a disturbance is added to an original sample image to obtain an initial countermeasure sample image; second, the initial countermeasure sample image is input into a semantic segmentation network model, and the model is attacked by using a feature spoofing algorithm based on cosine similarity to obtain a countermeasure sample image; third, the semantic segmentation network model is optimally trained with a training sample set to which countermeasure sample images have been added in a specified proportion, yielding a semantic segmentation network optimization model; finally, the image to be identified is identified by using the semantic segmentation network optimization model. Because the feature spoofing algorithm based on cosine similarity has a higher deception rate, adding the countermeasure sample images it generates to the training sample set gives the resulting semantic segmentation network optimization model stronger anti-interference capability and a better recognition effect.
Please refer to fig. 3, which illustrates a technical process for obtaining a countermeasure sample image according to an embodiment of the present application. As shown in fig. 3, the process for obtaining the countermeasure sample image comprises the following steps:
Step 301, the computer device inputs the initial countermeasure sample image into the semantic segmentation network model, attacks the semantic segmentation network model by using a feature spoofing algorithm based on cosine similarity, and calculates the saturation of the output features of each convolutional layer in the semantic segmentation network model and the saturation difference between the output features of two adjacent convolutional layers. If the saturation corresponding to a designated convolutional layer is greater than a first preset threshold, and the saturation difference between the output features of the designated convolutional layer and those of its adjacent convolutional layer is less than a second preset threshold, the candidate countermeasure sample image output by the designated convolutional layer is stored.
Since the convolutional layers are responsible for extracting features, the attack algorithm in the embodiment of the present application can attack all convolutional layers. The feature spoofing algorithm based on cosine similarity in the embodiment of the application is a multi-step iterative algorithm, but unlike DeepFool it has no obvious directionality, so a large number of iterations are required to keep approaching the optimal solution. To avoid settling on a locally optimal solution, the embodiment of the application sets a variable Sat_t to represent the saturation of the perturbed image (the proportion of pixels that reach the upper or lower limit of the attack intensity) and a variable SatC_t to represent the difference in Sat_t between two training rounds. If SatC_t is too small while Sat_t is too high, the training may be considered to have reached "saturation".
In the embodiment of the application, an initial countermeasure sample image is input into a semantic segmentation network model, a characteristic deception algorithm based on cosine similarity is used for attacking the semantic segmentation network model, namely attacking all convolutional layers in the semantic segmentation network model, and the saturation of the output characteristic of each convolutional layer in the semantic segmentation network model and the saturation difference of the output characteristics of two adjacent convolutional layers are calculated. And if the saturation corresponding to a certain convolutional layer is larger than a first preset threshold value, and the saturation difference value between the convolutional layer and an adjacent convolutional layer is smaller than a second preset threshold value, saving the output candidate confrontation sample image corresponding to the convolutional layer. The first preset threshold and the second preset threshold may be set based on actual needs, may be the same numerical value or different numerical values, and the embodiment of the present application is not particularly limited.
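The saturation test described above may be sketched as follows. This is only an illustrative reading: the disturbance is represented as a nested list of pixel offsets, `epsilon` stands for the attack intensity limit, and the names `saturation` and `is_saturated` as well as the use of an absolute difference between rounds are assumptions rather than definitions taken from the present application.

```python
def saturation(disturbance, epsilon):
    """Sat_t: fraction of disturbance values sitting at +epsilon or -epsilon."""
    values = [v for row in disturbance for v in row]
    at_limit = sum(1 for v in values if abs(v) >= epsilon)
    return at_limit / len(values)


def is_saturated(sat_now, sat_prev, first_threshold, second_threshold):
    """True when Sat_t is high and its change SatC_t since the last round is small."""
    sat_change = abs(sat_now - sat_prev)  # SatC_t, taken as an absolute difference (assumption)
    return sat_now > first_threshold and sat_change < second_threshold
```

When `is_saturated` returns true, the current candidate would be stored, which corresponds to the first and second preset thresholds described above.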
Step 302, the computer device compresses the disturbance in the candidate countermeasure sample image according to a preset proportion, inputs the compressed candidate countermeasure sample image into the semantic segmentation network model, attacks the semantic segmentation network model by using a feature spoofing algorithm based on cosine similarity, performs iterative computation until the number of iterations reaches a specified number, and stores the candidate countermeasure sample image corresponding to each iterative computation.
In the embodiment of the present application, the perturbation in the candidate confrontation sample image is compressed according to a preset ratio, and optionally, if the preset ratio is 0.5, that is, the compression ratio is 0.5, the perturbation may be reduced by half. Inputting the compressed candidate confrontation sample image into a semantic segmentation network model, repeating the steps, calculating the saturation and saturation difference corresponding to each convolution layer of the semantic segmentation network model, obtaining a new round of candidate confrontation sample image corresponding to iterative computation until the iteration times reach the specified times, and storing the candidate confrontation sample image corresponding to each iterative computation.
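One possible reading of the compress-and-iterate procedure is sketched below. The callables `attack_step` and `saturated` are hypothetical placeholders for the cosine-similarity attack and the saturation test, and the two `toy_` functions exist only to make the sketch runnable; none of these names come from the present application.

```python
def compress(disturbance, ratio=0.5):
    """Scale every disturbance value by the preset proportion."""
    return [[v * ratio for v in row] for row in disturbance]


def collect_candidates(initial, attack_step, saturated, ratio, num_iterations):
    """Run a fixed number of attack iterations and keep each saturated disturbance.

    attack_step and saturated are hypothetical callables supplied by the caller.
    """
    candidates = []
    disturbance = initial
    for _ in range(num_iterations):
        disturbance = attack_step(disturbance)
        if saturated(disturbance):
            candidates.append(disturbance)              # store the candidate
            disturbance = compress(disturbance, ratio)  # restart from a smaller disturbance
    return candidates


# Toy stand-ins: the "attack" raises each value by 4, clipped to the limit 10.
def toy_attack_step(disturbance, limit=10.0, step=4.0):
    return [[min(limit, v + step) for v in row] for row in disturbance]


def toy_saturated(disturbance, limit=10.0):
    return all(abs(v) >= limit for row in disturbance for v in row)
```

With a preset proportion of 0.5, each stored candidate is followed by a restart from a disturbance of half the magnitude, which is intended to help the search leave a saturated state.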
Step 303, the computer device selects one of the candidate confrontation sample images as the confrontation sample image.
In an optional embodiment of the application, the disturbance in each candidate confrontation sample image is sequentially added to the test sample image, the test sample images with different disturbances added are sequentially input to the semantic segmentation network model, the deception rate of the disturbance in each candidate confrontation sample image is calculated, and the candidate confrontation sample image corresponding to the disturbance with the highest deception rate is selected as the confrontation sample image. The test sample image may be understood as a test sample set for testing the accuracy of the semantic segmentation network model, and the test sample image may be multiple. In the embodiment of the application, the candidate confrontation sample image corresponding to the disturbance with the highest cheating rate is used as the confrontation sample image, the confrontation sample image can cause the maximum interference to the semantic segmentation network model, and a data base is laid for the optimization training of the subsequent semantic segmentation network model.
In the embodiment of the application, the optimal disturbance in the convolutional layer in each round is obtained through iterative computation so as to avoid a locally optimal solution, which lays a data foundation for the subsequent optimization training of the semantic segmentation network model and improves the accuracy and stability of image recognition.
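Selecting the candidate whose disturbance has the highest deception rate may be sketched as follows. The function `select_best_candidate` and the `deception_rate` callable are illustrative assumptions; how the deception rate is actually measured on the test sample images is left to the caller.

```python
def select_best_candidate(candidates, deception_rate):
    """Return (candidate, rate) for the candidate with the highest deception rate.

    deception_rate is a hypothetical callable that evaluates one candidate
    on the test sample images and returns a number.
    """
    scored = [(deception_rate(candidate), index) for index, candidate in enumerate(candidates)]
    # max() keeps the first entry on ties, so the earliest best candidate is chosen.
    best_rate, best_index = max(scored, key=lambda pair: pair[0])
    return candidates[best_index], best_rate
```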
In another optional embodiment of the application, after each iterative calculation yields a candidate countermeasure sample image, the disturbance in that candidate is added to a test sample image, the disturbed test sample image is input into the semantic segmentation network model, and the deception rate of the disturbance is calculated. If, for a specified number of consecutive iterations, the deception rate of the disturbance obtained in each iteration is lower than that obtained in the previous iteration, the iterative calculation is stopped, and the candidate countermeasure sample image corresponding to the disturbance with the highest deception rate is selected as the countermeasure sample image.
Optionally, the disturbance of each candidate countermeasure sample image can be tested as soon as the candidate is obtained. For example, the disturbance in the obtained candidate countermeasure sample image is added to the test sample image, the disturbed test sample image is input into the semantic segmentation network model, and the deception rate of the disturbance is calculated. If the disturbance from a new round of iterative calculation achieves a higher deception rate on the verification set, a better result is considered to have been obtained, and the candidate countermeasure sample image at that moment is stored, overwriting the previous result. If, for a specified number of consecutive iterations, the deception rate is lower than that of the previous iteration, the iterative calculation can be stopped, and either the most recently stored candidate countermeasure sample image or the candidate with the highest deception rate is used as the countermeasure sample image. Otherwise, the iterative process continues until the number of iterations reaches the specified number.
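The early-stopping behaviour may be sketched as below. This sketch compares each new deception rate with the best rate stored so far, which is one interpretation of the comparison described above; the names `search_with_early_stop`, `patience`, and `max_iterations` are assumptions introduced for illustration.

```python
def search_with_early_stop(candidate_stream, deception_rate, patience, max_iterations):
    """Keep the best candidate; stop after `patience` consecutive rounds without improvement.

    Returns (best_candidate, best_rate, iterations_run).
    """
    best, best_rate, misses, iteration = None, float("-inf"), 0, 0
    for iteration, candidate in enumerate(candidate_stream, start=1):
        rate = deception_rate(candidate)
        if rate > best_rate:
            # A better result overwrites the previously stored one.
            best, best_rate, misses = candidate, rate, 0
        else:
            misses += 1
        if misses >= patience or iteration >= max_iterations:
            break
    return best, best_rate, iteration
```

Stopping early in this way avoids spending further iterations once the deception rate has stopped improving.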
In this embodiment of the present application, a semantic segmentation evaluation index may be used to calculate the deception rate of the disturbance in a candidate countermeasure sample image. The semantic segmentation evaluation index (intersection over union, IOU) is a commonly used measure of semantic segmentation accuracy. Put simply, the IOU is the overlap rate between the target window produced by the model and the labelled window, and it is calculated as follows:
$$\mathrm{IOU} = \frac{TP}{TP + FP + FN}$$
where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively. The IOU ranges from 0 to 100%: the higher the IOU, the more accurate the segmentation result; the lower the IOU, the worse the segmentation result and the better the attack effect of the disturbance.
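For a single class, the IOU can be computed from the true positive, false positive and false negative counts as in the following sketch. The binary-mask representation, the function name `iou_from_masks`, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def iou_from_masks(predicted, labelled):
    """IOU = TP / (TP + FP + FN) for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for predicted_row, labelled_row in zip(predicted, labelled):
        for p, l in zip(predicted_row, labelled_row):
            if p and l:
                tp += 1      # pixel predicted and labelled
            elif p:
                fp += 1      # pixel predicted but not labelled
            elif l:
                fn += 1      # pixel labelled but not predicted
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return tp / union if union else 1.0
```

A disturbance that lowers this value on the test sample images more strongly would be regarded as having a better attack effect.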
In the embodiment of the application, under the condition that the deception rate of the candidate confrontation sample images obtained continuously for multiple times does not exceed the deception rate of the previous iterative computation, the iterative computation is stopped by the computer equipment in time, the data processing efficiency is improved, and a better result is obtained.
From the above description, compared with the semantic segmentation network in the prior art, the semantic segmentation network optimization model trained in the embodiment of the present application has the following advantages:
1. A semantic segmentation network model trained with a training sample set consisting of countermeasure sample images and general sample images has higher classification precision, as shown below:
Model Baseline
FCN-AlexNet 46.75
DL-VGG16 59.20
Adv-FCN-AlexNet 53.40
Adv-DL-VGG16 68.22
FCN-AlexNet and DL-VGG16 denote semantic segmentation network models trained without countermeasure samples, Adv-FCN-AlexNet and Adv-DL-VGG16 denote semantic segmentation network models trained with countermeasure samples, and the values in the table are IOU values.
As can be seen from the above table, the IOU value of the Adv-FCN-AlexNet semantic segmentation network model is 53.40 and that of the FCN-AlexNet model is 46.75, while the IOU value of the Adv-DL-VGG16 model is 68.22 and that of the DL-VGG16 model is 59.20. The IOU value of Adv-FCN-AlexNet is therefore 6.65 higher than that of FCN-AlexNet, and the IOU value of Adv-DL-VGG16 is 9.02 higher than that of DL-VGG16; that is, the classification accuracy of the semantic segmentation network model trained with countermeasure samples is more than 5 points higher than that of the model trained without them.
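The two differences can be re-derived directly from the table values, as in the short check below. The dictionary layout and variable names are illustrative only.

```python
# IOU values copied from the table above (percent).
baseline = {"FCN-AlexNet": 46.75, "DL-VGG16": 59.20}
adversarially_trained = {"FCN-AlexNet": 53.40, "DL-VGG16": 68.22}

# Absolute gain in IOU points for each backbone, rounded to two decimals.
gains = {
    name: round(adversarially_trained[name] - baseline[name], 2)
    for name in baseline
}
```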
2. The anti-interference capability of the semantic segmentation network after the cosine similarity-based feature deception training is stronger. As follows:
Figure BDA0003325536460000131
Figure BDA0003325536460000141
where No Data denotes the data-free fast spoofing algorithm of GD-UAP, and All Data denotes the feature spoofing attack algorithm based on cosine similarity; both white-box and black-box attacks are simulated. FCN-AlexNet and DL-VGG16 denote semantic segmentation network models trained with the data-free fast spoofing algorithm, Adv-FCN-AlexNet and Adv-DL-VGG16 denote semantic segmentation network models trained with the feature spoofing attack algorithm based on cosine similarity, and the values in the table are IOU values.
As can be seen from the above table, the IOU value of the semantic segmentation network after the cosine similarity-based feature spoofing training is higher than the IOU value of the semantic segmentation network model after the dataless fast spoofing algorithm training, so that the anti-interference capability of the semantic segmentation network after the cosine similarity-based feature spoofing training is stronger.
Referring to fig. 4, a flowchart of an image recognition method provided by an embodiment of the present application is shown, where the image recognition method can be applied to the computer device shown in fig. 1. As shown in fig. 4, the image recognition method may include the steps of:
Step 401, the computer device adds a disturbance to the original sample image to obtain an initial confrontation sample image.
In this embodiment of the present application, the computer device may add the disturbance to the original sample image by using any of various algorithms, which may be selected according to the actual service scenario; this embodiment of the present application is not particularly limited in this respect. The original sample images may be understood as a certain number of original images selected as samples, according to the needs of the actual service scenario, from the transaction process of that scenario. The samples may be used for training or testing the semantic segmentation network model. Optionally, the number, size, and type of the original sample images may be selected based on actual needs, which is not specifically limited in the embodiment of the present application.
Step 402, the computer device inputs the initial countermeasure sample image into the semantic segmentation network model, attacks the semantic segmentation network model by using a feature spoofing algorithm based on cosine similarity, and calculates the saturation of the output features of each convolutional layer in the semantic segmentation network model and the saturation difference between the output features of two adjacent convolutional layers. If the saturation corresponding to a specified convolutional layer is greater than a first preset threshold, and the saturation difference between the output features of the specified convolutional layer and those of its adjacent convolutional layer is less than a second preset threshold, the candidate countermeasure sample image output by the specified convolutional layer is stored.
Since the convolutional layers are responsible for extracting features, the attack algorithm in the embodiment of the present application can attack all convolutional layers. The feature spoofing algorithm based on cosine similarity in the embodiment of the application is a multi-step iterative algorithm, but unlike DeepFool it has no obvious directionality, so a large number of iterations are required to keep approaching the optimal solution. To avoid settling on a locally optimal solution, the embodiment of the application sets a variable Sat_t to represent the saturation of the perturbed image (the proportion of pixels that reach the upper or lower limit of the attack intensity) and a variable SatC_t to represent the difference in Sat_t between two training rounds. If SatC_t is too small while Sat_t is too high, the training may be considered to have reached "saturation".
In the embodiment of the application, an initial countermeasure sample image is input into a semantic segmentation network model, a characteristic deception algorithm based on cosine similarity is used for attacking the semantic segmentation network model, namely attacking all convolutional layers in the semantic segmentation network model, and the saturation of the output characteristic of each convolutional layer in the semantic segmentation network model and the saturation difference of the output characteristics of two adjacent convolutional layers are calculated. And if the saturation corresponding to a certain convolutional layer is larger than a first preset threshold value, and the saturation difference value between the convolutional layer and an adjacent convolutional layer is smaller than a second preset threshold value, saving the output candidate confrontation sample image corresponding to the convolutional layer. The first preset threshold and the second preset threshold may be set based on actual needs, may be the same numerical value or different numerical values, and the embodiment of the present application is not particularly limited.
Step 403, the computer device compresses the disturbance in the candidate countermeasure sample image according to a preset proportion, inputs the compressed candidate countermeasure sample image into the semantic segmentation network model, attacks the semantic segmentation network model by using a feature spoofing algorithm based on cosine similarity, performs iterative computation until the number of iterations reaches a specified number, and stores the candidate countermeasure sample image corresponding to each iterative computation.
In the embodiment of the present application, the perturbation in the candidate confrontation sample image is compressed according to a preset ratio, and optionally, if the preset ratio is 0.5, that is, the compression ratio is 0.5, the perturbation may be reduced by half. Inputting the compressed candidate confrontation sample image into a semantic segmentation network model, repeating the steps, calculating the saturation and saturation difference corresponding to each convolution layer of the semantic segmentation network model, obtaining a new round of candidate confrontation sample image corresponding to iterative computation until the iteration times reach the specified times, and storing the candidate confrontation sample image corresponding to each iterative computation.
Step 404, the computer device sequentially adds the disturbance in each candidate countermeasure sample image to the test sample image, sequentially inputs the test sample image added with different disturbances to the semantic segmentation network model, calculates the deception rate of the disturbance in each candidate countermeasure sample image, and selects the candidate countermeasure sample image corresponding to the disturbance with the highest deception rate as the countermeasure sample image.
The test sample image may be understood as a test sample set for testing the accuracy of the semantic segmentation network model, and the test sample image may be multiple. In the embodiment of the application, the candidate confrontation sample image corresponding to the disturbance with the highest cheating rate is used as the confrontation sample image, the confrontation sample image can cause the maximum interference to the semantic segmentation network model, and a data base is laid for the optimization training of the subsequent semantic segmentation network model.
Step 405, optimization training is performed on the semantic segmentation network model by using a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which countermeasure sample images are added according to a specified proportion.
The training sample set is a set added with confrontation sample images according to a specified proportion; the training sample set comprises a confrontation sample image and a general sample image; the designated proportion is the proportion which enables the semantic segmentation network optimization model obtained through training to be optimal.
In the embodiment of the present application, when the proportion of countermeasure sample images in the training sample set changes, the recognition effect and the anti-interference capability of the semantic segmentation network optimization model trained with that set also change. To give the trained model the best recognition effect and anti-interference capability, the optimal proportion of countermeasure sample images in the training sample set needs to be determined; this optimal proportion is the specified proportion. Optionally, the specified proportion may be a single optimal value obtained from a large amount of experimental data, or an optimal range. In either case, the semantic segmentation network optimization model trained with a training sample set containing countermeasure sample images in the specified proportion should be optimal, that is, it should have the best recognition effect and anti-interference capability.
In the prior art, a GD-UAP algorithm is usually used to attack a semantic segmentation network model to obtain a countermeasure sample image, and a training sample set added with the countermeasure sample image according to a specified proportion is used to perform optimization training on the semantic segmentation network model, so as to obtain a semantic segmentation network optimization model. In the embodiment of the application, a feature spoofing algorithm based on cosine similarity attacks the semantic segmentation network model to obtain a countermeasure sample image, and the training sample set added with the countermeasure sample image according to a specified proportion is used for carrying out optimization training on the semantic segmentation network model, so that the semantic segmentation network optimization model is obtained. Compared with a GD-UAP algorithm in the prior art, the feature spoofing algorithm based on cosine similarity has a higher spoofing rate, so that the semantic segmentation network trained by the embodiment of the application has better anti-jamming capability and stronger recognition capability.
Step 406, the image to be identified is identified by using the semantic segmentation network optimization model.
In the embodiment of the application, first, a disturbance is added to an original sample image to obtain an initial countermeasure sample image; second, the initial countermeasure sample image is input into a semantic segmentation network model, and the model is attacked by using a feature spoofing algorithm based on cosine similarity to obtain a countermeasure sample image; third, the semantic segmentation network model is optimally trained with a training sample set to which countermeasure sample images have been added in a specified proportion, yielding a semantic segmentation network optimization model; finally, the image to be identified is identified by using the semantic segmentation network optimization model. Because the feature spoofing algorithm based on cosine similarity has a higher deception rate, adding the countermeasure sample images it generates to the training sample set gives the resulting semantic segmentation network optimization model stronger anti-interference capability and a better recognition effect.
It should be understood that, although the steps in the flowcharts are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Referring to fig. 5, a block diagram of an image recognition apparatus 500 according to an embodiment of the present application is shown, where the apparatus 500 may be configured in the computer device of fig. 1. As shown in fig. 5, the apparatus 500 includes a first obtaining module 501, a second obtaining module 502, a third obtaining module 503, and an identifying module 504.
The first obtaining module 501 is configured to add disturbance to an original sample image to obtain an initial confrontation sample image; a second obtaining module 502, configured to input the initial countermeasure sample image to the semantic segmentation network model, and attack the semantic segmentation network model using a feature spoofing algorithm based on cosine similarity to obtain a countermeasure sample image; a third obtaining module 503, configured to perform optimization training on the semantic segmentation network model by using a training sample set to obtain a semantic segmentation network optimization model, where the training sample set is a set to which countermeasure sample images are added according to a specified ratio; and the identifying module 504 is configured to identify the image to be identified by using the semantic segmentation network optimization model.
In an optional embodiment of the present application, the objective of the loss function in the semantic segmentation network model is to maximize an included angle between a first vector and a second vector output by each layer of the semantic segmentation network model, where the first vector is a vector corresponding to the original sample image output by each layer of the semantic segmentation network model, and the second vector is a vector corresponding to the initial countermeasure sample image output by each layer of the semantic segmentation network model.
In an optional embodiment of the present application, the loss function in the semantic segmentation network model in the second obtaining module 502 is:
Figure BDA0003325536460000181
Figure BDA0003325536460000182
such that ||δ|| < ξ
where K is the range of attacked layers; θ is the angle, to be maximized, between the original output and the countermeasure output of each layer; ξ is an infinitesimal quantity; l_i(x) is the output of the i-th layer of the semantic segmentation network model; l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the disturbance is added to the data matrix of the original sample image; ||l_i(x + δ)||_2 is the 2-norm of the matrix formed after the disturbance is added to the original sample image data matrix; ||l_i(x)||_2 is the 2-norm of the original sample image data matrix; and δ is the countermeasure disturbance.
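Because the loss function itself is given only as a figure, the following Python sketch shows one plausible objective consistent with the symbol definitions: the sum over the attacked layers of the cosine of the angle between the original and perturbed layer outputs, which an attack would minimize in order to widen the angles. The function names, the flattening of each layer output into a vector, and the choice of the maximum norm for the constraint on δ are all assumptions and should not be read as the exact formula of the present application.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) between two flattened, non-zero layer outputs."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))  # ||l_i(x)||_2
    norm_v = math.sqrt(sum(b * b for b in v))  # ||l_i(x + delta)||_2
    return dot / (norm_u * norm_v)


def feature_spoofing_loss(original_outputs, perturbed_outputs):
    """Sum of per-layer cosines over the attacked layers; lower means wider angles."""
    return sum(
        cosine_similarity(original, perturbed)
        for original, perturbed in zip(original_outputs, perturbed_outputs)
    )


def within_budget(delta, xi):
    """Constraint ||delta|| < xi, using the maximum norm (assumption)."""
    return max(abs(d) for d in delta) < xi
```

Minimizing such a sum over δ while keeping `within_budget` true would push each perturbed layer output away from the direction of the corresponding original output.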
In an optional embodiment of the present application, the second obtaining module 502 is specifically configured to: inputting the initial countermeasure sample image into a semantic segmentation network model, attacking the semantic segmentation network model by using a characteristic deception algorithm based on cosine similarity, calculating the saturation of the output characteristic of each convolution layer in the semantic segmentation network model and the saturation difference of the output characteristics of two adjacent convolution layers, and if the saturation corresponding to a specified convolution layer is greater than a first preset threshold and the saturation difference of the output characteristic of the specified convolution layer and the output characteristic of the adjacent convolution layer is less than a second preset threshold, storing the candidate countermeasure sample image output by the specified convolution layer; compressing the disturbance in the candidate countermeasure sample image according to a preset proportion, inputting the compressed candidate countermeasure sample image into a semantic segmentation network model, attacking the semantic segmentation network model by using a characteristic deception algorithm based on cosine similarity, performing iterative computation until the iteration times reach a specified number, and storing the corresponding candidate countermeasure sample image of each iterative computation; one of the candidate confrontation sample images is selected as a confrontation sample image.
In an optional embodiment of the present application, the second obtaining module 502 is specifically configured to: and adding the disturbance in each candidate confrontation sample image into the test sample image in sequence, inputting the test sample image added with different disturbances into the semantic segmentation network model in sequence, calculating the deception rate of the disturbance in each candidate confrontation sample image, and selecting the candidate confrontation sample image corresponding to the disturbance with the highest deception rate as the confrontation sample image.
In one embodiment, the second obtaining module 502 is specifically configured to: after each iterative calculation yields a candidate countermeasure sample image, add the disturbance in that candidate to a test sample image, input the disturbed test sample image into the semantic segmentation network model, and calculate the deception rate of the disturbance in the candidate countermeasure sample image; and, if for a specified number of consecutive iterations the deception rate of the disturbance obtained in each iteration is lower than that obtained in the previous iteration, stop the iterative calculation and select the candidate countermeasure sample image corresponding to the disturbance with the highest deception rate as the countermeasure sample image.
In one embodiment, the second obtaining module 502 is specifically configured to: calculate the deception rate of the perturbation in the candidate adversarial sample image using an evaluation index of semantic segmentation.
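The text does not say which evaluation index is used or how it maps to a deception rate. Purely as an assumption for illustration, the sketch below uses mean intersection-over-union (mIoU) between the label maps predicted for the clean and the perturbed test image, and takes the deception rate to be 1 minus that mIoU. The names `mean_iou` and `deception_rate` are hypothetical.

```python
import numpy as np


def mean_iou(pred_a: np.ndarray, pred_b: np.ndarray, num_classes: int) -> float:
    """Mean IoU between two integer label maps, skipping classes absent from both."""
    ious = []
    for c in range(num_classes):
        a = pred_a == c
        b = pred_b == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class c appears in neither map
        ious.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(ious)) if ious else 1.0


def deception_rate(clean_pred: np.ndarray, adv_pred: np.ndarray, num_classes: int) -> float:
    """Assumed definition: the more the prediction changes, the higher the rate."""
    return 1.0 - mean_iou(clean_pred, adv_pred, num_classes)
```

Under this assumed definition, an unchanged prediction gives a rate of 0 and a prediction with no overlap in any class gives a rate of 1.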
For specific limitations of the image recognition device, reference may be made to the above limitations of the image recognition method, which are not repeated here. The modules in the image recognition device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 6. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized through Wi-Fi, an operator network, NFC (near field communication), or other technologies. The computer program is executed by the processor to implement an image recognition method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment of the present application, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the following steps when executing the computer program: adding a perturbation to the original sample image to obtain an initial adversarial sample image; inputting the initial adversarial sample image into a semantic segmentation network model, and attacking the semantic segmentation network model using a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image; performing optimization training on the semantic segmentation network model using a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which adversarial sample images have been added at a specified proportion; and recognizing the image to be recognized using the semantic segmentation network optimization model.
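The construction of the training sample set can be sketched as below. The text only says that adversarial sample images are added "at a specified proportion"; this sketch assumes the proportion is the number of adversarial samples relative to the number of clean samples. The function name `build_training_set` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_training_set(
    clean_samples: Sequence[T],
    adversarial_samples: Sequence[T],
    ratio: float,
    seed: int = 0,
) -> List[T]:
    """Mix adversarial samples into the clean set.

    `ratio` is assumed to mean (adversarial count) / (clean count); the
    count is capped by the number of adversarial samples available.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    n_adv = min(len(adversarial_samples), int(round(ratio * len(clean_samples))))
    rng = random.Random(seed)
    chosen = rng.sample(list(adversarial_samples), n_adv)
    mixed = list(clean_samples) + chosen
    rng.shuffle(mixed)  # avoid feeding all adversarial samples in one run
    return mixed
```

A fixed seed keeps the mixture reproducible across training runs.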
In one embodiment of the present application, the objective of the loss function in the semantic segmentation network model is to maximize the angle between a first vector and a second vector output by each layer of the semantic segmentation network model, where the first vector is the vector that each layer of the semantic segmentation network model outputs for the original sample image, and the second vector is the vector that each layer outputs for the initial adversarial sample image.
In one embodiment of the present application, the loss function in the semantic segmentation network model is:
Figure BDA0003325536460000201
Figure BDA0003325536460000202
such that ||δ|| < ξ
where K is the range of attacked layers, θ is the angle, to be maximized, between the original output and the adversarial output of each layer, ξ is an infinitesimal quantity, l_i(x) is the output of the i-th layer of the semantic segmentation network model, l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the perturbation is added to the data matrix of the original sample image, ||l_i(x + δ)||_2 is the 2-norm of the matrix formed after the perturbation is added to the original sample image data matrix, ||l_i(x)||_2 is the 2-norm of the original sample image data matrix, and δ is the adversarial perturbation.
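The two formula images are not reproduced in this text. As a reading aid only, one formulation consistent with the symbol definitions above is sketched here; the summation over layers and the index range 1 to K are assumptions and may differ from the formulas in the figures.

```latex
\theta_i = \arccos\!\left(
  \frac{\langle l_i(x),\, l_i(x+\delta) \rangle}
       {\lVert l_i(x) \rVert_2 \, \lVert l_i(x+\delta) \rVert_2}
\right),
\qquad
\max_{\delta} \; \sum_{i=1}^{K} \theta_i
\quad \text{such that} \quad \lVert \delta \rVert < \xi
```

In this reading, the fraction is the cosine similarity between the clean and perturbed outputs of layer i, so maximizing θ_i is equivalent to minimizing that cosine similarity.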
In one embodiment of the application, the processor, when executing the computer program, further performs the steps of: inputting the initial adversarial sample image into the semantic segmentation network model, attacking the semantic segmentation network model using the feature spoofing algorithm based on cosine similarity, and calculating the saturation of the output feature of each convolution layer in the semantic segmentation network model and the saturation difference between the output features of each two adjacent convolution layers; if the saturation corresponding to a specified convolution layer is greater than a first preset threshold and the saturation difference between the output feature of the specified convolution layer and the output feature of its adjacent convolution layer is less than a second preset threshold, storing the candidate adversarial sample image output by the specified convolution layer; compressing the perturbation in the candidate adversarial sample image according to a preset proportion, inputting the compressed candidate adversarial sample image into the semantic segmentation network model, attacking the semantic segmentation network model using the feature spoofing algorithm based on cosine similarity, iterating until the number of iterations reaches a specified number, and storing the candidate adversarial sample image corresponding to each iteration; and selecting one of the candidate adversarial sample images as the adversarial sample image.
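The two-threshold condition on layer saturation can be illustrated as follows. The sketch takes the per-layer saturation values as given, since this passage does not define how saturation is measured, and it assumes that the "adjacent" layer is the preceding one. `qualifying_layers` and its parameter names are hypothetical.

```python
from typing import List, Sequence


def qualifying_layers(
    saturations: Sequence[float],
    sat_threshold: float,
    diff_threshold: float,
) -> List[int]:
    """Return indices of layers whose candidate would be stored.

    A layer qualifies when its saturation exceeds sat_threshold (the first
    preset threshold) and its saturation differs from the preceding layer's
    by less than diff_threshold (the second preset threshold).
    """
    selected = []
    for i in range(1, len(saturations)):
        diff = abs(saturations[i] - saturations[i - 1])
        if saturations[i] > sat_threshold and diff < diff_threshold:
            selected.append(i)
    return selected
```

For example, with saturations `[0.1, 0.5, 0.82, 0.85, 0.6]`, thresholds 0.8 and 0.05, only layer index 3 qualifies: layer 2 is saturated enough but still changing quickly relative to layer 1.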
In one embodiment of the application, the processor, when executing the computer program, further performs the steps of: adding the perturbation in each candidate adversarial sample image to the test sample image in turn, inputting the test sample images with the different perturbations added into the semantic segmentation network model in turn, calculating the deception rate of the perturbation in each candidate adversarial sample image, and selecting the candidate adversarial sample image corresponding to the perturbation with the highest deception rate as the adversarial sample image.
In one embodiment of the application, the processor, when executing the computer program, further performs the steps of: after each iteration yields a candidate adversarial sample image, adding the perturbation in that candidate adversarial sample image to the test sample image, inputting the perturbed test sample image into the semantic segmentation network model, and calculating the deception rate of the perturbation in the candidate adversarial sample image; and if, for a specified number of consecutive iterations, the deception rate of the perturbation of the candidate adversarial sample image obtained in an iteration is smaller than that obtained in the previous iteration, stopping the iteration and selecting the candidate adversarial sample image corresponding to the perturbation with the highest deception rate as the adversarial sample image.
In one embodiment of the application, the processor, when executing the computer program, further performs the steps of: calculating the deception rate of the perturbation in the candidate adversarial sample image using an evaluation index of semantic segmentation.
The implementation principle and technical effect of the computer device provided by the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.
In an embodiment of the application, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of:
adding a perturbation to the original sample image to obtain an initial adversarial sample image; inputting the initial adversarial sample image into a semantic segmentation network model, and attacking the semantic segmentation network model using a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image; performing optimization training on the semantic segmentation network model using a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which adversarial sample images have been added at a specified proportion; and recognizing the image to be recognized using the semantic segmentation network optimization model.
In one embodiment of the present application, the objective of the loss function in the semantic segmentation network model is to maximize the angle between a first vector and a second vector output by each layer of the semantic segmentation network model, where the first vector is the vector that each layer of the semantic segmentation network model outputs for the original sample image, and the second vector is the vector that each layer outputs for the initial adversarial sample image.
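The per-layer angle that this objective refers to can be computed as in the following sketch. It assumes each layer output is available as an array that can be flattened into a vector; `layer_angle` and `total_angle` are hypothetical helper names, and summing the angles over layers is an assumption about how the per-layer terms are combined.

```python
import numpy as np


def layer_angle(clean_feat, adv_feat, eps: float = 1e-12) -> float:
    """Angle in radians between two layer outputs, flattened to vectors."""
    a = np.asarray(clean_feat, dtype=np.float64).ravel()
    b = np.asarray(adv_feat, dtype=np.float64).ravel()
    # Cosine similarity; eps guards against division by zero.
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    # Clip to the valid arccos domain to absorb rounding error.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def total_angle(clean_feats, adv_feats) -> float:
    """Sum of per-layer angles; an attack would try to make this larger."""
    return sum(layer_angle(c, a) for c, a in zip(clean_feats, adv_feats))
```

Identical features give an angle near 0, orthogonal features give about π/2, and opposite features give about π.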
In one embodiment of the present application, the loss function in the semantic segmentation network model is:
Figure BDA0003325536460000221
Figure BDA0003325536460000222
such that ||δ|| < ξ
where K is the range of attacked layers, θ is the angle, to be maximized, between the original output and the adversarial output of each layer, ξ is an infinitesimal quantity, l_i(x) is the output of the i-th layer of the semantic segmentation network model, l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the perturbation is added to the data matrix of the original sample image, ||l_i(x + δ)||_2 is the 2-norm of the matrix formed after the perturbation is added to the original sample image data matrix, ||l_i(x)||_2 is the 2-norm of the original sample image data matrix, and δ is the adversarial perturbation.
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: inputting the initial adversarial sample image into the semantic segmentation network model, attacking the semantic segmentation network model using the feature spoofing algorithm based on cosine similarity, and calculating the saturation of the output feature of each convolution layer in the semantic segmentation network model and the saturation difference between the output features of each two adjacent convolution layers; if the saturation corresponding to a specified convolution layer is greater than a first preset threshold and the saturation difference between the output feature of the specified convolution layer and the output feature of its adjacent convolution layer is less than a second preset threshold, storing the candidate adversarial sample image output by the specified convolution layer; compressing the perturbation in the candidate adversarial sample image according to a preset proportion, inputting the compressed candidate adversarial sample image into the semantic segmentation network model, attacking the semantic segmentation network model using the feature spoofing algorithm based on cosine similarity, iterating until the number of iterations reaches a specified number, and storing the candidate adversarial sample image corresponding to each iteration; and selecting one of the candidate adversarial sample images as the adversarial sample image.
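The compress-and-re-attack loop in this embodiment can be sketched as follows. The sketch assumes that "compressing according to a preset proportion" means multiplying the perturbation by a factor `ratio`, and it treats the attack itself as a hypothetical callback `attack_step` that takes the current perturbation and returns an updated one. Neither detail is specified in the text.

```python
from typing import Callable, List

import numpy as np


def refine_perturbation(
    delta: np.ndarray,
    attack_step: Callable[[np.ndarray], np.ndarray],
    ratio: float,
    num_iters: int,
) -> List[np.ndarray]:
    """Alternate compression and attack, keeping one candidate per iteration."""
    candidates = []
    for _ in range(num_iters):
        delta = ratio * delta        # compress the perturbation (assumed: scaling)
        delta = attack_step(delta)   # re-attack the model from the compressed start
        candidates.append(np.array(delta, copy=True))  # store this iteration's candidate
    return candidates
```

Every iteration's candidate is kept because the later selection step compares all of them by deception rate rather than assuming the last one is best.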
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: adding the perturbation in each candidate adversarial sample image to the test sample image in turn, inputting the test sample images with the different perturbations added into the semantic segmentation network model in turn, calculating the deception rate of the perturbation in each candidate adversarial sample image, and selecting the candidate adversarial sample image corresponding to the perturbation with the highest deception rate as the adversarial sample image.
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: after each iteration yields a candidate adversarial sample image, adding the perturbation in that candidate adversarial sample image to the test sample image, inputting the perturbed test sample image into the semantic segmentation network model, and calculating the deception rate of the perturbation in the candidate adversarial sample image; and if, for a specified number of consecutive iterations, the deception rate of the perturbation of the candidate adversarial sample image obtained in an iteration is smaller than that obtained in the previous iteration, stopping the iteration and selecting the candidate adversarial sample image corresponding to the perturbation with the highest deception rate as the adversarial sample image.
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: calculating the deception rate of the perturbation in the candidate adversarial sample image using an evaluation index of semantic segmentation.
The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. An image recognition method, characterized in that the method comprises:
adding a perturbation to the original sample image to obtain an initial adversarial sample image;
inputting the initial adversarial sample image into a semantic segmentation network model, and attacking the semantic segmentation network model using a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image;
performing optimization training on the semantic segmentation network model using a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which the adversarial sample images have been added at a specified proportion;
and recognizing the image to be recognized using the semantic segmentation network optimization model.
2. The method according to claim 1, wherein the objective of the loss function in the semantic segmentation network model is to maximize the angle between a first vector and a second vector output by each layer of the semantic segmentation network model, wherein the first vector is the vector that each layer of the semantic segmentation network model outputs for the original sample image, and the second vector is the vector that each layer outputs for the initial adversarial sample image.
3. The method of claim 2, wherein the loss function in the semantic segmentation network model is:
Figure FDA0003325536450000011
Figure FDA0003325536450000012
such that ||δ|| < ξ
where K is the range of attacked layers, θ is the angle, to be maximized, between the original output and the adversarial output of each layer, ξ is an infinitesimal quantity, l_i(x) is the output of the i-th layer of the semantic segmentation network model, l_i(x + δ) is the output of the i-th layer of the semantic segmentation network model after the perturbation is added to the data matrix of the original sample image, ||l_i(x + δ)||_2 is the 2-norm of the matrix formed after the perturbation is added to the original sample image data matrix, ||l_i(x)||_2 is the 2-norm of the original sample image data matrix, and δ is the adversarial perturbation.
4. The method of claim 1, wherein the inputting the initial adversarial sample image into a semantic segmentation network model and attacking the semantic segmentation network model using a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image comprises:
inputting the initial adversarial sample image into the semantic segmentation network model, attacking the semantic segmentation network model using the feature spoofing algorithm based on cosine similarity, calculating the saturation of the output feature of each convolution layer in the semantic segmentation network model and the saturation difference between the output features of each two adjacent convolution layers, and if the saturation corresponding to a specified convolution layer is greater than a first preset threshold and the saturation difference between the output feature of the specified convolution layer and the output feature of its adjacent convolution layer is less than a second preset threshold, storing a candidate adversarial sample image output by the specified convolution layer;
compressing the perturbation in the candidate adversarial sample image according to a preset proportion, inputting the compressed candidate adversarial sample image into the semantic segmentation network model, attacking the semantic segmentation network model using the feature spoofing algorithm based on cosine similarity, iterating until the number of iterations reaches a specified number, and storing the candidate adversarial sample image corresponding to each iteration;
selecting one of the candidate adversarial sample images as the adversarial sample image.
5. The method of claim 4, wherein said selecting one of said candidate adversarial sample images as said adversarial sample image comprises:
adding the perturbation in each candidate adversarial sample image to the test sample image in turn, inputting the test sample images with the different perturbations added into the semantic segmentation network model in turn, calculating the deception rate of the perturbation in each candidate adversarial sample image, and selecting the candidate adversarial sample image corresponding to the perturbation with the highest deception rate as the adversarial sample image.
6. The method of claim 4, wherein said selecting one of said candidate adversarial sample images as said adversarial sample image comprises:
after each iteration yields a candidate adversarial sample image, adding the perturbation in that candidate adversarial sample image to the test sample image, inputting the perturbed test sample image into the semantic segmentation network model, and calculating the deception rate of the perturbation in the candidate adversarial sample image;
and if, for a specified number of consecutive iterations, the deception rate of the perturbation of the candidate adversarial sample image obtained in an iteration is smaller than that obtained in the previous iteration, stopping the iteration and selecting the candidate adversarial sample image corresponding to the perturbation with the highest deception rate as the adversarial sample image.
7. The method of claim 6, wherein said calculating the deception rate of the perturbation in the candidate adversarial sample image comprises:
calculating the deception rate of the perturbation in the candidate adversarial sample image using an evaluation index of semantic segmentation.
8. An image recognition apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for adding a perturbation to the original sample image to obtain an initial adversarial sample image;
the second acquisition module is used for inputting the initial adversarial sample image into a semantic segmentation network model and attacking the semantic segmentation network model using a feature spoofing algorithm based on cosine similarity to obtain an adversarial sample image;
the third acquisition module is used for performing optimization training on the semantic segmentation network model using a training sample set to obtain a semantic segmentation network optimization model, wherein the training sample set is a set to which the adversarial sample images have been added at a specified proportion;
and the recognition module is used for recognizing the image to be recognized by utilizing the semantic segmentation network optimization model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111261097.8A 2021-10-28 2021-10-28 Image recognition method and device, computer equipment and storage medium Pending CN113902959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111261097.8A CN113902959A (en) 2021-10-28 2021-10-28 Image recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111261097.8A CN113902959A (en) 2021-10-28 2021-10-28 Image recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113902959A true CN113902959A (en) 2022-01-07

Family

ID=79026677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111261097.8A Pending CN113902959A (en) 2021-10-28 2021-10-28 Image recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113902959A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510715A (en) * 2022-01-14 2022-05-17 中国科学院软件研究所 Model functional safety testing method and device, storage medium and equipment

Similar Documents

Publication Publication Date Title
CN111860670B (en) Domain adaptive model training method, image detection method, device, equipment and medium
US20220058426A1 (en) Object recognition method and apparatus, electronic device, and readable storage medium
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN112926654B (en) Pre-labeling model training and certificate pre-labeling method, device, equipment and medium
Yu et al. Auto-fas: Searching lightweight networks for face anti-spoofing
CN111723865B (en) Method, apparatus and medium for evaluating performance of image recognition model and attack method
CN111310800B (en) Image classification model generation method, device, computer equipment and storage medium
CN111292377B (en) Target detection method, device, computer equipment and storage medium
CN112232426A (en) Training method, device and equipment of target detection model and readable storage medium
CN114677565B (en) Training method and image processing method and device for feature extraction network
CN111898735A (en) Distillation learning method, distillation learning device, computer equipment and storage medium
CN112232397A (en) Knowledge distillation method and device of image classification model and computer equipment
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN113065593A (en) Model training method and device, computer equipment and storage medium
CN115618008A (en) Account state model construction method and device, computer equipment and storage medium
CN115439708A (en) Image data processing method and device
CN113902959A (en) Image recognition method and device, computer equipment and storage medium
CN113076876B (en) Face spoofing detection method and system based on three-dimensional structure supervision and confidence weighting
CN113780363A (en) Countermeasure sample defense method, system, computer and medium
CN116383814B (en) Neural network model back door detection method and system
CN113344065A (en) Image processing method, device and equipment
CN116824330A (en) Small sample cross-domain target detection method based on deep learning
CN113516182B (en) Visual question-answering model training and visual question-answering method and device
CN112446428B (en) Image data processing method and device
CN115273202A (en) Face comparison method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination