WO2021218471A1 - Neural network for image processing and related device

Info

Publication number: WO2021218471A1
Authority: WIPO (PCT)
Prior art keywords: network, classification, image, category, feature extraction
Prior art date
Application number: PCT/CN2021/081238
Other languages: French (fr), Chinese (zh)
Inventors: 王一飞 (Wang Yifei), 刘扶芮 (Liu Furui), 李震国 (Li Zhenguo)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021218471A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a neural network for image processing and a related device.
  • Artificial intelligence is the use of computers or computer-controlled machines to simulate, extend, and expand human intelligence. Artificial intelligence includes studying the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making. At present, image processing based on deep-learning neural networks is a common application of artificial intelligence.
  • Adversarial training refers to adding the adversarial image and the correct label corresponding to the adversarial image to the training data set to train the neural network, thereby improving the robustness of the neural network to adversarial images; robustness means that the neural network can still accurately recognize the adversarial image.
  • The embodiments of the present application provide a neural network for image processing and a related device. The trained first feature extraction network and the trained second feature extraction network can respectively extract the robust representation and the non-robust representation in an input image, which not only avoids the loss of robustness caused by mixing the two, but also retains both the robust and non-robust representations of the input image, thereby avoiding a decrease in accuracy and improving the robustness and accuracy of the neural network at the same time.
  • In a first aspect, an embodiment of the present application provides a neural network training method, which can be used in the image processing field of the artificial intelligence field. The training device inputs an adversarial image into a first feature extraction network and a second feature extraction network, respectively, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network.
  • The adversarial image is an image that has undergone perturbation processing. Perturbation processing refers to adjusting, on the basis of the original image, the pixel values of pixels in the original image to obtain a perturbed image; to the human eye, it is usually difficult to distinguish the adversarial image from the original image.
  • Both the first robust representation and the first non-robust representation include feature information extracted from the adversarial image.
  • A robust representation refers to features that are not sensitive to perturbation: the classification category corresponding to the robust representation extracted from the original image is consistent with the classification category corresponding to the robust representation extracted from the adversarial image corresponding to that original image. A non-robust representation refers to features that are sensitive to perturbation: the classification category corresponding to the non-robust representation extracted from the original image is inconsistent with the classification category corresponding to the non-robust representation extracted from the adversarial image corresponding to that original image.
  • The feature information included in the robust representation is similar to the features used by the human eye, while the feature information included in the non-robust representation cannot be understood by the human eye; to the human eye, the non-robust representation is noise.
  • The training device inputs the first robust representation into the classification network to obtain the first classification category output by the classification network; the first classification category is a classification category for the object in the adversarial image. The training device inputs the first non-robust representation into the classification network to obtain the second classification category output by the classification network; the second classification category is likewise a classification category for the object in the adversarial image.
  • The training device performs iterative training on the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and outputs the trained first feature extraction network and the trained second feature extraction network. The first loss function is used to represent the similarity between the first classification category and the first label category, and the similarity between the second classification category and the second label category; the first loss function may specifically be a cross-entropy loss function or a maximum-margin loss function.
  • The first label category is the correct category corresponding to the adversarial image, and the second label category is the wrong category corresponding to the adversarial image; the first label category includes the correct classification of one or more objects in the adversarial image. Both the first label category and the second label category are used as supervision data in the training phase.
  • The convergence condition may be convergence of the first loss function, or the number of training iterations reaching a preset number.
  • In other words, the neural network includes the first feature extraction network and the second feature extraction network. The adversarial image is input into the first feature extraction network and the second feature extraction network, respectively, to obtain the first robust representation generated by the first feature extraction network and the first non-robust representation generated by the second feature extraction network; the first robust representation is then input into the classification network to obtain the first classification category, and the first non-robust representation is input into the classification network to obtain the second classification category. The purpose of the first loss function is to increase the similarity between the first classification category and the correct category of the adversarial image, and the similarity between the second classification category and the wrong category of the adversarial image; that is, the purpose of training is to extract the robust representation in the input image through the first feature extraction network, and the non-robust representation through the second feature extraction network. Because the robust representation and the non-robust representation are extracted by separate networks, mixing of the two (which would reduce robustness) is avoided while both representations are retained, thereby avoiding a decrease in accuracy and improving the robustness and accuracy of the neural network at the same time. A sketch of this training step follows.
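  • As a concrete illustration, the following PyTorch-style sketch trains the two feature extraction networks with a cross-entropy instance of the first loss function; the robust branch is pulled toward the correct label and the non-robust branch toward the wrong label. All names (feature_net_robust, wrong_label, and so on) are illustrative assumptions rather than identifiers from this application, and the classification network is assumed here to accept a single representation directly.

```python
import torch.nn.functional as F

# Hypothetical sketch of one training step for the first aspect.
# feature_net_robust / feature_net_nonrobust stand in for the first /
# second feature extraction networks; classifier stands in for the
# classification network.
def training_step(feature_net_robust, feature_net_nonrobust, classifier,
                  optimizer, adv_image, correct_label, wrong_label):
    # First robust / first non-robust representations of the adversarial image.
    robust_repr = feature_net_robust(adv_image)
    nonrobust_repr = feature_net_nonrobust(adv_image)

    # First / second classification categories output by the classification network.
    logits_robust = classifier(robust_repr)
    logits_nonrobust = classifier(nonrobust_repr)

    # First loss function (cross-entropy variant): similarity of the first
    # classification category to the first (correct) label category, plus
    # similarity of the second classification category to the second (wrong)
    # label category.
    loss = (F.cross_entropy(logits_robust, correct_label)
            + F.cross_entropy(logits_nonrobust, wrong_label))

    optimizer.zero_grad()
    loss.backward()  # updates both feature extraction networks (and the classifier)
    optimizer.step()
    return loss.item()
```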
  • In one implementation, the method further includes: the training device inputs the original image into the first feature extraction network and the second feature extraction network, respectively, to obtain the second robust representation generated by the first feature extraction network and the second non-robust representation generated by the second feature extraction network; the original image refers to an image that has not undergone perturbation processing, or an image directly collected. The training device combines the second robust representation and the second non-robust representation to obtain a combined first representation, and inputs the combined first representation into the classification network to perform a classification operation based on the combined first representation, obtaining the third classification category output by the classification network.
  • The combination method includes one or more of the following: splicing (concatenation), addition, fusion, and multiplication, as illustrated in the sketch below.
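  • A minimal sketch of the listed combination methods, assuming both representations are feature tensors of equal length; "fusion" is instantiated here as simple averaging, which is one possible choice rather than a detail fixed by the text:

```python
import torch

def combine(robust_repr: torch.Tensor, nonrobust_repr: torch.Tensor,
            method: str = "splicing") -> torch.Tensor:
    if method == "splicing":        # concatenation along the feature axis
        return torch.cat([robust_repr, nonrobust_repr], dim=-1)
    if method == "addition":        # element-wise sum
        return robust_repr + nonrobust_repr
    if method == "multiplication":  # element-wise product
        return robust_repr * nonrobust_repr
    if method == "fusion":          # assumed here: simple averaging
        return 0.5 * (robust_repr + nonrobust_repr)
    raise ValueError(f"unknown combination method: {method}")
```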
  • The training device performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a second loss function until the convergence condition is met. The second loss function is used to indicate the similarity between the third classification category and the third label category, and may specifically be a cross-entropy loss function or a maximum-margin loss function. The third label category is the correct category corresponding to the original image; it is the correct classification of one or more objects in the original image, may include one or more classification categories, and is used as supervision data in the training phase.
  • In one implementation, the method may further include: the training device inputs the original image into the first feature extraction network to obtain the second robust representation generated by the first feature extraction network. The training device then inputs the second robust representation into the classification network to perform a classification operation according to the second robust representation, obtaining the fourth classification category output by the classification network; the fourth classification category includes the category of one or more objects in the original image. Specifically, the training device may combine the second robust representation with a first constant tensor to obtain a combined third representation, input the combined third representation into the classification network, and perform the classification operation according to the combined third representation to obtain the fourth classification category output by the classification network.
  • The training device performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a third loss function until the convergence condition is met. The third loss function is used to indicate the similarity between the fourth classification category and the third label category, and may specifically be a cross-entropy loss function or a maximum-margin loss function; the third label category is the correct category corresponding to the original image. In this way, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the robust-representation extraction capability of the first feature extraction network, further improving the accuracy of the trained first feature extraction network.
  • The length of the first constant tensor is the same as the length of the second non-robust representation, and the relative positions of the second robust representation and the first constant tensor can correspond to the relative positions of the second robust representation and the second non-robust representation: if, in the combined first representation, the second robust representation is in front and the second non-robust representation behind, then in the combined third representation the second robust representation is in front and the first constant tensor behind; if, in the combined first representation, the second non-robust representation is in front and the second robust representation behind, then in the combined third representation the first constant tensor is in front and the second robust representation behind. A sketch of this substitution follows.
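  • A sketch of the constant-tensor substitution, assuming splicing as the combination method and the ordering with the second robust representation in front; the constant value 0.0 is an arbitrary illustrative choice:

```python
import torch

def robust_only_input(robust_repr: torch.Tensor, nonrobust_len: int,
                      const_value: float = 0.0) -> torch.Tensor:
    # First constant tensor: same length as the second non-robust
    # representation, occupying the position that representation has in
    # the combined first representation.
    batch = robust_repr.shape[0]
    const = torch.full((batch, nonrobust_len), const_value)
    # Combined third representation: robust part in front, constant tensor behind.
    return torch.cat([robust_repr, const], dim=-1)
```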
  • In one implementation, the method may further include: the training device inputs the original image into the second feature extraction network to obtain the second non-robust representation generated by the second feature extraction network. The training device then inputs the second non-robust representation into the classification network to perform a classification operation according to the second non-robust representation, obtaining the fifth classification category output by the classification network; the fifth classification category includes the category of one or more objects in the original image. Specifically, the training device may combine the second non-robust representation with a second constant tensor to obtain a combined fourth representation, input the combined fourth representation into the classification network, and perform the classification operation according to the combined fourth representation to obtain the fifth classification category output by the classification network.
  • The training device performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a fourth loss function until the convergence condition is met. The fourth loss function is used to indicate the similarity between the fifth classification category and the third label category, and may specifically be a cross-entropy loss function or a maximum-margin loss function; the third label category is the correct category corresponding to the original image. In this way, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the non-robust-representation extraction capability of the second feature extraction network, further improving the accuracy of the trained second feature extraction network.
  • The length of the second constant tensor is the same as the length of the second robust representation, and the relative positions of the second non-robust representation and the second constant tensor can correspond to the relative positions of the second robust representation and the second non-robust representation: if, in the combined first representation, the second robust representation is in front and the second non-robust representation behind, then in the combined fourth representation the second constant tensor is in front and the second non-robust representation behind; if, in the combined first representation, the second non-robust representation is in front and the second robust representation behind, then in the combined fourth representation the second non-robust representation is in front and the second constant tensor behind.
  • In one implementation, the training device inputs the original image into the first feature extraction network and the second feature extraction network, respectively, to obtain the second robust representation generated by the first feature extraction network and the second non-robust representation generated by the second feature extraction network. The training device combines the second robust representation and the second non-robust representation to obtain the combined first representation, and inputs the combined first representation into the classification network to perform a classification operation according to the combined first representation, obtaining the third classification category output by the classification network. The training device inputs the second robust representation into the classification network to perform a classification operation according to the second robust representation, obtaining the fourth classification category output by the classification network. The training device inputs the second non-robust representation into the classification network to perform a classification operation according to the second non-robust representation, obtaining the fifth classification category output by the classification network.
  • The training device performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a fifth loss function until the convergence condition is met. The fifth loss function is used to represent the similarity between the third classification category and the third label category, the similarity between the fourth classification category and the third label category, and the similarity between the fifth classification category and the third label category; the fifth loss function may specifically be a cross-entropy loss function or a maximum-margin loss function, and the third label category is the correct category corresponding to the original image. In this way, both the trained first feature extraction network and the trained second feature extraction network can accurately extract robust and non-robust representations, which expands the application scenarios of this solution.
  • In one implementation, the method may further include: the training device generates a first gradient according to the function value of the second loss function, performs perturbation processing on the original image according to the first gradient to generate an adversarial image, and determines the third label category as the first label category. Specifically, the training device may generate the function value of the second loss function according to the third classification category and the third label category, generate the first gradient according to that function value, substitute the first gradient into a preset function and multiply by a preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image; a sketch follows.
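  • The procedure above has the shape of a fast-gradient-sign-style attack. The sketch below assumes sign() as the "preset function" and ε as the "preset coefficient" (ε = 0.3 following the example later in the text); these are common choices rather than details fixed by this application, and the classification network is assumed to accept the spliced representation:

```python
import torch
import torch.nn.functional as F

def make_adversarial(feature_net_robust, feature_net_nonrobust, classifier,
                     original_image, third_label, epsilon=0.3):
    image = original_image.clone().requires_grad_(True)
    # Third classification category: classify the combined first representation.
    combined = torch.cat([feature_net_robust(image),
                          feature_net_nonrobust(image)], dim=-1)
    # Second loss function (cross-entropy variant): similarity between the
    # third classification category and the third label category.
    loss = F.cross_entropy(classifier(combined), third_label)
    loss.backward()
    # First gradient -> preset function (assumed: sign) -> preset coefficient.
    perturbation = epsilon * image.grad.sign()
    # Superimpose the perturbation on the original image.
    return (original_image + perturbation).clamp(0.0, 1.0).detach()
```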
  • In this way, the first gradient is generated according to the similarity between the third classification category and the third label category, and the original image is perturbed according to the first gradient, making the perturbation processing more targeted; this helps speed up the training process of the first feature extraction network and the second feature extraction network and improves the efficiency of the training process.
  • In one implementation, the method may further include: the training device generates a second gradient according to the function value of the third loss function, performs perturbation processing on the original image according to the second gradient to generate an adversarial image, and determines the third label category as the first label category. Specifically, the training device may generate the function value of the third loss function according to the fourth classification category and the third label category, generate the second gradient according to that function value, substitute the second gradient into the preset function and multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image. In this way, the original image is perturbed according to the similarity between the third label category and the fourth classification category that the classification network outputs from the second robust representation, so that the perturbation processing is more targeted with respect to the first feature extraction network, which helps improve the first feature extraction network's ability to extract robust representations.
  • In one implementation, the method may further include: the training device generates a third gradient according to the function value of the fourth loss function, performs perturbation processing on the original image according to the third gradient to generate an adversarial image, and determines the third label category as the first label category. Specifically, the training device may generate the function value of the fourth loss function according to the fifth classification category and the third label category, generate the third gradient according to that function value, substitute the third gradient into the preset function and multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image. In this way, the original image is perturbed according to the similarity between the third label category and the fifth classification category that the classification network outputs from the second non-robust representation, so that the perturbation processing is more targeted with respect to the second feature extraction network, which helps improve the second feature extraction network's ability to extract non-robust representations.
  • In one implementation, the method may further include: the training device combines the first robust representation and the first non-robust representation to obtain a combined second representation, and inputs the combined second representation into the classification network to obtain the sixth classification category output by the classification network; the sixth classification category is the category of the object in the adversarial image. The training device inputting the first robust representation into the classification network to obtain the first classification category, and inputting the first non-robust representation into the classification network to obtain the second classification category, may include: when the sixth classification category is different from the first label category, the training device inputs the first robust representation into the classification network to obtain the first classification category output by the classification network, and inputs the first non-robust representation into the classification network to obtain the second classification category output by the classification network. In this implementation, if the sixth classification category is the same as the first label category, this shows that the perturbation of the perturbed image is too slight: the way the neural network processes it is not much different from the way it processes the natural image. Since the purpose of training here is to enhance the ability of the first feature extraction network and the second feature extraction network to separate robust and non-robust representations from images with larger perturbations, subsequent training operations are performed only when the sixth classification category is different from the first label category, which improves the efficiency of the training process.
  • In one implementation, the method may further include: when the sixth classification category is different from the first label category, the training device determines the sixth classification category as the second label category.
  • In one implementation, the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
  • In a second aspect, the embodiments of the present application provide an image processing network, which can be used in the image processing field of the artificial intelligence field. The image processing network includes a first feature extraction network, a second feature extraction network, and a feature processing network.
  • The first feature extraction network is used to receive the input first image and generate a robust representation corresponding to the first image; a robust representation refers to features that are not sensitive to perturbation. The second feature extraction network is used to receive the input first image and generate a non-robust representation corresponding to the first image; a non-robust representation refers to features that are sensitive to perturbation. The feature processing network is used to obtain the robust representation and the non-robust representation and output a first processing result corresponding to the first image.
  • The specific implementation of the feature processing network and the specific form of the first processing result are related to the function of the entire image processing network. If the function of the image processing network is image classification, the feature processing network is a classification network, and the first processing result is used to indicate the classification category of the entire image. If the function of the image processing network is image recognition, the feature processing network may be a recognition network, and the first processing result is used to indicate the content recognized from the image, such as the text content in the image. If the function of the image processing network is image segmentation, the feature processing network can include a classification network used to generate the classification category of each pixel in the image; the image is then segmented using the per-pixel classification categories, and the first processing result is the segmented image (see the sketch after this paragraph).
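  • For the segmentation case, a small sketch of how per-pixel classification categories yield the segmented image, assuming the classification network outputs a score map of shape (num_classes, height, width); the names are illustrative:

```python
import torch

def segment(pixel_logits: torch.Tensor) -> torch.Tensor:
    # pixel_logits: (num_classes, H, W) scores from the classification network.
    # The classification category of each pixel is its highest-scoring class;
    # the resulting (H, W) map of category indices is the segmented image.
    return pixel_logits.argmax(dim=0)
```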
  • The first feature extraction network and the second feature extraction network are used to extract the robust representation and the non-robust representation in the input image, which not only avoids the loss of robustness caused by mixing the two, but also retains both representations of the input image, thereby avoiding a decrease in accuracy and improving the robustness and accuracy of the neural network at the same time.
  • In one implementation, the feature processing network can be specifically used to: in a first case, combine the robust representation and the non-robust representation and output the first processing result corresponding to the first image according to the combined representation; in a second case, output the first processing result corresponding to the first image according to the robust representation. The first case and the second case are different cases: the first case may refer to a case where high accuracy of the first processing result is required, while the second case may refer to a case where high robustness of the first processing result is required, that is, a situation where the probability that the input image is a perturbed image is very high. In this implementation, the image processing network includes both a robust path and a standard path, and the user can flexibly choose which path to use according to the actual situation, which expands the application scenarios of this solution and improves its implementation flexibility.
  • In one implementation, the feature processing network may also be used to output, in a third case, the first processing result corresponding to the first image according to the non-robust representation.
  • In one implementation, the feature processing network is specifically a classification network, and the classification network can be used to: perform a classification operation according to the combined representation and output a classification category corresponding to the first image; or perform a classification operation according to the robust representation and output a classification category corresponding to the first image; or perform a classification operation according to the non-robust representation and output a classification category corresponding to the first image (the three paths are sketched below). In this implementation, the provided image processing method is applied to the specific application scenario of image classification, which improves the degree of integration with the application scenario.
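  • A sketch of the three classification paths, assuming splicing as the combination method and zero tensors as the constant tensors so that the classification network always receives an input of the combined length; the mode names mirror the three cases above and all identifiers are illustrative:

```python
import torch

def classify(classifier, robust_repr, nonrobust_repr, mode="combined"):
    if mode == "combined":      # standard path: both representations
        x = torch.cat([robust_repr, nonrobust_repr], dim=-1)
    elif mode == "robust":      # robust path: constant tensor replaces the non-robust part
        x = torch.cat([robust_repr, torch.zeros_like(nonrobust_repr)], dim=-1)
    elif mode == "nonrobust":   # non-robust path: constant tensor replaces the robust part
        x = torch.cat([torch.zeros_like(robust_repr), nonrobust_repr], dim=-1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return classifier(x).argmax(dim=-1)  # classification category per image
```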
  • In one implementation, if the function of the image processing network is to determine whether the first image is an original image or an adversarial image, the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a perturbed image. In this implementation, the feature information extracted by the first feature extraction network and the second feature extraction network can be used not only to obtain a processing result corresponding to an object in the image, but also to obtain a processing result corresponding to the entire image, that is, to judge whether the image is an original image or a perturbed image, which expands the application scenarios of this solution.
  • In one implementation, the feature processing network can be specifically used to: generate a first classification category corresponding to the first image according to the robust representation; generate a second classification category corresponding to the first image according to the non-robust representation; when the first classification category is consistent with the second classification category, output a first processing result indicating that the first image is an original image; and when the first classification category is inconsistent with the second classification category, output a first processing result indicating that the first image is a perturbed image. This method is simple and highly operable; a sketch of the consistency check follows.
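  • A sketch of the described consistency check, under the same illustrative names as the earlier sketches and assuming the classification network accepts a single representation directly: agreement between the two branches marks the image as original, disagreement as perturbed.

```python
def detect_adversarial(classifier, robust_repr, nonrobust_repr):
    # First / second classification categories from the two representations.
    cat_robust = classifier(robust_repr).argmax(dim=-1)
    cat_nonrobust = classifier(nonrobust_repr).argmax(dim=-1)
    # Consistent categories -> original image; inconsistent -> perturbed image.
    is_original = cat_robust == cat_nonrobust
    return is_original
```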
  • In one implementation, the feature processing network can be specifically used to combine the robust representation and the non-robust representation and perform a detection operation based on the combined representation, so as to output a detection result corresponding to the first image; the first processing result includes the detection result. In one case, the detection result may indicate whether the first image is an original image or a perturbed image; in another case, the detection result may indicate which objects are included in the first image, that is, the object type of at least one object included in the first image, and optionally the detection result may also include the position information of each of the aforementioned objects. This provides another implementation of determining whether the first image is an original image or an adversarial image, which enhances the implementation flexibility of this solution.
  • the image processing network is one or more of the following: an image classification network, an image recognition network, an image segmentation network, or an image detection network.
  • the feature processing network includes a perceptron.
  • In one implementation, the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
  • In a third aspect, an embodiment of the present application provides a neural network training device, which can be used in the image processing field of the artificial intelligence field. The neural network training device may include an input module and a training module.
  • The input module is used to input the adversarial image into the first feature extraction network and the second feature extraction network to obtain the first robust representation generated by the first feature extraction network and the first non-robust representation generated by the second feature extraction network, where the adversarial image is an image obtained by performing perturbation processing on an original image, a robust representation refers to features that are not sensitive to perturbation, and a non-robust representation refers to features that are sensitive to perturbation.
  • the input module is also used to input the first robust representation into the classification network to obtain the first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain the second classification category output by the classification network.
  • The training module is used to iteratively train the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met, and to output the trained first feature extraction network and the trained second feature extraction network. The first loss function is used to represent the similarity between the first classification category and the first label category, and the similarity between the second classification category and the second label category; the first label category is the correct category corresponding to the adversarial image, and the second label category is the wrong category corresponding to the adversarial image.
  • The neural network training device composed of these modules can also be used to implement the steps in the various possible implementations of the first aspect. For the specific implementations of certain steps in the various possible implementations of the third aspect, and for the beneficial effects brought by each possible implementation, reference may be made to the descriptions of the various possible implementations of the first aspect, which will not be repeated here.
  • In a fourth aspect, an embodiment of the present application provides an image processing method, which can be used in the image processing field of the artificial intelligence field. The method may include: the execution device inputs a first image into the first feature extraction network to obtain a robust representation corresponding to the first image generated by the first feature extraction network, where a robust representation refers to features that are not sensitive to perturbation; the execution device inputs the first image into the second feature extraction network to obtain a non-robust representation corresponding to the first image generated by the second feature extraction network, where a non-robust representation refers to features that are sensitive to perturbation; and the execution device outputs, through the feature processing network and according to the robust representation and the non-robust representation, the first processing result corresponding to the first image. The first feature extraction network, the second feature extraction network, and the feature processing network belong to the same image processing network.
  • The execution device may also be used to implement the steps in the various possible implementations of the second aspect. For the specific implementations of certain steps in the various possible implementations of the fourth aspect, and for the beneficial effects brought by each possible implementation, reference may be made to the descriptions of the various possible implementations of the second aspect, which will not be repeated here.
  • An embodiment of the present application provides a training device, which may include a processor coupled to a memory. The memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the neural network training method of the first aspect is implemented. For the steps executed by the training device in each possible implementation of the first aspect, refer to the first aspect for details, which will not be repeated here.
  • An embodiment of the present application provides an execution device, which may include a processor coupled to a memory. The memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the functions of the image processing network of the second aspect are implemented. For the steps executed by the image processing network in each possible implementation of the second aspect, refer to the second aspect for details, which will not be repeated here.
  • The embodiments of the present application provide a computer-readable storage medium in which a computer program is stored; when it runs on a computer, the computer executes the neural network training method described in the first aspect, or executes the image processing method described in the fourth aspect.
  • An embodiment of the present application provides a circuit system. The circuit system includes a processing circuit configured to execute the neural network training method described in the first aspect, or configured to execute the image processing method described in the fourth aspect.
  • An embodiment of the present application provides a computer program that, when running on a computer, causes the computer to execute the neural network training method described in the first aspect, or causes the computer to execute the image processing method described in the fourth aspect.
  • An embodiment of the present application provides a chip system. The chip system includes a processor for supporting the training device or the image processing network in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory, which is used to store the program instructions and data necessary for the server or the communication device. The chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a structure of an artificial intelligence main framework provided by an embodiment of the present application;
  • FIG. 2 is a system architecture diagram of an image processing system provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a neural network training method provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a perturbation operation in a neural network training method provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the robust representation and the non-robust representation after visualization processing in the neural network training method provided by an embodiment of the present application;
  • FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of an image processing network in an image processing method provided by an embodiment of the present application;
  • FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of an execution device provided by an embodiment of the present application;
  • FIG. 13 is a schematic structural diagram of a training device provided by an embodiment of the present application;
  • FIG. 14 is a schematic diagram of a structure of a chip provided by an embodiment of the present application.
  • The embodiments of the present application provide a neural network for image processing and a related device. The trained first feature extraction network and the trained second feature extraction network can respectively extract the robust representation and the non-robust representation in an input image, which not only avoids the loss of robustness caused by mixing the two, but also retains both the robust and non-robust representations of the input image, thereby avoiding a decrease in accuracy and improving the robustness and accuracy of the neural network at the same time.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • the "IT value chain” from the underlying infrastructure of human intelligence, information (providing and processing technology realization) to the industrial ecological process of the system, reflects the value that artificial intelligence brings to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • The smart chips include central processing units (CPU), neural-network processing units (NPU), and graphics processing units (GPU). The basic platform includes distributed computing frameworks and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • The data at the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as Internet of Things data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • After the data processing mentioned above, some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe city, and so on.
  • The embodiments of the present application can be applied mainly to image processing scenarios in the above-mentioned application fields. As an example, in the field of autonomous driving, after the sensors on an autonomous vehicle collect an original image, they transmit the original image to the processor of the autonomous vehicle, and the processor uses the image processing network to process the transmitted image. If the pixel values of the original image are not disturbed during transmission, the processor processes the original image; if the pixel values in the original image are disturbed during transmission, the processor processes an adversarial image. That is, the images processed by the processor of the autonomous vehicle may include both original images and adversarial images.
  • As another example, in the field of intelligent terminals, after the smart terminal collects an original image, operations such as light filling and filters may disturb the pixel values of the original image before the neural network processes it, so the image processed by the smart terminal through the neural network may be a perturbed image. That is, in the intelligent terminal field, both original images and adversarial images may be present. The difference between an adversarial image and the original image is imperceptible to the human eye, but the adversarial image will greatly reduce the accuracy of the neural network.
  • It should be understood that the examples here are only for the convenience of understanding the application scenarios of the embodiments of the present application and do not exhaustively list those application scenarios. Moreover, the embodiments of the present application can also be applied to speech processing or text processing scenarios; in the embodiments of the present application, only image processing scenarios are taken as an example for detailed introduction.
  • FIG. 2 is a system architecture diagram of the image processing system provided by an embodiment of the present application. The image processing system 200 includes an execution device 210, a training device 220, a database 230, and a data storage system 240, and the execution device 210 includes a calculation module 211. The database 230 stores a training data set, which includes multiple training images and the label classification of each training image. The training device 220 generates a target model/rule 201 for image processing and iteratively trains the target model/rule 201 using the training data set in the database 230 to obtain a mature target model/rule 201; the target model/rule 201 can be specifically represented as an image processing network.
  • the image processing network obtained by the training device 220 can be applied to different systems or devices.
  • the execution device 210 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240.
  • the data storage system 240 may be placed in the execution device 210, or the data storage system 240 may be an external memory relative to the execution device 210.
  • the calculation module 211 may process the image collected by the execution device 210 through the image processing network to obtain the processing result, and the specific expression form of the processing result is related to the function of the image processing network.
  • the "user" can directly interact with the execution device 210, that is, the execution device 210 and the client device are integrated in the same device.
  • FIG. 2 is only a schematic diagram of the architecture of the two image processing systems provided by the embodiment of the present invention, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • In other scenarios of the embodiments of the present application, the execution device 210 and the client device may be independent devices. The execution device 210 is equipped with an input/output interface for data interaction with the client device: the client device inputs the collected image through the input/output interface, and the execution device 210 returns the processing result to the client device through the input/output interface.
  • Because the specific implementations of the training phase and the inference phase in the image processing system provided by the embodiments of the present application are different, the embodiments of the present application provide a neural network training method and an image processing method, which are applied to the training phase and the inference phase, respectively. The specific implementation processes of the training phase and the inference phase of the embodiments of the present application are described separately below.
  • the training phase refers to the process in which the training device 220 in FIG. 2 uses training data to perform a training operation.
  • FIG. 3 is a schematic flowchart of a neural network training method provided in an embodiment of the application.
  • the neural network training method provided in an embodiment of the application may include:
  • the training device obtains an original image and a third label category.
  • a training data set is configured on the training device, and the training data set may include an original image and a third label category corresponding to the original image.
  • The original image refers to an image that has not undergone perturbation processing, or an image directly collected.
  • The third label category is the correct category corresponding to the original image; it is the correct classification of one or more objects in the original image, may include one or more classification categories, and is used as supervision data in the training phase. As an example, if an image includes a panda, the corresponding third label category is panda; as another example, if an image includes a panda and a frog, the corresponding third label category is panda and frog. It should be understood that the examples given here are only to facilitate the understanding of this solution and are not used to limit this solution.
  • Perturbation processing refers to slightly adjusting, on the basis of the original image, the pixel values of pixels in the original image to obtain a perturbed image; the perturbed image can also be called an adversarial image. One perturbation process may adjust the pixel value of every pixel in the original image, or adjust the pixel values of only some of the pixels in the original image. The perturbation can be expressed as a two-dimensional matrix whose size is consistent with the size of the original image.
  • FIG. 4 is a schematic diagram of a perturbation operation in the neural network training method provided in an embodiment of the present application, where A1 represents the natural image, A2 represents the perturbation, and A3 represents the adversarial image. The classification category obtained by inputting A1 into the image classification network is panda, while the classification category obtained by inputting A3 into the image classification network is gibbon. It should be understood that the example in FIG. 4 is only to facilitate understanding of the concept of perturbation processing and is not used to limit the solution.
  • The perturbation can be constrained as S = {δ : ‖δ‖_p ≤ ε}, where S represents the restriction on the perturbation δ, ‖δ‖_p represents the p-norm of δ (which can also be called the modulus length of δ), p can be any integer greater than or equal to 1 (as an example, p can be 2, or p can be infinity), and ε is a fixed preset value; the value of ε can be 0.3 or another value. It should be understood that this example is only for further understanding of the concept of the perturbation and is not used to limit the solution.
  • After obtaining the original image, the training device inputs the original image into the first feature extraction network to obtain a second robust representation generated by the first feature extraction network. The first feature extraction network is a convolutional neural network or a residual neural network. As an example, the first feature extraction network may be the feature-extraction part of Wide Residual Networks 34 (WRNS34); as another example, it may be the feature-extraction part of Pre-activated Residual Networks 18 (PRNS18). The first feature extraction network may also be expressed as another type of convolutional neural network or residual neural network, which is not limited here.
  • Robust representation refers to the features that are not sensitive to disturbances among the features extracted from the image.
  • the classification category corresponding to the robust representation extracted from the original image is consistent with the classification category corresponding to the robust representation extracted from the disturbed image corresponding to the original image.
  • the second robust representation includes features that are not sensitive to disturbances among the features extracted from the original image.
• the second robust representation can specifically be expressed as one-dimensional data, two-dimensional data, three-dimensional data, or higher-dimensional data; the length of the second robust representation can be 500, 800, 1000, or another length, which is not limited here.
• After obtaining the original image, the training device inputs the original image into the second feature extraction network to obtain a second non-robust representation generated by the second feature extraction network.
  • the second feature extraction network can also be a convolutional neural network or a residual neural network.
• the second feature extraction network has functions similar to those of the first feature extraction network, except that, after training, the weight parameters of the second feature extraction network differ from those of the first feature extraction network; as a result, what is extracted through the first feature extraction network is the robust representation in the image, while what is extracted through the second feature extraction network is the non-robust representation in the image.
• For the specific manifestation of the second feature extraction network, please refer to the above examples of the first feature extraction network, which will not be repeated here.
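• As a rough illustration of steps 302 and 303, the following PyTorch sketch builds two separately parameterized feature extractors. The text names WRNS34 and PRNS18 as candidate backbones; torchvision's standard resnet18 is substituted here purely as a stand-in, and all shapes are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def make_feature_extractor() -> nn.Module:
    """Feature-extraction part of a residual network: everything except the classifier head."""
    backbone = models.resnet18()  # stand-in for the WRNS34 / PRNS18 named in the text
    return nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())

# Two separately parameterized extractors: one trained toward robust
# representations, the other toward non-robust representations.
first_feature_net = make_feature_extractor()   # extracts the robust representation
second_feature_net = make_feature_extractor()  # extracts the non-robust representation

x = torch.randn(1, 3, 32, 32)                  # an original (natural) image
robust_rep = first_feature_net(x)              # second robust representation
non_robust_rep = second_feature_net(x)         # second non-robust representation
print(robust_rep.shape, non_robust_rep.shape)  # e.g. torch.Size([1, 512]) each
```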
  • two specific implementation manners of the first feature extraction network and the second feature extraction network are provided, which improves the implementation flexibility of the solution.
  • the non-robust representation refers to the features that are sensitive to disturbance among the features extracted from the image.
  • the classification category corresponding to the non-robust representation extracted from the original image is inconsistent with the classification category corresponding to the non-robust representation extracted from the disturbed image corresponding to the original image.
  • the second non-robust representation includes features that are sensitive to disturbances among the features extracted from the original image.
• the specific form and length of the second non-robust representation are similar to those of the second robust representation; please refer to the above description, which will not be repeated here.
• FIG. 5 is a schematic diagram of the robust representation and the non-robust representation after visualization processing in the neural network training method provided by the embodiment of the application.
• In FIG. 5, B1 and B2 correspond to the same original image (original image 1), and B3 and B4 correspond to the same original image (original image 2). The object in original image 1 is a squirrel; B1 is obtained after visualization processing of the robust representation extracted from original image 1, and B2 is obtained after visualization processing of the non-robust representation extracted from original image 1.
• Viewing FIG. 5 with the human eye, there is a faint squirrel shape in B1, and B1 also carries the color of a squirrel (which cannot be shown because there is no color in the patent document), whereas the human eye cannot obtain any information from B2.
  • the object in the original image 2 is a ship
  • B3 is obtained by visualizing the robust representation extracted from the original image 2
  • B4 is obtained by visualizing the non-robust representation extracted from the original image 2.
• Similarly, viewing FIG. 5 with the human eye, B3 faintly has the shape of a ship, and B3 also carries the color of the ship (the color cannot be shown in the patent document), while the human eye cannot obtain any information from B4. That is, the feature information included in the robust representation is similar to the features used by the human eye.
• Step 302 can be executed first and then step 303; or step 303 can be executed first and then step 302; steps 302 and 303 can also be executed simultaneously.
• After obtaining the second robust representation and the second non-robust representation, the training device combines the second robust representation and the second non-robust representation to obtain the combined first representation.
• the ways of combination include, but are not limited to, concatenation (concat), addition, fusion, and multiplication.
• The training device inputs the combined first representation into the classification network, so as to perform a classification operation according to the combined first representation through the classification network, to obtain a third classification category output by the classification network.
  • the processing method of steps 304 and 305 may also be referred to as generating a third classification category through a standard path.
• the classification network may include at least one perceptron; the aforementioned perceptron includes at least two neural network layers and may specifically be a double-layer fully connected perceptron.
  • the third classification category indicates the category of the object in the original image.
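• Continuing the sketch above, steps 304 and 305 can be illustrated as follows, assuming concatenation as the combination method, a double-layer fully connected perceptron as the classification network, and illustrative dimensions:

```python
import torch
import torch.nn as nn

REP_DIM = 512      # assumed length of each representation (matches the sketch above)
NUM_CLASSES = 10   # assumed number of classification categories

# Double-layer fully connected perceptron acting as the classification network.
classification_net = nn.Sequential(
    nn.Linear(2 * REP_DIM, 256),  # takes the combined (concatenated) representation
    nn.ReLU(),
    nn.Linear(256, NUM_CLASSES),
)

# Step 304: combine by concatenation (addition or multiplication are alternatives).
combined_first_rep = torch.cat([robust_rep, non_robust_rep], dim=1)

# Step 305: the "standard path" produces the third classification category.
third_category = classification_net(combined_first_rep).argmax(dim=1)
```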
  • the training device inputs the second robust representation into the classification network, so as to perform a classification operation according to the second robust representation through the classification network to obtain a fourth classification category output by the classification network.
• Specifically, the training device may combine the second robust representation with a first constant tensor (such as an all-zero vector) to obtain a combined third representation, and input the combined third representation into the classification network, so that the classification network can perform a classification operation using the feature information included in the second robust representation within the combined third representation, and output a fourth classification category, which is the classification category of the object in the natural image.
• the processing method of step 306 may also be referred to as generating a fourth classification category through a robust path. Further, since the same classification network can be used in step 306 and step 305, the second robust representation is combined with the first constant tensor in order to make the format of the combined third representation consistent with that of the combined first representation. If different classification networks are used in step 306 and step 305, the second robust representation can also be directly input into the classification network.
  • the specific implementation of the combination can be referred to the introduction in step 304, and the specific manifestation of the classification network can be referred to the introduction in step 305, which will not be repeated here.
  • the same classification network may be used as in step 305, or different classification networks may be used respectively.
• the format of the combined third representation and the format of the combined first representation may be the same.
• the first constant tensor refers to a tensor whose values remain unchanged across multiple trainings. It can be expressed as one-dimensional data, two-dimensional data, three-dimensional data, or higher-dimensional data; the values of the constants within a first constant tensor can be the same or different. As examples, the values of all constants in a first constant tensor can all be 0, all be 1, all be 2, or take other values; alternatively, a first constant tensor can include different values such as 1, 2, 3, 5, 12, and 18.
  • the length of the first constant tensor is the same as the length of the second non-robust representation.
• the front and rear positions of the second robust representation and the first constant tensor in the combination may correspond to the front and rear positions of the second robust representation and the second non-robust representation in step 304. That is, if in step 304 the second robust representation is in front and the second non-robust representation is behind, then in step 306 the second robust representation is in front and the first constant tensor is behind; if in step 304 the second non-robust representation is in front and the second robust representation is behind, then in step 306 the first constant tensor is in front and the second robust representation is behind.
  • the training device inputs the second non-robust representation into the classification network, so as to perform a classification operation according to the second non-robust representation through the classification network to obtain a fifth classification category output by the classification network.
• Specifically, the training device may combine the second non-robust representation with a second constant tensor to obtain a combined fourth representation, and input the combined fourth representation into the classification network, so as to perform the classification operation according to the combined fourth representation through the classification network to obtain the fifth classification category output by the classification network.
  • the classification network uses the feature information included in the second non-robust representation in the combined fourth representation to perform a classification operation to output a fifth classification category, which is the classification category of the object in the natural image .
  • the processing method of step 307 may also be referred to as generating a fifth classification category through a non-robust path.
  • the specific implementation of the combination can be referred to the introduction in step 304, and the specific manifestation of the classification network can be referred to the introduction in step 305, which will not be repeated here.
• In step 307, the same classification network can be used as in steps 305 and 306, or different classification networks can be used respectively. If the same classification network is used in step 307 as in steps 305 and 306, then in order to ensure the consistency of the classification network's data processing, the second non-robust representation needs to be combined with the second constant tensor, and the format of the combined fourth representation must be consistent with the format of the combined first representation.
• For the meaning of the second constant tensor, please refer to the description of the first constant tensor.
• the second constant tensor can be the same constant tensor as the first constant tensor, or it can be a different constant tensor, which is not limited here.
  • the front and rear positions of the second non-robust representation and the second constant tensor may correspond to the front and rear positions of the second robust representation and the second non-robust representation.
• That is, if in step 304 the second robust representation is in front and the second non-robust representation is behind, then in step 307 the second constant tensor is in front and the second non-robust representation is behind; if in step 304 the second non-robust representation is in front and the second robust representation is behind, then in step 307 the second non-robust representation is in front and the second constant tensor is behind.
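• The robust path of step 306 and the non-robust path of step 307 can then be sketched under the same assumptions as above, with an all-zero constant tensor occupying the slot of the representation that is left out, so that the combined third and fourth representations keep the same format as the combined first representation:

```python
# Step 306, robust path: second robust representation in front, first constant
# tensor (all zeros here) behind, matching the ordering used in step 304.
first_constant_tensor = torch.zeros_like(non_robust_rep)
combined_third_rep = torch.cat([robust_rep, first_constant_tensor], dim=1)
fourth_category = classification_net(combined_third_rep).argmax(dim=1)

# Step 307, non-robust path: second constant tensor in front, second
# non-robust representation behind.
second_constant_tensor = torch.zeros_like(robust_rep)
combined_fourth_rep = torch.cat([second_constant_tensor, non_robust_rep], dim=1)
fifth_category = classification_net(combined_fourth_rep).argmax(dim=1)
```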
• Steps 304 and 305 can be performed first, then step 306, and then step 307; or step 306 can be performed first, then steps 304 and 305, and then step 307; or step 306 can be performed first, then step 307, and then steps 304 and 305; or steps 304 and 305 can be performed first, then step 307, and then step 306. The sequence of steps 304 and 305, step 306, and step 307 can be arranged arbitrarily, which is not exhaustively listed here. In addition, steps 304 and 305, step 306, and step 307 can also be performed at the same time.
• The training device obtains the adversarial image and the first label category corresponding to the adversarial image.
• the adversarial image is an image that has undergone disturbance processing, and may also be referred to as a post-disturbance image.
• the first label category is the correct category corresponding to the adversarial image, that is, the correct classification of one or more objects in the adversarial image; it may include one or more classification categories and is used as supervision data in the training phase.
• the first label category has a meaning similar to that of the above-mentioned third label category; the difference is that the first label category is for adversarial images, while the third label category is for natural images.
• For an example, please refer to the example of the third label category in step 301, which will not be repeated here.
• the training device may perform perturbation processing on the basis of the above-mentioned natural image to obtain an adversarial image.
• Specifically, the training device may obtain the aforementioned disturbance based on the gradient of the standard path, the robust path, or the non-robust path in each training process, and then obtain the adversarial image; alternatively, the training device may obtain the aforementioned disturbance without relying on the aforementioned gradient. In the two cases the method of generating the adversarial image differs, and the two implementation methods are respectively described below.
• Case A: the adversarial image is generated based on a gradient.
  • the above-mentioned gradient can be divided into the first gradient of the standard path, the second gradient of the robust path, and the third gradient of the non-robust path, which will be introduced separately below.
• In one case, step 308 may include: the training device generates a first gradient according to the function value of the second loss function, performs perturbation processing on the original image according to the first gradient to generate an adversarial image, and determines the third label category as the first label category.
• the second loss function is used to indicate the similarity between the third classification category and the third label category; the second loss function can specifically be a cross-entropy loss function, a maximum margin loss function (max-margin loss), or another type of loss function, which is not limited here.
• In this way, the first gradient is generated according to the similarity between the third classification category and the third label category, and the original image is disturbed according to the first gradient, so that the disturbance processing is more targeted, which is beneficial to speeding up the training process of the first feature extraction network and the second feature extraction network and improves the efficiency of the training process.
• Specifically, the training device may generate the function value of the second loss function according to the third classification category and the third label category, generate the first gradient according to the function value of the second loss function, bring the first gradient into the preset function, multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation with the original image to generate an adversarial image.
• the preset function can be a sign function, an identity function, or another function; the value of the preset coefficient can be 0.008, 0.007, 0.006, 0.005, or another coefficient value. The selection of the specific preset function and the value of the preset coefficient can be determined in combination with the actual application environment, and are not limited here.
• As an example, the adversarial image can be generated as x_adv = x + 0.007 · sign(∇_x J(θ, x, y)), where J(θ, x, y) represents the second loss function, θ represents the set of weights of each neural network layer in the first feature extraction network and the second feature extraction network, x represents the natural image input into the first feature extraction network and the second feature extraction network, y represents the third label category, sign represents the preset function (here the sign function), and the preset coefficient value here is 0.007. It should be understood that this example of the formula is only to facilitate understanding of the solution and is not used to limit the solution; the preset function and preset coefficient can also be replaced.
• In another case, the training device can also generate the function value of the second loss function according to the third classification category and the third label category, generate the first gradient according to the function value of the second loss function, multiply the first gradient by the preset coefficient to obtain the disturbance, and then superimpose the obtained disturbance with the original image to generate an adversarial image.
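• A minimal PyTorch sketch of this gradient-based perturbation follows (the sign-function variant with the 0.007 coefficient from the formula above); the cross-entropy choice for the second loss function and all variable names are illustrative assumptions, and the networks come from the earlier sketches.

```python
import torch
import torch.nn.functional as F

def generate_adversarial_image(x, y, first_net, second_net, cls_net, coeff=0.007):
    """Perturb the original image x along the sign of the standard path's gradient."""
    x = x.clone().detach().requires_grad_(True)
    combined = torch.cat([first_net(x), second_net(x)], dim=1)  # standard path
    loss = F.cross_entropy(cls_net(combined), y)                # second loss function (assumed)
    loss.backward()                                             # first gradient: dJ/dx
    with torch.no_grad():
        x_adv = x + coeff * x.grad.sign()                       # preset function: sign
    return x_adv.detach()

y = torch.tensor([3])  # third label category, as a class index (illustrative)
x_adv = generate_adversarial_image(x, y, first_feature_net, second_feature_net,
                                   classification_net)
```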
• In another case, step 308 may include: the training device generates a second gradient according to the function value of the third loss function, perturbs the original image according to the second gradient to generate an adversarial image, and determines the third label category as the first label category.
  • the third loss function is used to indicate the similarity between the fourth classification category and the third label category
  • the type of the third loss function is similar to the type of the second loss function, and no further examples are given here.
  • the difference between the second gradient and the first gradient is that the second gradient is obtained by performing gradient derivation on the function value of the third loss function, and the first gradient is obtained by performing gradient derivation on the function value of the second loss function.
• That is, where J(θ, x, y) in the above formula previously represented the second loss function, it now represents the third loss function.
• In this way, the original image is disturbed according to the similarity between the fourth classification category, which the classification network outputs based on the second robust representation, and the third label category, so that the disturbance processing is more targeted with respect to the first feature extraction network, which helps to improve the feature extraction capability of the first feature extraction network for the robust representation.
• Specifically, the training device may generate the function value of the third loss function according to the fourth classification category and the third label category, generate the second gradient according to the function value of the third loss function, bring the second gradient into the preset function, multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation with the original image to generate an adversarial image.
• For the preset function and the preset coefficient, please refer to the description in the above case A, which will not be repeated here.
• In another case, the training device can also generate the function value of the third loss function according to the fourth classification category and the third label category, generate the second gradient according to the function value of the third loss function, multiply the second gradient by the preset coefficient to obtain the disturbance, and then superimpose the obtained disturbance with the original image to generate an adversarial image.
• In another case, step 308 may include: the training device generates a third gradient according to the function value of the fourth loss function, perturbs the original image according to the third gradient to generate an adversarial image, and determines the third label category as the first label category.
  • the fourth loss function is used to indicate the similarity between the fifth classification category and the third label category
  • the type of the fourth loss function is similar to the type of the second loss function, and no further examples are given here.
  • the difference between the third gradient and the first gradient is that the third gradient is obtained by performing gradient derivation on the function value of the fourth loss function, and the first gradient is obtained by performing gradient derivation on the function value of the second loss function.
• That is, where J(θ, x, y) in the above formula previously represented the second loss function, it now represents the fourth loss function.
• In this way, the original image is disturbed according to the similarity between the fifth classification category, which the classification network outputs based on the second non-robust representation, and the third label category, so that the disturbance processing is more targeted with respect to the second feature extraction network, which helps to improve the feature extraction capability of the second feature extraction network for the non-robust representation.
• Specifically, the training device can generate the function value of the fourth loss function according to the fifth classification category and the third label category, generate the third gradient according to the function value of the fourth loss function, bring the third gradient into the preset function, multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation with the original image to generate an adversarial image.
• For the preset function and the preset coefficient, please refer to the description in the above case A, which will not be repeated here.
• In another case, the training device may also generate the function value of the fourth loss function according to the fifth classification category and the third label category, generate the third gradient according to the function value of the fourth loss function, multiply the third gradient by the preset coefficient to obtain the disturbance, and then superimpose the obtained disturbance with the original image to generate an adversarial image.
• It should be noted that steps 301 to 307 in the embodiment of this application are optional steps; however, if the adversarial image is obtained based on the gradient of the aforementioned standard path, robust path, or non-robust path, steps 301 to 307 are required, and the execution order of steps 301 to 307 is before step 308.
• Case B: the adversarial image is generated without relying on a gradient. In this case, the training data set configured on the training device may be pre-configured with an adversarial image and the first label category corresponding to the adversarial image, and step 308 may include: the training device obtains the adversarial image and the corresponding first label category from the training data set.
• Alternatively, after the training device obtains the natural image, it generates a disturbance matrix in the form of a two-dimensional matrix according to the size of the two-dimensional matrix corresponding to the natural image, where the values of the disturbance matrix satisfy the constraint in step 301. The value of each parameter in the aforementioned disturbance matrix can be randomly generated; alternatively, the disturbance matrix can be generated in order from small to large values within the constraint range of step 301, or in order from large to small values, or according to other rules, which is not limited here.
• After acquiring the adversarial image, the training device inputs the adversarial image into the first feature extraction network to obtain a first robust representation generated by the first feature extraction network.
  • the concept of robust representation has been introduced in step 302, and will not be repeated here.
• the difference between the first robust representation and the second robust representation is that the second robust representation is the feature information extracted from the original image, while the first robust representation is the feature information extracted from the adversarial image.
• After acquiring the adversarial image, the training device inputs the adversarial image into the second feature extraction network to obtain a first non-robust representation generated by the second feature extraction network.
  • the concept of non-robust representation has been introduced in step 303, and will not be repeated here.
• the difference between the first non-robust representation and the second non-robust representation is that the second non-robust representation is the feature information extracted from the original image, while the first non-robust representation is the feature information extracted from the adversarial image.
• Step 309 can be performed first and then step 310; or step 310 can be performed first and then step 309; steps 309 and 310 can also be performed at the same time.
• After obtaining the first robust representation and the first non-robust representation, the training device combines the first robust representation and the first non-robust representation to obtain a combined second representation.
• The training device inputs the combined second representation into the classification network to obtain a sixth classification category output by the classification network.
  • the classification network used in step 312 and the classification network used in step 305 may be the same classification network or different classification networks.
• the meaning of the sixth classification category is similar to that of the third classification category, except that the sixth classification category indicates the category of the object in the adversarial image.
• The training device judges whether the sixth classification category is the same as the first label category, that is, whether the sixth classification category output by the classification network is the correct classification category corresponding to the adversarial image. If they are not the same, go to step 314; if they are the same, go to step 316.
  • the training device determines the sixth classification category as the second annotation category.
• the second label category refers to the error category corresponding to the adversarial image, that is, a misclassification of one or more objects in the adversarial image; it can include one or more classification categories and is also used as supervision data in the training phase.
• the meaning of the second label category is similar to the meaning of the first label category, except that the second label category is a misclassification corresponding to the adversarial image, while the first label category is the correct classification corresponding to the adversarial image.
  • a method for obtaining the second label category is provided, which is simple to operate and does not require additional steps, which saves computer resources.
• The training device inputs the first robust representation into the classification network to obtain a first classification category output by the classification network.
  • the meaning of the classification network can refer to the description in step 305 above, and step 314 can use the same classification network as in step 305, or a different classification network can be used.
• the first classification category is the classification category of the object in the adversarial image.
• The training device inputs the first non-robust representation into the classification network to obtain a second classification category output by the classification network.
  • the meaning of the classification network can refer to the description in step 305 above, and step 315 can use the same classification network as in step 305, or a different classification network can be used.
• the second classification category is the classification category of the object in the adversarial image.
  • the training device performs iterative training on the first feature extraction network and the second feature extraction network according to the loss function until the convergence condition is met.
• Specifically, the training device may perform iterative training on the first feature extraction network and the second feature extraction network according to the loss function until the convergence condition is satisfied. In each iteration, the training device generates a gradient value according to the function value of the loss function, and uses the gradient value to perform back propagation to update the neuron weights of the first feature extraction network and the second feature extraction network, so as to complete one training of the first feature extraction network and the second feature extraction network.
  • the convergence condition may be that the convergence condition of the loss function is satisfied, or the number of iterations meets the preset number, etc.
• the loss function is used to indicate the similarity between the classification category and the label category; the similarity between the classification category and the label category can also be understood as the difference between the classification category and the label category. Since steps 301 to 307 and step 313 are optional steps, steps 301 to 307 can all be executed, all not executed, or partly executed; and if step 313 is executed, the training device can enter step 316 through step 315 or through step 313. The specific composition of the loss function therefore differs among the foregoing multiple cases, and the various situations are described separately below.
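• Before the case analysis, the following skeleton shows the generic iterative-training shape of step 316 (generate a loss, backpropagate, update weights, stop on a convergence condition); the SGD optimizer, the iteration cap, the threshold, and the hypothetical compute_loss_function() standing in for whichever case applies are all assumptions.

```python
import itertools
import torch

# One optimizer over both feature extraction networks (the classification
# network's parameters could be chained in as well); SGD is an illustrative choice.
optimizer = torch.optim.SGD(
    itertools.chain(first_feature_net.parameters(), second_feature_net.parameters()),
    lr=0.1,
)

MAX_ITERATIONS = 10_000  # "number of iterations meets the preset number"

for step in range(MAX_ITERATIONS):
    loss = compute_loss_function()  # hypothetical: one of the cases described below
    optimizer.zero_grad()
    loss.backward()                 # gradient value used for back propagation
    optimizer.step()                # update neuron weights of both networks
    if loss.item() < 1e-3:          # illustrative convergence condition
        break
```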
  • Step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function.
• the first loss function is used to indicate the similarity between the first classification category and the first label category, and to indicate the similarity between the second classification category and the second label category.
• the first loss function may specifically be expressed as a cross-entropy loss function, a maximum interval loss function, or another type of loss function, which is not limited here. In order to understand the first loss function more intuitively, an expression of the first loss function is shown as follows:
• L_AS(θ, x, y) = l(h_r(x_adv; θ_1), y_1) + l(h_n(x_adv; θ_2), y_2), where L_AS(θ, x, y) represents the first loss function, h_r and h_n represent the robust path and the non-robust path respectively, l(h_r(x_adv; θ_1), y_1) represents the similarity between the first classification category and the first label category, l(h_n(x_adv; θ_2), y_2) represents the similarity between the second classification category and the second label category, x_adv represents the adversarial image, θ_1 represents the weights in the first feature extraction network, y_1 represents the first label category, θ_2 represents the weights in the second feature extraction network, and y_2 represents the second label category.
• After obtaining the first classification category and the second classification category through steps 314 and 315, the training device generates the function value of the first loss function, obtains the gradient value corresponding to the function value of the first loss function, and uses that gradient value to perform back propagation to update the neuron weights of the first feature extraction network and the second feature extraction network, thereby completing one training of the first feature extraction network and the second feature extraction network.
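• Continuing the earlier sketches (cross-entropy standing in for the similarity measure l, zeros for the unused slot), one pass that forms the first loss function and backpropagates it could look like this:

```python
# Representations of the adversarial image from both extractors.
adv_robust = first_feature_net(x_adv)
adv_non_robust = second_feature_net(x_adv)
zeros_n = torch.zeros_like(adv_non_robust)
zeros_r = torch.zeros_like(adv_robust)

y1 = y  # first label category: the correct class of x_adv
# Steps 311-313: the sixth classification category (standard path on x_adv)
# becomes the second label category y2 when it differs from y1.
sixth_category = classification_net(
    torch.cat([adv_robust, adv_non_robust], dim=1)).argmax(dim=1)
y2 = sixth_category

# First loss function L_AS: robust path toward y1, non-robust path toward y2.
loss_as = (
    F.cross_entropy(classification_net(torch.cat([adv_robust, zeros_n], dim=1)), y1)
    + F.cross_entropy(classification_net(torch.cat([zeros_r, adv_non_robust], dim=1)), y2)
)
loss_as.backward()  # gradients reach both feature extraction networks
```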
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and the second loss function.
• the second loss function is used to indicate the similarity between the third classification category and the third label category, and the third label category is the correct category corresponding to the original image.
• In this way, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the first feature extraction network and the second feature extraction network, so as to further improve the accuracy of the trained first feature extraction network and the trained second feature extraction network in processing natural images.
• Specifically, after obtaining the third label category through step 301, the third classification category through step 305, and the first classification category and the second classification category through steps 314 and 315, the training device generates the function values of the first loss function and the second loss function.
• The training device can generate a total function value according to the function value of the first loss function and the function value of the second loss function, and use the total function value to perform one training of the first feature extraction network and the second feature extraction network. Specifically, the training device may directly add the function value of the first loss function and the function value of the second loss function to obtain the total function value, or the training device may allocate different weights to the function value of the first loss function and the function value of the second loss function respectively and then add them to obtain the total function value.
  • the specific steps of using function values to complete a training can refer to the above description, which will not be repeated here.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and the third loss function.
  • the third loss function is used to represent the similarity between the fourth classification category and the third label category, and the third label category is the correct category corresponding to the original image.
• In this way, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the robust representation extraction capability of the first feature extraction network, so as to further improve the accuracy of the trained first feature extraction network.
• Specifically, after obtaining the third label category in step 301, the fourth classification category in step 306, and the first classification category and the second classification category in steps 314 and 315, the training device generates the function values of the first loss function and the third loss function.
• The training device uses the function values of the first loss function and the third loss function to complete one training of the first feature extraction network and the second feature extraction network; for the specific implementation method, please refer to the description of training using the function values of the first loss function and the second loss function, which will not be repeated here.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and the fourth loss function.
  • the fourth loss function is used to represent the similarity between the fifth classification category and the third label category, and the third label category is the correct category corresponding to the original image.
• In this way, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the second feature extraction network's extraction capability for non-robust representations, so as to further improve the accuracy of the trained second feature extraction network.
• Specifically, after obtaining the third label category through step 301, the fifth classification category through step 307, and the first classification category and the second classification category through steps 314 and 315, the training device generates the function values of the first loss function and the fourth loss function.
  • the training device uses the function values of the first loss function and the fourth loss function to complete a training of the first feature extraction network and the second feature extraction network.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function, the second loss function, and the third loss function.
• After the training device generates the function value of the first loss function, the function value of the second loss function, and the function value of the third loss function, it can generate a total function value based on the function values of the first loss function, the second loss function, and the third loss function, and then use the total function value to train the first feature extraction network and the second feature extraction network.
• For details, please refer to the description of training based on the function values of the first loss function and the second loss function, which will not be repeated here.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function, the second loss function, and the fourth loss function.
• In another case, step 316 may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function, the third loss function, and the fourth loss function until the convergence condition is met.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and the fifth loss function.
• the fifth loss function is used to represent the similarity between the fourth classification category and the third label category, the similarity between the fifth classification category and the third label category, and the similarity between the sixth classification category and the third label category, where the third label category is the correct category corresponding to the original image.
• the fifth loss function may specifically be expressed as a cross-entropy loss function, a maximum interval loss function, or another type of loss function, which is not limited here.
• In order to understand the fifth loss function more intuitively, an expression is shown as follows: L_total(θ, x, y) = L_AS(θ, x, y) + L_ST(θ, x, y), with L_ST(θ, x, y) = l(h_s(x; θ_3), y_2) + l(h_r(x; θ_1), y_2) + l(h_n(x; θ_2), y_2), where L_total(θ, x, y) represents the total loss function, L_AS(θ, x, y) represents the first loss function, L_ST(θ, x, y) represents the fifth loss function, l(h_s(x; θ_3), y_2) represents the similarity between the fourth classification category and the third label category, x represents the original image, θ_3 represents the weights used by the standard path, y_2 here represents the third label category (unlike in the expression of the first loss function above), l(h_r(x; θ_1), y_2) represents the similarity between the fifth classification category and the third label category, and l(h_n(x; θ_2), y_2) represents the similarity between the sixth classification category and the third label category.
• For the meanings of the other letters in the above formula, please refer to the above description of the first loss function. It should be understood that this example of the fifth loss function is only to facilitate understanding of the solution, and is not used to limit the solution.
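• Under the same illustrative assumptions as the earlier sketches (cross-entropy standing in for l), the fifth loss function and the total loss of this case can be sketched as follows (loss_as comes from the earlier sketch):

```python
# Fifth loss function L_ST: standard, robust, and non-robust paths on the
# natural image x should all predict the third label category.
nat_robust = first_feature_net(x)
nat_non_robust = second_feature_net(x)
z_n = torch.zeros_like(nat_non_robust)
z_r = torch.zeros_like(nat_robust)
y_nat = y  # third label category (correct class of the original image)

loss_st = (
    F.cross_entropy(classification_net(torch.cat([nat_robust, nat_non_robust], dim=1)), y_nat)
    + F.cross_entropy(classification_net(torch.cat([nat_robust, z_n], dim=1)), y_nat)
    + F.cross_entropy(classification_net(torch.cat([z_r, nat_non_robust], dim=1)), y_nat)
)

loss_total = loss_as + loss_st  # L_total = L_AS + L_ST
```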
• In this way, not only are the processing capabilities of the first feature extraction network and the second feature extraction network for natural images improved, but their processing capabilities for adversarial images are also improved; that is, whether the input is a natural image or an adversarial image, the trained first feature extraction network and the trained second feature extraction network can accurately extract the robust representation and the non-robust representation, which expands the application scenarios of this solution.
• If step 316 is entered through step 313, the training device no longer trains the first feature extraction network and the second feature extraction network according to the loss function; instead, it re-enters step 308 to obtain a new adversarial image and a new first label category, that is, it enters a new training process.
• In one case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the second loss function.
• In another case, step 316 may include: the training device trains the first feature extraction network according to the third loss function.
• In another case, step 316 may include: the training device trains the second feature extraction network according to the fourth loss function.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the second loss function and the third loss function.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the second loss function and the fourth loss function.
• In another case, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the third loss function and the fourth loss function.
• In another case, step 316 may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the fifth loss function.
• If step 313 is not performed, the specific implementation of step 316 can refer to the description of the various situations in which step 313 is performed and step 316 is entered through step 315, which is not repeated here.
• If the sixth classification category is the same as the first label category, this proves that the disturbance of the perturbed image is too slight, so that processing it is essentially the same as processing the natural image; the purpose of training here, however, is to enhance the ability of the first feature extraction network and the second feature extraction network to separate robust and non-robust representations from images with larger disturbances. Subsequent training operations are therefore performed only when the sixth classification category is different from the first label category, which improves the efficiency of the training process.
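• The branching of step 313 can be sketched as a simple guard around the training step (names continue the earlier sketches):

```python
# Step 313: train only when the disturbance actually fooled the standard path.
if torch.equal(sixth_category, y1):
    # Sixth classification category equals the first label category: the
    # disturbance is too slight, so return to step 308 for a new adversarial image.
    pass
else:
    # The sixth classification category becomes the second label category,
    # and steps 314-316 (first loss function, back propagation) proceed.
    y2 = sixth_category
```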
  • the training device outputs the trained first feature extraction network and the trained second feature extraction network.
• After determining that the convergence condition is satisfied, the training device outputs the trained first feature extraction network and the trained second feature extraction network. The trained first feature extraction network and the trained second feature extraction network can be used as the feature extraction part of various image processing networks; that is, they can be combined with a high-level feature processing network to realize various functions.
  • the aforementioned various functions may include one or more of the following: image classification, image recognition, image segmentation, or image detection.
  • the aforementioned function may also be to determine the image category, for example, to determine whether the image is a natural image or a confrontational image.
  • the technicians discovered during the research process that through adversarial training, the neural network only extracts the robust representation from the input image, while discarding the non-robust representation, resulting in a decrease in the accuracy of the neural network when processing the original image.
• In the embodiment of the application, the trained first feature extraction network and the trained second feature extraction network can respectively extract the robust representation and the non-robust representation in the input image, which avoids the reduction in robustness caused by mixing the two, while also retaining both the robust representation and the non-robust representation in the input image, thereby avoiding the reduction in accuracy and improving the robustness and accuracy of the neural network at the same time.
• The inference stage refers to the process in which the above-mentioned execution device 210 uses the trained image processing network to process the input image. Since the trained first feature extraction network and the trained second feature extraction network are obtained through the embodiments corresponding to FIG. 3, they can be combined with various high-level feature processing network layers to implement various functions; the specific functions have been introduced in step 317. The two types of image processing networks mentioned in step 317 are introduced below.
• First introduced is the image processing network whose processing target is the object in the image, that is, the above-mentioned image classification, image recognition, image segmentation, or image detection; please refer to FIG. 6.
  • FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the image processing method provided by an embodiment of the present application may include:
  • the execution device acquires a first image.
• the execution device may collect the first image in real time, may obtain the first image from a gallery stored on the execution device, or may download the first image through a wireless or wired network.
• the first image may be an original image or an adversarial image.
• the execution device may be specifically represented as a mobile phone, a computer, a wearable device, an autonomous vehicle, a smart home appliance, or a chip; different types of execution devices may obtain the first image in different ways. As an example, if the execution device is a mobile phone, it may collect the first image through a camera on the mobile phone, or may download the first image through a browser.
• As another example, if the execution device is an autonomous driving vehicle, the autonomous driving vehicle can obtain the first image through sensor collection. The specific manner in which the execution device acquires the first image can be determined in combination with actual application scenarios and products, and will not be detailed here.
  • the execution device inputs the first image into the first feature extraction network to obtain a third robust representation generated by the first feature extraction network.
• After acquiring the first image, the execution device inputs the first image into the first feature extraction network, so that the first feature extraction network generates the third robust representation corresponding to the first image. For the specific expression form of the first feature extraction network and the meaning of the robust representation, please refer to the description in the embodiment corresponding to FIG. 3, which will not be repeated here.
• The execution device inputs the first image into the second feature extraction network to obtain a third non-robust representation generated by the second feature extraction network.
• After acquiring the first image, the execution device inputs the first image into the second feature extraction network, so that the second feature extraction network generates the third non-robust representation corresponding to the first image. For the specific expression form of the second feature extraction network and the meaning of the non-robust representation, please refer to the description in the embodiment corresponding to FIG. 3, which will not be repeated here.
• The execution device combines the third robust representation and the third non-robust representation to obtain a combined fourth representation. For the specific implementation of step 604, please refer to the description of step 304 in the embodiment corresponding to FIG. 3, which will not be repeated here.
• The first case refers to the case where the accuracy of the output result of the image processing network is required to be high; the specific circumstances can be determined in combination with the actual application scenario, and are not limited here.
• In the first case, the execution device inputs the combined fourth representation into the feature processing network, so that the feature processing network outputs a first processing result corresponding to the first image according to the combined fourth representation. The first processing result is related to the function of the entire image processing network.
• As an example, if the function of the image processing network is image classification, the feature processing network may be a classification network, and the first processing result is used to indicate the classification category of the entire image; the classification network may specifically be represented as a network that includes at least one perceptron, and the aforementioned perceptron can be a double-layer fully connected perceptron.
• As another example, if the function of the image processing network is image recognition, the feature processing network may be a recognition network, and the first processing result is used to indicate the content recognized from the image, such as text content in the image.
• As another example, if the function of the image processing network is image segmentation, the feature processing network may include a classification network, which is used to generate the classification category of each pixel in the image; the image is then divided using the classification category of each pixel, and the first processing result is the divided image.
• As another example, if the function of the image processing network is image detection, the first processing result may be specifically expressed as a detection result, where the detection result indicates which objects are included in the first image, that is, it may indicate at least one object included in the first image; the detection result can also include the position information of each of the aforementioned objects, which can be determined in combination with actual product requirements and will not be exhaustively listed here.
  • a variety of specific implementation manners of the image processing network are provided, which expands the application scenarios of this solution and improves the implementation flexibility of this solution.
  • the classification network on the execution device may perform a classification operation according to the combined fourth representation, and output a classification category corresponding to the first image.
• As another example, the recognition network on the execution device can perform a recognition operation according to the combined fourth representation and output a recognition result corresponding to the first image; not all application scenarios are exhaustively listed here.
  • the execution device outputs the first processing result corresponding to the first image according to the third robust representation through the feature processing network.
• In the second case, the execution device may input the third robust representation into the feature processing network, so that the feature processing network outputs the first processing result corresponding to the first image according to the third robust representation. For a specific implementation manner, refer to the description of step 306 in the embodiment corresponding to FIG. 3.
  • the first situation and the second situation are different situations.
• The second situation refers to a situation where the robustness of the output result of the image processing network is required to be high, or where the image processing network is in a high-risk state, that is, when the probability that the input image is a perturbed image is high; the specific situation can be determined in combination with the actual application scenario, which is not limited here.
  • the classification network on the execution device may perform the classification operation according to the third robust representation, and output the classification category corresponding to the first image.
  • the recognition network on the execution device can perform the recognition operation according to the third robust representation, and output the recognition result corresponding to the first image, etc. All application scenarios are not exhaustively listed here.
  • the image processing network includes both a robust path and a standard path, and the user can flexibly choose which path to use according to the actual situation, which expands the application scenarios of the solution and improves the flexibility of the solution.
  • the execution device outputs the first processing result corresponding to the first image according to the third non-robust representation through the feature processing network.
  • the execution device may also input the third non-robust representation into the feature processing network, so that the feature processing network outputs the first processing result corresponding to the first image according to the third non-robust representation.
  • the third situation is different from the first situation and the second situation.
  • the classification network on the execution device may perform the classification operation according to the third non-robust representation, and output the classification category corresponding to the first image.
  • the recognition network on the execution device may perform the recognition operation according to the third non-robust representation, and output the recognition result corresponding to the first image. All application scenarios are not exhaustively listed here. In the embodiment of the present application, the provided image processing method falls into the specific application scenario of image classification, which improves the degree of integration with the application scenario.
  • step 607 is an optional step. If step 607 is not performed, the execution can end after step 605 is performed or after step 606 is performed.
  • step 605, step 606, and step 607 shown in the above embodiment are in a parallel relationship; more than one of them can also be performed, for example steps 605 and 606 can be executed, or steps 605 and 607, or steps 606 and 607, or steps 605 to 607, etc.; which specific steps are executed can be determined in conjunction with the specific application scenario and is not limited here.
  • FIG. 7 is a schematic diagram of the image processing network in the image processing method provided by the embodiment of the application.
  • in FIG. 7, an image classification network is taken as an example of the image processing network.
  • the image processing network includes a first feature extraction network, a second feature extraction network, and a classification network.
  • the first image is input into the first feature extraction network and the second feature extraction network, respectively, to obtain a robust representation generated by the first feature extraction network and a non-robust representation generated by the second feature extraction network.
  • the classification network in FIG. 7 includes three paths, namely a robust path, a standard path, and a non-robust path: the robust path means that the classification network performs classification operations based on the robust representation; the standard path means that the robust representation and the non-robust representation are combined and the classification network performs classification operations based on the combined representation; the non-robust path means that the classification network performs classification operations based on the non-robust representation (a sketch of the three paths follows this list item).
  • FIG. 7 shows the robust path, the standard path, and the non-robust path using the same classification network; it should be understood that the example in FIG. 7 is only to facilitate understanding of the solution, and in other implementations the three paths can use three different classification networks.
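The three paths of FIG. 7 can be summarized in a short sketch; it assumes PyTorch modules f_robust, f_nonrobust, and classifier are defined elsewhere, and takes "combining" the representations to be concatenation, which this application does not mandate.

```python
import torch

def classify(image: torch.Tensor, path: str = "standard") -> torch.Tensor:
    r = f_robust(image)       # robust representation
    n = f_nonrobust(image)    # non-robust representation
    if path == "robust":
        rep = r               # robust path
    elif path == "non_robust":
        rep = n               # non-robust path
    else:
        # standard path: combine the two representations (assumed: concatenation)
        rep = torch.cat([r, n], dim=-1)
    # using one shared classification network for all widths is a simplification;
    # FIG. 7 also allows three different classification networks, one per path
    return classifier(rep).argmax(dim=-1)  # predicted classification category
```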
  • the second case introduced is an image processing network whose processing target is the entire image, that is, the image processing network is used to determine whether the input is a natural image or an adversarial image.
  • FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the image processing method provided by an embodiment of the present application may include:
  • the execution device acquires a first image.
  • the execution device inputs the first image into the first feature extraction network to obtain a third robust representation generated by the first feature extraction network.
  • the execution device inputs the first image into the second feature extraction network to obtain a third non-robust representation generated by the second feature extraction network.
  • for the specific implementation of steps 801 to 803, please refer to the description of steps 601 to 603 in the embodiment corresponding to FIG. 6, which will not be repeated here.
  • the execution device outputs, through the feature processing network, a first processing result corresponding to the first image according to the third robust representation and the third non-robust representation; the first processing result indicates that the first image is the original image, or it indicates that the first image is a disturbed image.
  • after the execution device obtains the third robust representation and the third non-robust representation, it can output the first processing result according to them through the feature processing network.
  • the feature processing network may include at least one perceptron.
  • the perceptron refer to the description of step 305 in the embodiment corresponding to FIG. 3.
  • step 804 may include: after obtaining the third robust representation and the third non-robust representation, the execution device inputs them into the feature processing network; through the feature processing network, the seventh classification category corresponding to the first image is determined according to the third robust representation, and the eighth classification category corresponding to the first image is determined according to the third non-robust representation.
  • the feature processing network may include a classification network, and the execution device uses a classification network in the feature processing network to sequentially perform two classification operations to obtain the seventh classification category and the eighth classification category, respectively.
  • the feature processing network may include two classification networks, and the execution device uses the two classification networks in the feature processing network to perform two classification operations in parallel to obtain the seventh classification category and the eighth classification category, respectively.
  • the execution device judges, through the feature processing network, whether the seventh classification category is consistent with the eighth classification category: if they are consistent, the first processing result output by the feature processing network indicates that the first image is the original image; if they are inconsistent, the first processing result output by the feature processing network indicates that the first image is a disturbed image.
  • the first processing result may be expressed in text form, for example as a "natural image" or an "adversarial image".
  • the first processing result can also be expressed in the form of characters, for example as "0 0.3 1 0.7", where 0 refers to a natural image and 1 refers to an adversarial image, that is, the first image is a natural image with probability 0.3 and an adversarial image with probability 0.7, so the first processing result indicates that the first image is an adversarial image.
  • alternatively, the first processing result may be expressed as "0.3 0.7", where 0.3 indicates the probability that the first image is a natural image and 0.7 indicates the probability that the first image is an adversarial image, so the first processing result indicates that the first image is an adversarial image.
  • the example of the first processing result here is only to facilitate understanding of the solution, and is not used to limit the solution.
  • by judging whether the seventh classification category and the eighth classification category are consistent, it is determined whether the first image is the original image or an adversarial image; the method is simple and highly operable (a sketch follows this list item).
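A sketch of this consistency check, under the same assumed modules as the earlier path sketch (f_robust, f_nonrobust, classifier) and for a single input image, might look as follows.

```python
import torch

def is_disturbed(image: torch.Tensor) -> bool:
    # seventh classification category: from the robust representation
    cat_r = classifier(f_robust(image)).argmax(dim=-1)
    # eighth classification category: from the non-robust representation
    cat_n = classifier(f_nonrobust(image)).argmax(dim=-1)
    # consistent categories -> original image; inconsistent -> disturbed image
    return not torch.equal(cat_r, cat_n)
```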
  • in another case, step 804 may include: combining the third robust representation and the third non-robust representation, and performing a detection operation through a detection network according to the combined fifth representation to output a detection result corresponding to the first image, where the detection result is the first processing result.
  • the detection network may include at least one perceptron.
  • for the meaning of the perceptron, refer to the description of step 305 in the embodiment corresponding to FIG. 3.
  • for the specific manifestation of the detection result, refer to the description of the first processing result in the previous case, which will not be repeated here; a sketch of such a detection network follows this list item.
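A hypothetical detection network over the combined fifth representation could be sketched as below; the layer widths (1024 for the concatenated representation, 256 hidden) and the two-class softmax output are illustrative assumptions.

```python
import torch
import torch.nn as nn

# perceptron-style detection network emitting [p_natural, p_adversarial]
detector = nn.Sequential(
    nn.Linear(1024, 256),  # 1024 = assumed width of the combined representation
    nn.ReLU(),
    nn.Linear(256, 2),
)

def detect(rep_robust: torch.Tensor, rep_nonrobust: torch.Tensor) -> torch.Tensor:
    combined = torch.cat([rep_robust, rep_nonrobust], dim=-1)  # fifth representation
    return detector(combined).softmax(dim=-1)  # e.g. [0.3, 0.7] -> disturbed image
```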
  • not only can the feature information extracted by the first feature extraction network and the second feature extraction network be used to obtain the processing result corresponding to an object in the image, but the processing result corresponding to the entire image can also be obtained, that is, it can be used to judge whether the image is the original image or a disturbed image, which expands the application scenarios of this solution.
  • the first feature extraction network and the second feature extraction network extract the robust representation and the non-robust representation in the input image respectively, which not only avoids the mixing of the two that would reduce robustness, but also retains both the robust representation and the non-robust representation in the input image, thereby avoiding a reduction in accuracy and improving the robustness and accuracy of the neural network at the same time.
  • Table 1 takes the first feature extraction network and the second feature extraction network as the feature extraction part of WRNS34 as an example, where S refers to the standard data set, which can include natural images and adversarial images; R refers to the adversarial data set, which only includes adversarial images; and N refers to the natural data set, which only includes natural images.
  • adversarial training (AT) and iterative optimization are two current training methods; the data in Table 1 show that when processing images in the three data sets S, R, and N, the training method provided in the embodiments of this application achieves the highest accuracy rate in every case, that is, the embodiments of the present application provide a training scheme that improves robustness and accuracy at the same time.
  • here too, the first feature extraction network and the second feature extraction network are both the feature extraction part of WRNS34; detection accuracy refers to the proportion of images whose prediction results match the actual situation among all input images, and the image processing network obtained through the training method provided in the embodiments of this application also achieves a high detection accuracy.
  • FIG. 9 is a schematic structural diagram of a neural network training device provided by an embodiment of the application.
  • the training device 900 of the neural network may include an input module 901 and a training module 902.
  • the input module 901 is used to input the confrontation image into the first feature extraction network and the second feature extraction network to obtain the first robust representation generated by the first feature extraction network and the first non-robust representation generated by the second feature extraction network.
  • the input module 901 is further configured to input the first robust representation into the classification network to obtain the first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain the second classification category output by the classification network.
  • the training module 902 is configured to iteratively train the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met, and to output the trained first feature extraction network and the trained second feature extraction network.
  • the first loss function is used to represent the similarity between the first classification category and the first label category, and the similarity between the second classification category and the second label category, where the first label category is the correct category corresponding to the adversarial image and the second label category is the wrong category corresponding to the adversarial image (a sketch of this loss follows this list item).
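A sketch of one training step under the first loss function follows; cross-entropy is one possible similarity measure, while the optimizer and the module names (reused from the earlier sketches, with the same width caveat) are assumptions.

```python
import torch.nn.functional as F

def training_step(adv_image, y_correct, y_wrong, optimizer):
    logits_r = classifier(f_robust(adv_image))     # -> first classification category
    logits_n = classifier(f_nonrobust(adv_image))  # -> second classification category
    # first loss function: pull the robust branch toward the first (correct)
    # label category and the non-robust branch toward the second (wrong) one
    loss = F.cross_entropy(logits_r, y_correct) + F.cross_entropy(logits_n, y_wrong)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```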
  • the technicians discovered during the research process that through adversarial training, the neural network only extracts the robust representation from the input image, while discarding the non-robust representation, resulting in a decrease in the accuracy of the neural network when processing the original image.
  • the robust representation and the non-robust representation in the input image are extracted through the first feature extraction network and the second feature extraction network respectively, which not only avoids the mixing of the two and leads to a decrease in robustness, but also The robust representation and the non-robust representation in the input image can be retained at the same time, thereby avoiding the decrease in accuracy rate, and improving the robustness and accuracy of the neural network at the same time.
  • FIG. 10 is a schematic structural diagram of a neural network training device provided by an embodiment of this application.
  • the input module 901 is also used to input the original image into the first feature extraction network and the second feature extraction network respectively, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network.
  • the device 900 further includes: a combining module 903, configured to combine the second robust representation and the second non-robust representation to obtain the combined first representation.
  • the input module 901 is further configured to input the combined first representation into the classification network to perform a classification operation according to the combined first representation through the classification network to obtain a third classification category output by the classification network.
  • the training module 902 is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and the second loss function until the convergence condition is met, where the second loss function is used to represent the third The similarity between the classification category and the third annotation category, which is the correct category corresponding to the original image.
  • in this way, the training module 902 not only uses adversarial images to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but also uses natural images to train those capabilities, so as to further improve the accuracy of the trained first feature extraction network and the trained second feature extraction network when processing natural images.
  • the input module 901 is also used to input the original image into the first feature extraction network to obtain the second robust representation generated by the first feature extraction network.
  • the input module 901 is further configured to input the second robust representation into the classification network, so as to perform a classification operation according to the second robust representation through the classification network to obtain the fourth classification category output by the classification network.
  • the training module 902 is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and the third loss function until the convergence condition is met, where the third loss function is used to represent the fourth The similarity between the classification category and the third annotation category, which is the correct category corresponding to the original image.
  • the training module 902 not only uses adversarial images to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but also uses natural images to train the robust representation extraction capability of the first feature extraction network, to further improve the accuracy of the trained first feature extraction network.
  • the input module 901 is also used to input the original image into the second feature extraction network to obtain a second non-robust representation generated by the second feature extraction network.
  • the input module 901 is further configured to input the second non-robust representation into the classification network to perform a classification operation according to the second non-robust representation through the classification network to obtain the fifth classification category output by the classification network.
  • the training module 902 is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and the fourth loss function until the convergence condition is met, where the fourth loss function is used to represent the fifth The similarity between the classification category and the third annotation category, which is the correct category corresponding to the original image.
  • the training module 902 not only uses confrontation images to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but also uses natural images to train the second feature extraction network's extraction capabilities for non-robust representations. In order to further improve the accuracy of the second feature extraction network after training.
  • the input module 901 is also used to input the original image into the first feature extraction network and the second feature extraction network, respectively, to obtain the second robust representation and the second feature extraction generated by the first feature extraction network The second non-robust representation generated by the network.
  • the input module 901 is also used to combine the second robust representation and the second non-robust representation to obtain a combined first representation, and to input the combined first representation into the classification network, so as to perform a classification operation according to the combined first representation through the classification network to obtain the third classification category output by the classification network.
  • the input module 901 is further configured to input the second robust representation into the classification network, so as to perform a classification operation according to the second robust representation through the classification network to obtain the fourth classification category output by the classification network.
  • the input module 901 is further configured to input the second non-robust representation into the classification network to perform a classification operation according to the second non-robust representation through the classification network to obtain the fifth classification category output by the classification network.
  • the training module 902 is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and the fifth loss function until the convergence condition is met, where the fifth loss function is used to represent the similarity between the third classification category and the third annotation category, the similarity between the fourth classification category and the third annotation category, and the similarity between the fifth classification category and the third annotation category, and the third label category is the correct category corresponding to the original image.
  • in this way, the processing capabilities of the first feature extraction network and the second feature extraction network are further improved, that is, whether the input is a natural image or an adversarial image, the trained first feature extraction network and the trained second feature extraction network can accurately extract the robust representation and the non-robust representation, which expands the application scenarios of this solution.
  • the device further includes a generating module 904, which is specifically configured to: generate a first gradient according to the function value of the second loss function, perform perturbation processing on the original image according to the first gradient to generate an adversarial image, and determine the third label category as the first label category.
  • the generation module 904 generates the first gradient according to the similarity between the third classification category and the third annotation category and perturbs the original image according to the first gradient, making the disturbance processing more targeted; this helps to speed up the training of the first feature extraction network and the second feature extraction network and improves the efficiency of the training process (a sketch in the spirit of FGSM follows this list item).
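The gradient-based perturbation can be sketched in the spirit of the fast gradient sign method (FGSM); the application only specifies perturbing the original image according to the gradient, so the sign step and the epsilon value below are assumptions, as are the reused modules.

```python
import torch
import torch.nn.functional as F

def perturb(original_image: torch.Tensor, label: torch.Tensor, epsilon: float = 8 / 255):
    image = original_image.clone().detach().requires_grad_(True)
    combined = torch.cat([f_robust(image), f_nonrobust(image)], dim=-1)
    loss = F.cross_entropy(classifier(combined), label)  # e.g. the second loss function
    grad = torch.autograd.grad(loss, image)[0]           # the first gradient
    adv = (image + epsilon * grad.sign()).clamp(0.0, 1.0).detach()
    return adv  # adversarial image; its first (correct) label category remains `label`
```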
  • alternatively, the generating module 904 is specifically configured to: generate a second gradient according to the function value of the third loss function, perform perturbation processing on the original image according to the second gradient to generate an adversarial image, and determine the third label category as the first label category.
  • the generation module 904 perturbs the original image according to the similarity between the fourth classification category, output by the classification network according to the second robust representation, and the third annotation category, so that the perturbation processing is more targeted at the first feature extraction network, which is beneficial to improving the first feature extraction network's ability to extract robust representations.
  • alternatively, the generating module 904 is specifically configured to: generate a third gradient according to the function value of the fourth loss function, perform perturbation processing on the original image according to the third gradient to generate an adversarial image, and determine the third label category as the first label category.
  • the generation module 904 perturbs the original image according to the similarity between the fifth classification category, output by the classification network according to the second non-robust representation, and the third annotation category, so that the perturbation processing is more targeted at the second feature extraction network, which helps to improve the second feature extraction network's ability to extract non-robust representations.
  • the device 900 further includes: a combining module 903, configured to combine the first robust representation and the first non-robust representation to obtain a combined second representation.
  • the input module 901 is also used to input the combined second representation into the classification network to obtain the sixth classification category output by the classification network.
  • the input module 901 is specifically configured to, when the sixth classification category is different from the first label category, input the first robust representation into the classification network to obtain the first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain the second classification category output by the classification network.
  • if the sixth classification category is the same as the first annotation category, it proves that the disturbance applied to the image was too slight and the image can be processed in the same way as a natural image; since the purpose of training here is to enhance the ability of the first feature extraction network and the second feature extraction network to separate robust and non-robust representations from images with larger disturbances, subsequent training operations are performed only when the sixth classification category is different from the first label category, which improves the efficiency of the training process.
  • the device 900 further includes: a determining module 905, configured to determine the sixth classification category as the second label category when the sixth classification category is different from the first label category .
  • a method for obtaining the second label category is thus provided that is simple to operate, requires no additional steps, and saves computer resources (a sketch follows this list item).
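Determining the second label category from the sixth classification category, and skipping samples whose disturbance is too slight, could be sketched as follows under the same assumed modules and for a single sample.

```python
import torch

def second_label_or_skip(adv_image: torch.Tensor, y_correct: torch.Tensor):
    combined = torch.cat([f_robust(adv_image), f_nonrobust(adv_image)], dim=-1)
    sixth = classifier(combined).argmax(dim=-1)  # sixth classification category
    if torch.equal(sixth, y_correct):
        return None   # disturbance too slight: skip subsequent training operations
    return sixth      # reused directly as the second (wrong) label category
```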
  • the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network (a sketch of a residual backbone follows this list item).
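For instance, a residual neural network backbone could serve as either feature extraction network; the sketch below assumes a recent torchvision and an arbitrary ResNet-18, purely as an illustration.

```python
import torch.nn as nn
import torchvision.models as models

def make_feature_extractor() -> nn.Module:
    backbone = models.resnet18(weights=None)  # any CNN/ResNet would do
    backbone.fc = nn.Identity()               # drop the classifier, keep 512-d features
    return backbone

f_robust = make_feature_extractor()      # first feature extraction network
f_nonrobust = make_feature_extractor()   # second feature extraction network
```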
  • FIG. 11 is a schematic structural diagram of an image processing network provided by an embodiment of the application.
  • the image processing network 1100 includes a first feature extraction network 1101, a second feature extraction network 1102, and a feature processing network 1103.
  • the first feature extraction network 1101 is configured to receive an input first image and generate a robust representation corresponding to the first image; a robust representation refers to features that are not sensitive to disturbances.
  • the second feature extraction network 1102 is configured to receive the input first image, and generate a non-robust representation corresponding to the first image.
  • the non-robust representation refers to a feature that is sensitive to disturbances.
  • the feature processing network 1103 is used to output the first processing result corresponding to the first image according to the robust representation and the non-robust representation.
  • the first feature extraction network 1101 and the second feature extraction network 1102 are respectively used to extract the robust representation and the non-robust representation in the input image, which not only avoids the mixing of the two that would reduce robustness, but also retains both the robust representation and the non-robust representation in the input image, thereby avoiding a reduction in accuracy and improving the robustness and accuracy of the neural network at the same time.
  • the feature processing network 1103 is specifically used to: in the first case, combine the robust representation and the non-robust representation and output the first processing result corresponding to the first image according to the combined representation; or, in the second case, output the first processing result corresponding to the first image according to the robust representation, where the first case and the second case are different cases; or, output the first processing result corresponding to the first image according to the non-robust representation.
  • the image processing network 1100 includes both a robust path and a standard path, and the user can flexibly choose which path to use according to the actual situation, which expands the application scenarios of this solution and improves the implementation flexibility of this solution.
  • the feature processing network 1103 is specifically used to: perform a classification operation according to the combined representation and output the classification category corresponding to the first image; or perform a classification operation according to the robust representation and output the classification category corresponding to the first image; or perform a classification operation according to the non-robust representation and output the classification category corresponding to the first image.
  • the provided image processing network 1100 falls into the specific application scenario of image classification, which improves the degree of integration with the application scenario.
  • the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a disturbed image.
  • not only can the feature information extracted by the first feature extraction network 1101 and the second feature extraction network 1102 be used to obtain the processing result corresponding to an object in the image, but the processing result corresponding to the entire image can also be obtained, that is, it can be used to determine whether the image is the original image or a disturbed image, which expands the application scenarios of this solution.
  • the feature processing network 1103 is specifically used to: determine the first classification category corresponding to the first image according to the robust representation; determine the second classification corresponding to the first image according to the non-robust representation Category; in the case that the first classification category is consistent with the second classification category, the output first processing result indicates that the first image is the original image; in the case where the first classification category is inconsistent with the second classification category, the output first The processing result indicates that the first image is a disturbed image.
  • the feature processing network 1103 determines whether the first image is the original image or an adversarial image by judging whether the two classification categories are consistent; the method is simple and highly operable.
  • the feature processing network 1103 is specifically used to combine the robust representation and the non-robust representation and perform a detection operation according to the combined representation to output the detection result corresponding to the first image, where the first processing result includes the detection result.
  • another implementation manner of determining whether the first image is the original image or the confrontation image is provided, which enhances the implementation flexibility of the solution.
  • the image processing network 1100 is one or more of the following: an image classification network, an image recognition network, an image segmentation network, or an image detection network.
  • the feature processing network 1103 includes a perceptron.
  • the first feature extraction network 1101 is a convolutional neural network or a residual neural network
  • the second feature extraction network 1102 is a convolutional neural network or a residual neural network.
  • FIG. 12 is a schematic structural diagram of the execution device provided by the embodiment of the application; the execution device may specifically take the form of a self-driving vehicle, a smart home appliance, a chip, or another form, which is not limited here.
  • the image processing network 1100 described in the embodiment corresponding to FIG. 11 may be deployed on the execution device 1200 to implement the functions of the execution device in the embodiment corresponding to FIG. 6 to FIG. 8.
  • the execution device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the number of processors 1203 in the execution device 1200 may be one or more; one processor is taken as an example in FIG. 12), where
  • the processor 1203 may include an application processor 12031 and a communication processor 12032.
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or other methods.
  • the memory 1204 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1203. A part of the memory 1204 may also include a non-volatile random access memory (NVRAM).
  • the memory 1204 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof.
  • the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1203 controls the operation of the execution device.
  • in a specific application, the various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus; for clarity, the various buses are referred to as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1203 or implemented by the processor 1203.
  • the processor 1203 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1203 or instructions in the form of software.
  • the above-mentioned processor 1203 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1203 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 1201 can be used to receive input digital or character information and to generate signal inputs related to the relevant settings and function control of the execution device.
  • the transmitter 1202 can be used to output digital or character information through the first interface.
  • the transmitter 1202 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group.
  • the transmitter 1202 can also include a display device such as a display screen.
  • the processor 1203 is configured to execute the image processing method executed by the execution device in the embodiment corresponding to FIG. 6 to FIG. 8.
  • the application processor 12031 is configured to perform the following steps: input the first image into the first feature extraction network to obtain a robust representation corresponding to the first image generated by the first feature extraction network, where a robust representation refers to features that are insensitive to disturbance; input the first image into the second feature extraction network to obtain a non-robust representation corresponding to the first image generated by the second feature extraction network, where a non-robust representation refers to features that are sensitive to disturbance; and output, through the feature processing network, a first processing result corresponding to the first image according to the robust representation and the non-robust representation, where the first feature extraction network, the second feature extraction network, and the feature processing network belong to the same image processing network.
  • the application processor 12031 is also used to execute the other steps executed by the execution device in the method embodiments corresponding to FIG. 6 to FIG. 8; for details, refer to the descriptions in the respective method embodiments, which will not be repeated here.
  • the embodiment of the present application also provides a training device. Please refer to FIG. 13, which is a schematic structural diagram of the training device provided by the embodiment of the present application.
  • the training device 900 described in the embodiment corresponding to FIG. 9 and FIG. 10 may be deployed on the training device 1300 to realize the function of the training device in the embodiment corresponding to FIG. 3 and FIG. 5.
  • the training device 1300 is implemented by one or more servers and may vary considerably depending on configuration or performance; it may include one or more central processing units (CPU) 1322 (for example, one or more processors), a memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) storing application programs 1342 or data 1344.
  • the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device.
  • the central processing unit 1322 may be configured to communicate with the storage medium 1330, and execute a series of instruction operations in the storage medium 1330 on the training device 1300.
  • the training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input and output interfaces 1358, and/or one or more operating systems 1341, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the central processing unit 1322 is configured to execute the neural network training method executed by the training device in the embodiment corresponding to FIG. 3; specifically, the central processing unit 1322 is configured to input the adversarial image into the first feature extraction network and the second feature extraction network to obtain the first robust representation generated by the first feature extraction network and the first non-robust representation generated by the second feature extraction network, where the adversarial image is an image that has undergone disturbance processing, a robust representation refers to features that are not sensitive to disturbance, and a non-robust representation refers to features that are sensitive to disturbance.
  • the first robust representation is input into the classification network to obtain the first classification category output by the classification network, and the first non-robust representation is input into the classification network to obtain the second classification category output by the classification network; the first loss function is used to represent the similarity between the first classification category and the first label category, and the similarity between the second classification category and the second label category, where the first label category is the correct category corresponding to the adversarial image and the second label category is the wrong category corresponding to the adversarial image.
  • the central processing unit 1322 is also used to execute the other steps executed by the training device in the embodiment corresponding to FIG. 3; for details, refer to the descriptions in the corresponding method embodiments, which will not be repeated here.
  • the embodiments of the present application also provide a computer-readable storage medium that stores a program; when the program runs on a computer, the computer executes the steps performed by the training device in the methods described in the embodiments shown in FIGS. 3 to 5.
  • the embodiment of the present application also provides a computer-readable storage medium that stores a program; when the program runs on a computer, the computer executes the steps performed by the execution device in the methods described in the foregoing embodiments shown in FIGS. 6 to 8.
  • the embodiment of the present application also provides a computer program product which, when run on a computer, causes the computer to execute the steps performed by the training device in the methods described in the embodiments shown in FIGS. 3 to 5, or causes the computer to execute the steps performed by the execution device in the methods described in the foregoing embodiments shown in FIGS. 6 to 8.
  • An embodiment of the present application also provides a circuit system, the circuit system includes a processing circuit configured to perform the steps performed by the training device in the method described in the embodiments shown in FIGS. 3 to 5, or The processing circuit is configured to execute the steps performed by the execution device in the method described in the embodiments shown in FIG. 6 to FIG. 8.
  • the training device or execution device of the neural network provided in the embodiment of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuits.
  • the processing unit can execute the computer execution instructions stored in the storage unit, so that the chip in the training device executes the neural network training method described in the embodiments shown in FIGS. 3 to 5, or so that the chip in the execution device executes the above The image processing method described in the embodiments shown in FIGS. 6 to 8.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as Read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), etc.
  • FIG. 14 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • the chip may be expressed as a neural network processor (NPU) 140, which is mounted as a coprocessor on the host CPU (Host CPU), and the Host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 1403, and the controller 1404 controls the arithmetic circuit 1403 to extract matrix data from the memory and perform multiplication operations.
  • the arithmetic circuit 1403 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1403 is a two-dimensional systolic array. The arithmetic circuit 1403 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1403 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 1402 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the matrix A data and matrix B from the input memory 1401 to perform matrix operations, and the partial result or final result of the obtained matrix is stored in an accumulator 1408.
  • the unified memory 1406 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1402 through the direct memory access controller (DMAC) 1405.
  • the input data is also transferred to the unified memory 1406 through the DMAC.
  • the BIU is the Bus Interface Unit, that is, the bus interface unit 1410, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1409.
  • the bus interface unit 1410 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1409 to obtain instructions from the external memory, and is also used for the storage unit access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1406 or to transfer the weight data to the weight memory 1402 or to transfer the input data to the input memory 1401.
  • the vector calculation unit 1407 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used in the calculation of non-convolutional/fully connected layer networks in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.
  • the vector calculation unit 1407 can store the processed output vector to the unified memory 1406.
  • in some implementations, the vector calculation unit 1407 may apply a linear function and/or a non-linear function to the output of the arithmetic circuit 1403, for example linearly interpolating the feature planes extracted by the convolutional layers, or applying an activation function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 1407 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1403, for example for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1409 connected to the controller 1404 is used to store instructions used by the controller 1404;
  • the unified memory 1406, the input memory 1401, the weight memory 1402, and the fetch memory 1409 are all On-Chip memories.
  • the external memory is private to the NPU hardware architecture.
  • the calculation of each layer in the recurrent neural network can be executed by the arithmetic circuit 1403 or the vector calculation unit 1407.
  • the processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus the necessary general-purpose hardware; it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on.
  • generally, all functions completed by computer programs can easily be implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can be diverse, for example analog circuits, digital circuits, or special-purpose circuits; however, for this application, a software program implementation is the better implementation in most cases.
  • based on this understanding, the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Abstract

A training method for a neural network for image processing and a related device, relating to an image processing technology in the field of artificial intelligence. The method comprises: separately inputting an adversarial image into a first feature extraction network and a second feature extraction network to obtain robust representation and non-robust representation; separately inputting the robust representation and the non-robust representation into a classification network to obtain a first classification category and a second classification category output by the classification network; and performing iterative training according to a first loss function until a convergence condition is met. The first loss function is used for representing the similarity between the first category and a correct category corresponding to the adversarial image, and the similarity between the second classification category and an error category corresponding to the adversarial image. The robustness reduction caused by mixing of the robust representation and the non-robust representation is avoided, the robust representation and the non-robust representation can also be reserved at the same time, and thus the accuracy reduction is avoided, and robustness and accuracy of the neural network are improved at the same time.

Description

A neural network and related equipment for image processing
This application claims priority to the Chinese patent application No. 202010362629.6, filed with the Chinese Patent Office on April 30, 2020 and entitled "A neural network and related equipment for image processing", the entire content of which is incorporated in this application by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a neural network and related equipment for image processing.
Background technique
Artificial Intelligence (AI) is the use of computers or computer-controlled machines to simulate, extend and expand human intelligence. Artificial intelligence includes studying the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. At present, image processing based on deep learning neural networks is a common application of artificial intelligence.
Although today's neural networks already have extremely high recognition accuracy, researchers have found that an extremely small disturbance of the input original image can confuse a neural network with high recognition accuracy and make its recognition accuracy drop sharply. We call such a disturbed image an adversarial image.
In order to improve the robustness of neural networks, adversarial training has been proposed, that is, adding the adversarial image and the correct label corresponding to the adversarial image to the training data set to train the neural network, thereby improving the robustness of the neural network to adversarial images. Robustness means that the neural network can still accurately recognize the adversarial image.
However, studies have found that as a neural network's robustness to adversarial images improves, its recognition accuracy on original images keeps decreasing. A solution that improves robustness and recognition accuracy at the same time is therefore needed.
Summary of the invention
The embodiments of this application provide a neural network for image processing and a related device. A trained first feature extraction network and a trained second feature extraction network can respectively extract the robust representation and the non-robust representation in an input image. This avoids the loss of robustness caused by mixing the two, while retaining both the robust representation and the non-robust representation of the input image, thereby avoiding a loss of accuracy and improving the robustness and accuracy of the neural network at the same time.
To solve the above technical problem, the embodiments of this application provide the following technical solutions:
In a first aspect, an embodiment of this application provides a neural network training method, which can be used in the image processing field within the field of artificial intelligence. A training device separately inputs an adversarial image into a first feature extraction network and a second feature extraction network, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network. The adversarial image is an image that has undergone perturbation processing. Perturbation processing means adjusting, on the basis of an original image, the pixel values of pixels in the original image to obtain a perturbed image; for the human eye, it is usually difficult to tell an adversarial image apart from the original image. Both the first robust representation and the first non-robust representation include feature information extracted from the adversarial image. A robust representation refers to features that are insensitive to perturbation: the classification category corresponding to a robust representation extracted from an original image is consistent with the classification category corresponding to a robust representation extracted from the adversarial image corresponding to that original image. A non-robust representation refers to features that are sensitive to perturbation: the classification category corresponding to a non-robust representation extracted from an original image is inconsistent with the classification category corresponding to a non-robust representation extracted from the adversarial image corresponding to that original image. In other words, the feature information included in a robust representation is similar to the features used by the human eye; by contrast, the feature information included in a non-robust representation cannot be understood by the human eye, to which a non-robust representation is noise. The training device inputs the first robust representation into a classification network to obtain a first classification category output by the classification network, the first classification category being a classification category of an object in the adversarial image; it inputs the first non-robust representation into the classification network to obtain a second classification category output by the classification network, the second classification category likewise being a classification category of an object in the adversarial image.
The training device iteratively trains the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and outputs the trained first feature extraction network and the trained second feature extraction network. The first loss function is used to represent the similarity between the first classification category and a first label category, and the similarity between the second classification category and a second label category; the first loss function may specifically be a cross-entropy loss function or a max-margin loss function. The first label category is the correct category corresponding to the adversarial image, and the second label category is the wrong category corresponding to the adversarial image; the first label category includes the correct classification of one or more objects in the adversarial image, the second label category includes an incorrect classification of one or more objects in the adversarial image, and both are used as supervision data in the training phase. The convergence condition may be a convergence condition of the first loss function, or may be that the number of training iterations reaches a preset number.
In this implementation, the neural network includes the first feature extraction network and the second feature extraction network. The adversarial image is separately input into the first feature extraction network and the second feature extraction network, to obtain the first robust representation generated by the first feature extraction network and the first non-robust representation generated by the second feature extraction network. The first robust representation is then input into the classification network to obtain the first classification category output by the classification network, and the first non-robust representation is input into the classification network to obtain the second classification category output by the classification network. The first loss function is used to iteratively train the first feature extraction network and the second feature extraction network; its purpose is to pull the first classification category toward the correct category of the adversarial image and to pull the second classification category toward the wrong category of the adversarial image. That is, the purpose of training is to make the first feature extraction network extract the robust representation in the input image and the second feature extraction network extract the non-robust representation in the input image. During research, the technicians found that adversarial training makes a neural network extract only the robust representation from the input image and discard the non-robust representation, which lowers the accuracy of the neural network when processing original images. In the embodiments of this application, the robust representation and the non-robust representation in the input image are extracted by the first feature extraction network and the second feature extraction network respectively, which both avoids the loss of robustness caused by mixing the two and retains the robust representation and the non-robust representation of the input image at the same time, thereby avoiding a loss of accuracy and improving the robustness and accuracy of the neural network simultaneously.
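The first-aspect training step can be pictured with a short sketch. The following is a minimal PyTorch-style illustration, not the patent's mandated implementation; f_rob, f_nonrob, classifier, and the two label tensors are hypothetical stand-ins for the first feature extraction network, the second feature extraction network, the classification network, and the first and second label categories.

```python
import torch
import torch.nn.functional as F

def first_loss_step(f_rob, f_nonrob, classifier, adv_image, correct_label, wrong_label):
    z_rob = f_rob(adv_image)        # first robust representation
    z_nonrob = f_nonrob(adv_image)  # first non-robust representation

    # In the padded variant described later, each representation would be
    # concatenated with a constant tensor so that the classifier input
    # format stays fixed; the plain form is used here for brevity.
    logits_rob = classifier(z_rob)        # yields the first classification category
    logits_nonrob = classifier(z_nonrob)  # yields the second classification category

    # First loss function: pull the first classification category toward the
    # correct (first label) category and the second classification category
    # toward the wrong (second label) category. Cross-entropy is one of the
    # two options named in the text; max-margin is the other.
    loss = F.cross_entropy(logits_rob, correct_label) \
         + F.cross_entropy(logits_nonrob, wrong_label)
    return loss
```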
In a possible implementation of the first aspect, the method further includes: the training device separately inputs the original image into the first feature extraction network and the second feature extraction network, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network. The original image is an image that has not undergone perturbation processing, or may be a directly captured image. The training device combines the second robust representation and the second non-robust representation to obtain a combined first representation, and inputs the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation to obtain a third classification category output by the classification network. The combination method includes one or more of the following: concatenation, addition, fusion, and multiplication. That the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a second loss function until the convergence condition is met. The second loss function is used to represent the similarity between the third classification category and a third label category, and may specifically be a cross-entropy loss function or a max-margin loss function. The third label category is the correct category corresponding to the original image, that is, the correct classification of one or more objects in the original image; it may include one or more classification categories and is used as supervision data in the training phase.
In this implementation, during training, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used, to further improve the accuracy of the trained first feature extraction network and the trained second feature extraction network when processing natural images.
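As an illustration of this combined path on the original image, the following hedged sketch (same hypothetical names as above) uses concatenation, one of the four combination methods named in the text:

```python
def second_loss_step(f_rob, f_nonrob, classifier, orig_image, correct_label):
    z_rob = f_rob(orig_image)        # second robust representation
    z_nonrob = f_nonrob(orig_image)  # second non-robust representation

    # Combined first representation; concatenation is chosen here purely for
    # illustration (addition, fusion, or multiplication are also possible).
    z_combined = torch.cat([z_rob, z_nonrob], dim=-1)

    logits = classifier(z_combined)                # third classification category
    return F.cross_entropy(logits, correct_label)  # second loss function
```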
In a possible implementation of the first aspect, the method may further include: the training device inputs the original image into the first feature extraction network to obtain the second robust representation generated by the first feature extraction network. The training device then inputs the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation to obtain a fourth classification category output by the classification network, the fourth classification category including a category of one or more objects in the original image. Specifically, to make the input format of the second robust representation consistent with the format of the combined first representation, the training device may combine the second robust representation with a first constant tensor to obtain a combined third representation, and input the combined third representation into the classification network, so that the classification network performs the classification operation according to the combined third representation to obtain the fourth classification category output by the classification network. That the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a third loss function until the convergence condition is met. The third loss function is used to represent the similarity between the fourth classification category and the third label category, and may specifically be a cross-entropy loss function or a max-margin loss function. The third label category is the correct category corresponding to the original image. In this implementation, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the first feature extraction network's ability to extract robust representations, to further improve the accuracy of the trained first feature extraction network.
In a possible implementation of the first aspect, the length of the first constant tensor is the same as the length of the second non-robust representation, and the order of the second robust representation and the first constant tensor may correspond to the order of the second robust representation and the second non-robust representation: if, in the combined first representation, the second robust representation comes first and the second non-robust representation comes second, then in the combined third representation the second robust representation comes first and the first constant tensor comes second; if, in the combined first representation, the second non-robust representation comes first and the second robust representation comes second, then in the combined third representation the first constant tensor comes first and the second robust representation comes second.
In a possible implementation of the first aspect, the method may further include: the training device inputs the original image into the second feature extraction network to obtain the second non-robust representation generated by the second feature extraction network. The training device then inputs the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation to obtain a fifth classification category output by the classification network, the fifth classification category including a category of one or more objects in the original image. Specifically, to make the input format of the second non-robust representation consistent with the format of the combined first representation, the training device may combine the second non-robust representation with a second constant tensor to obtain a combined fourth representation, and input the combined fourth representation into the classification network, so that the classification network performs the classification operation according to the combined fourth representation to obtain the fifth classification category output by the classification network. That the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a fourth loss function until the convergence condition is met. The fourth loss function is used to represent the similarity between the fifth classification category and the third label category, and may specifically be a cross-entropy loss function or a max-margin loss function. The third label category is the correct category corresponding to the original image.
In this implementation, not only are adversarial images used to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, but natural images are also used to train the second feature extraction network's ability to extract non-robust representations, to further improve the accuracy of the trained second feature extraction network.
In a possible implementation of the first aspect, the length of the second constant tensor is the same as the length of the second robust representation, and the order of the second non-robust representation and the second constant tensor may correspond to the order of the second robust representation and the second non-robust representation: if, in the combined first representation, the second robust representation comes first and the second non-robust representation comes second, then in the combined fourth representation the second constant tensor comes first and the second non-robust representation comes second; if, in the combined first representation, the second non-robust representation comes first and the second robust representation comes second, then in the combined fourth representation the second non-robust representation comes first and the second constant tensor comes second.
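A possible realization of the two constant-tensor paddings, assuming the combined first representation places the robust part first; zero is used as the constant purely as an assumption, since the text only requires a constant tensor of matching length:

```python
def combined_third(z_rob, z_nonrob):
    # Robust representation followed by a first constant tensor whose length
    # equals that of the non-robust representation.
    return torch.cat([z_rob, torch.zeros_like(z_nonrob)], dim=-1)

def combined_fourth(z_rob, z_nonrob):
    # Second constant tensor in the robust slot, non-robust representation
    # kept in its original slot.
    return torch.cat([torch.zeros_like(z_rob), z_nonrob], dim=-1)
```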
In a possible implementation of the first aspect, the training device separately inputs the original image into the first feature extraction network and the second feature extraction network, to obtain the second robust representation generated by the first feature extraction network and the second non-robust representation generated by the second feature extraction network. The training device combines the second robust representation and the second non-robust representation to obtain the combined first representation, and inputs the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation to obtain the third classification category output by the classification network. The training device inputs the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation to obtain the fourth classification category output by the classification network. The training device inputs the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation to obtain the fifth classification category output by the classification network. That the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function and a fifth loss function until the convergence condition is met, where the fifth loss function is used to represent the similarity between the third classification category and the third label category, the similarity between the fourth classification category and the third label category, and the similarity between the fifth classification category and the third label category. The fifth loss function may specifically be a cross-entropy loss function or a max-margin loss function, and the third label category is the correct category corresponding to the original image.
In this implementation, while the ability of the first feature extraction network and the second feature extraction network to process adversarial images is improved, their ability to process natural images is improved as well. In other words, whether the input is a natural image or an adversarial image, the trained first feature extraction network and the trained second feature extraction network can accurately extract the robust representation and the non-robust representation, which expands the application scenarios of this solution.
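Under this fifth-loss variant, one plausible way to assemble the overall objective from the sketches above is shown below; the weighting coefficients are an assumption, since the text does not specify how the terms are balanced:

```python
def fifth_loss_total(loss_adv, loss_combined, loss_rob_only, loss_nonrob_only,
                     w1=1.0, w2=1.0, w3=1.0):
    # loss_adv comes from first_loss_step on the adversarial image; the three
    # original-image terms score the combined, robust-only (combined_third)
    # and non-robust-only (combined_fourth) paths against the third label
    # category. The weights w1..w3 are a hypothetical balancing choice.
    return loss_adv + w1 * loss_combined + w2 * loss_rob_only + w3 * loss_nonrob_only
```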
In a possible implementation of the first aspect, the method may further include: the training device generates a first gradient according to the function value of the second loss function, performs perturbation processing on the original image according to the first gradient to generate the adversarial image, and determines the third label category as the first label category. Specifically, the training device may generate the function value of the second loss function according to the third classification category and the third label category, generate the first gradient according to that function value, pass the first gradient through a preset function, multiply the result by a preset coefficient to obtain a perturbation, and superimpose the obtained perturbation on the original image to generate the adversarial image. In this implementation, the first gradient is generated according to the similarity between the third classification category and the third label category, and the original image is perturbed according to the first gradient, which makes the perturbation processing more targeted, helps accelerate the training process of the first feature extraction network and the second feature extraction network, and improves the efficiency of the training process.
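The gradient-based perturbation can be illustrated as follows. The text only says the gradient is passed through a preset function and multiplied by a preset coefficient; taking the sign function and a fixed epsilon, in the style of the fast gradient sign method, is an assumption rather than the patent's prescribed choice:

```python
def perturb(orig_image, loss_value, epsilon=8 / 255):
    # orig_image must have requires_grad=True when loss_value is computed.
    grad = torch.autograd.grad(loss_value, orig_image)[0]  # first gradient
    perturbation = epsilon * grad.sign()  # preset function * preset coefficient
    adv_image = (orig_image + perturbation).clamp(0.0, 1.0)
    return adv_image.detach()             # adversarial image
```

The second and third gradients described next follow the same pattern, with the loss value taken from the third or fourth loss function instead of the second.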
In a possible implementation of the first aspect, the method may further include: the training device generates a second gradient according to the function value of the third loss function, performs perturbation processing on the original image according to the second gradient to generate the adversarial image, and determines the third label category as the first label category. Specifically, the training device may generate the function value of the third loss function according to the fourth classification category and the third label category, generate the second gradient according to that function value, pass the second gradient through a preset function, multiply the result by a preset coefficient to obtain a perturbation, and superimpose the obtained perturbation on the original image to generate the adversarial image. In this implementation, the original image is perturbed according to the similarity between the fourth classification category, which the classification network outputs according to the second robust representation, and the third label category, which makes the perturbation processing more targeted to the first feature extraction network and helps improve the first feature extraction network's ability to extract robust representations.
In a possible implementation of the first aspect, the method may further include: the training device generates a third gradient according to the function value of the fourth loss function, performs perturbation processing on the original image according to the third gradient to generate the adversarial image, and determines the third label category as the first label category. Specifically, the training device may generate the function value of the fourth loss function according to the fifth classification category and the third label category, generate the third gradient according to that function value, pass the third gradient through a preset function, multiply the result by a preset coefficient to obtain a perturbation, and superimpose the obtained perturbation on the original image to generate the adversarial image. In this implementation, the original image is perturbed according to the similarity between the fifth classification category, which the classification network outputs according to the second non-robust representation, and the third label category, which makes the perturbation processing more targeted to the second feature extraction network and helps improve the second feature extraction network's ability to extract non-robust representations.
In a possible implementation of the first aspect, the method may further include: the training device combines the first robust representation and the first non-robust representation to obtain a combined second representation, and inputs the combined second representation into the classification network to obtain a sixth classification category output by the classification network, the sixth classification category being a category of an object in the adversarial image. That the training device inputs the first robust representation into the classification network to obtain the first classification category output by the classification network, and inputs the first non-robust representation into the classification network to obtain the second classification category output by the classification network, may include: only when the sixth classification category differs from the first label category does the training device input the first robust representation into the classification network to obtain the first classification category and input the first non-robust representation into the classification network to obtain the second classification category. In this implementation, if the sixth classification category is the same as the first label category, the perturbation of the perturbed image is too slight, and for the first feature extraction network and the second feature extraction network, processing it differs little from processing a natural image; since the purpose of training here is to strengthen the ability of the first feature extraction network and the second feature extraction network to separate robust and non-robust representations from heavily perturbed images, the subsequent training operations are performed only when the sixth classification category differs from the first label category, which improves the efficiency of the training process.
In a possible implementation of the first aspect, the method may further include: when the sixth classification category differs from the first label category, the training device determines the sixth classification category as the second label category. This implementation provides a way of obtaining the second label category that is simple to operate, requires no additional steps, and saves computing resources.
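A sketch of the sample filter and label reuse described in the two preceding implementations, with the same hypothetical names as before; the combined second representation again uses concatenation as an assumed combination method:

```python
def sixth_category_filter(classifier, z_rob_adv, z_nonrob_adv, correct_label):
    # Sixth classification category from the combined second representation.
    logits = classifier(torch.cat([z_rob_adv, z_nonrob_adv], dim=-1))
    sixth = logits.argmax(dim=-1)

    # Continue the training step only where the sixth category differs from
    # the correct (first label) category; the sixth category itself can then
    # serve as the wrong (second label) category.
    keep = sixth != correct_label
    return keep, sixth
```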
In a possible implementation of the first aspect, the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network. This implementation provides two specific implementations of the first feature extraction network and the second feature extraction network, which improves the implementation flexibility of this solution.
In a second aspect, an embodiment of this application provides an image processing network, which can be used in the image processing field within the field of artificial intelligence. The image processing network includes a first feature extraction network, a second feature extraction network, and a feature processing network. The first feature extraction network is configured to receive an input first image and generate a robust representation corresponding to the first image, a robust representation being features that are insensitive to perturbation. The second feature extraction network is configured to receive the input first image and generate a non-robust representation corresponding to the first image, a non-robust representation being features that are sensitive to perturbation. The feature processing network is configured to obtain the robust representation and the non-robust representation, and to output a first processing result corresponding to the first image. The specific implementation of the feature processing network and the specific form of the first processing result both depend on the function of the overall image processing network. If the function of the image processing network is image classification, the feature processing network is a classification network, and the first processing result indicates the classification category of the whole image. If the function of the image processing network is image recognition, the feature processing network may be a recognition network, and the first processing result indicates content recognized from the image, for example, text content in the image. If the function of the image processing network is image segmentation, the feature processing network may include a classification network configured to generate a classification category for each pixel in the image; the image is then segmented using the classification category of each pixel, and the first processing result is the segmented image.
In this implementation, the robust representation and the non-robust representation in the input image are extracted by the first feature extraction network and the second feature extraction network respectively, which both avoids the loss of robustness caused by mixing the two and retains the robust representation and the non-robust representation of the input image at the same time, thereby avoiding a loss of accuracy and improving the robustness and accuracy of the neural network simultaneously.
In a possible implementation of the second aspect, the feature processing network may be specifically configured to: in a first case, combine the robust representation and the non-robust representation, and output the first processing result corresponding to the first image according to the combined representation; and in a second case, output the first processing result corresponding to the first image according to the robust representation, the first case and the second case being different cases. The first case may be a case in which high accuracy of the first processing result is required, and the second case may be a case in which high robustness of the first processing result is required, or a case in which the probability that the input image is a perturbed image is high. In this implementation, the image processing network includes both a robust path and a standard path, and the user can flexibly choose which path to use according to the actual situation, which expands the application scenarios of this solution and improves its implementation flexibility.
In a possible implementation of the second aspect, the feature processing network may further be configured to, in a third case, output the first processing result corresponding to the first image according to the non-robust representation.
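Putting the second-aspect structure together, the following hypothetical module routes among the three cases. The constant-tensor padding that keeps the feature processing network's input format fixed is an assumption carried over from the training-side sketches above:

```python
import torch
import torch.nn as nn

class ImageProcessingNetwork(nn.Module):
    def __init__(self, f_rob, f_nonrob, feature_processor):
        super().__init__()
        self.f_rob = f_rob             # first feature extraction network
        self.f_nonrob = f_nonrob       # second feature extraction network
        self.head = feature_processor  # feature processing network

    def forward(self, image, case="first"):
        z_rob = self.f_rob(image)
        z_nonrob = self.f_nonrob(image)
        if case == "first":    # accuracy-oriented: combine both representations
            z = torch.cat([z_rob, z_nonrob], dim=-1)
        elif case == "second": # robustness-oriented: robust representation only
            z = torch.cat([z_rob, torch.zeros_like(z_nonrob)], dim=-1)
        else:                  # third case: non-robust representation only
            z = torch.cat([torch.zeros_like(z_rob), z_nonrob], dim=-1)
        return self.head(z)    # first processing result
```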
In a possible implementation of the second aspect, the feature processing network is specifically embodied as a classification network, and the classification network may be specifically configured to: perform a classification operation according to the combined representation and output a classification category corresponding to the first image; or perform a classification operation according to the robust representation and output a classification category corresponding to the first image; or perform a classification operation according to the non-robust representation and output a classification category corresponding to the first image. In this implementation, the provided image processing method is applied to the specific application scenario of image classification, which improves the degree of integration with the application scenario.
In a possible implementation of the second aspect, if the function of the image processing network is to determine whether the first image is an original image or an adversarial image, the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a perturbed image. In this implementation, the feature information extracted by the first feature extraction network and the second feature extraction network can be used not only to obtain a processing result corresponding to objects in the image, but also to obtain a processing result corresponding to the image as a whole, that is, to determine whether the image is an original image or a perturbed image, which expands the application scenarios of this solution.
In a possible implementation of the second aspect, the feature processing network may be configured to: generate a first classification category corresponding to the first image according to the robust representation; generate a second classification category corresponding to the first image according to the non-robust representation; when the first classification category is consistent with the second classification category, output a first processing result indicating that the first image is an original image; and when the first classification category is inconsistent with the second classification category, output a first processing result indicating that the first image is a perturbed image. In this implementation, whether the first image is an original image or an adversarial image is determined by judging whether the two classification categories are consistent; the method is simple and highly operable.
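This agreement test can be written directly on top of the module sketched above; disagreement between the two single-path categories flags a perturbed input (a sketch under the same assumptions, not the patent's mandated detector):

```python
def is_perturbed(net, image):
    cat_rob = net(image, case="second").argmax(dim=-1)   # category from robust path
    cat_nonrob = net(image, case="third").argmax(dim=-1) # category from non-robust path
    # Consistent categories suggest an original image; inconsistent ones
    # suggest a perturbed (adversarial) image.
    return cat_rob != cat_nonrob
```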
In a possible implementation of the second aspect, the feature processing network may be specifically configured to combine the robust representation and the non-robust representation, and perform a detection operation according to the combined representation to output a detection result corresponding to the first image, the first processing result including the detection result. In one case, the detection result may indicate whether the first image is an original image or a perturbed image; in another case, the detection result may indicate which objects are included in the first image, that is, the object type of at least one object included in the first image, and optionally the detection result may further include position information of each of the at least one object. This implementation provides another way of determining whether the first image is an original image or an adversarial image, which enhances the implementation flexibility of this solution.
In a possible implementation of the second aspect, the image processing network is one or more of the following: an image classification network, an image recognition network, an image segmentation network, or an image detection network. This implementation provides multiple specific implementations of the image processing network, which expands the application scenarios of this solution and improves its implementation flexibility.
In a possible implementation of the second aspect, the feature processing network includes a perceptron.
In a possible implementation of the second aspect, the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
For the specific implementations of certain steps in the second aspect of the embodiments of this application and its various possible implementations, as well as the specific meanings of the terms in each possible implementation, reference may be made to the descriptions of the various possible implementations of the first aspect, and details are not repeated here.
In a third aspect, an embodiment of this application provides a neural network training apparatus, which can be used in the image processing field within the field of artificial intelligence. The neural network training apparatus may include an input module and a training module. The input module is configured to separately input an adversarial image into a first feature extraction network and a second feature extraction network, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network, where the adversarial image is an image obtained by performing perturbation processing on an original image, a robust representation refers to features that are insensitive to perturbation, and a non-robust representation refers to features that are sensitive to perturbation. The input module is further configured to input the first robust representation into a classification network to obtain a first classification category output by the classification network, and to input the first non-robust representation into the classification network to obtain a second classification category output by the classification network. The training module is configured to iteratively train the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and to output the trained first feature extraction network and the trained second feature extraction network. The first loss function is used to represent the similarity between the first classification category and a first label category, and the similarity between the second classification category and a second label category, where the first label category is the correct category corresponding to the adversarial image, and the second label category is the wrong category corresponding to the adversarial image.
In the third aspect of the embodiments of this application, the modules of the neural network training apparatus may further be configured to perform the steps in the various possible implementations of the first aspect. For the specific implementations of certain steps in the third aspect and its various possible implementations, as well as the beneficial effects of each possible implementation, reference may be made to the descriptions of the various possible implementations of the first aspect, and details are not repeated here.
In a fourth aspect, an embodiment of this application provides an image processing method, which can be used in the image processing field within the field of artificial intelligence. The method may include: an execution device inputs a first image into a first feature extraction network to obtain a robust representation corresponding to the first image generated by the first feature extraction network, a robust representation being features that are insensitive to perturbation; the execution device inputs the first image into a second feature extraction network to obtain a non-robust representation corresponding to the first image generated by the second feature extraction network, a non-robust representation being features that are sensitive to perturbation; and the execution device outputs, through a feature processing network and according to the robust representation and the non-robust representation, a first processing result corresponding to the first image, where the first feature extraction network, the second feature extraction network, and the feature processing network belong to the same image processing network.
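A hypothetical inference call for the fourth-aspect method, reusing the ImageProcessingNetwork sketch above (f_rob, f_nonrob, classifier, and first_image are assumed to be defined as in the earlier sketches):

```python
net = ImageProcessingNetwork(f_rob, f_nonrob, classifier)
net.eval()
with torch.no_grad():
    first_processing_result = net(first_image, case="first")
```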
In the fourth aspect of the embodiments of this application, the execution device may further be configured to perform the steps in the various possible implementations of the second aspect. For the specific implementations of certain steps in the fourth aspect and its various possible implementations, as well as the beneficial effects of each possible implementation, reference may be made to the descriptions of the various possible implementations of the second aspect, and details are not repeated here.
In a fifth aspect, an embodiment of this application provides a training device, which may include a processor, the processor being coupled to a memory, and the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the neural network training method described in the first aspect is implemented. For the steps performed by the training device in each possible implementation of the first aspect when executed by the processor, reference may be made to the first aspect, and details are not repeated here.
In a sixth aspect, an embodiment of this application provides an execution device, which may include a processor, the processor being coupled to a memory, and the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the steps performed by the image processing network described in the second aspect are implemented. For the steps performed by the image processing network in each possible implementation of the second aspect when executed by the processor, reference may be made to the second aspect, and details are not repeated here.
In a seventh aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the neural network training method described in the first aspect, or causes the computer to perform the image processing method described in the fourth aspect.
In an eighth aspect, an embodiment of this application provides a circuit system, the circuit system including a processing circuit configured to perform the neural network training method described in the first aspect, or configured to perform the image processing method described in the fourth aspect.
In a ninth aspect, an embodiment of this application provides a computer program that, when run on a computer, causes the computer to perform the neural network training method described in the first aspect, or causes the computer to perform the image processing method described in the fourth aspect.
In a tenth aspect, an embodiment of this application provides a chip system, the chip system including a processor configured to support a training device or an image processing network in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the server or the communication device. The chip system may consist of chips, or may include chips and other discrete devices.
Description of the drawings
FIG. 1 is a schematic structural diagram of the artificial intelligence main framework according to an embodiment of this application;
FIG. 2 is a system architecture diagram of an image processing system according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a neural network training method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a perturbation operation in the neural network training method according to an embodiment of this application;
FIG. 5 is a schematic diagram of a robust representation and a non-robust representation after visualization processing in the neural network training method according to an embodiment of this application;
FIG. 6 is a schematic flowchart of an image processing method according to an embodiment of this application;
FIG. 7 is a schematic diagram of an image processing network in the image processing method according to an embodiment of this application;
FIG. 8 is a schematic flowchart of an image processing method according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an execution device according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a chip according to an embodiment of this application.
Detailed description of embodiments
The embodiments of this application provide a neural network for image processing and a related device. A trained first feature extraction network and a trained second feature extraction network can respectively extract the robust representation and the non-robust representation in an input image. This avoids the loss of robustness caused by mixing the two, while retaining both the robust representation and the non-robust representation of the input image, thereby avoiding a loss of accuracy and improving the robustness and accuracy of the neural network at the same time.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances; this is merely the way of distinguishing objects with the same attribute when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described. Refer to FIG. 1, which is a schematic structural diagram of a main framework of artificial intelligence. The artificial intelligence framework is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, a general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data - information - knowledge - wisdom". The "IT value chain", ranging from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, implements communication with the outside world, and implements support through a basic platform. The infrastructure communicates with the outside through sensors. The computing capability is provided by smart chips; as an example, the smart chips include hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA). The basic platform includes related platform assurance and support such as a distributed computing framework and a network, and may include cloud storage and computing, an interconnection network, and the like. For example, the sensors communicate with the outside to obtain data, and the data is provided to smart chips in a distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure indicates a data source in the field of artificial intelligence. The data involves graphics, images, speech, and text, and also involves Internet-of-Things data of conventional devices, including service data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes manners such as data training, machine learning, deep learning, search, reasoning, and decision-making.
Machine learning and deep learning may perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning is a process of simulating a human intelligent reasoning manner in a computer or intelligent system and, based on a reasoning control policy, using formalized information to perform machine thinking and solve problems; typical functions are search and matching.
Decision-making is a process of making a decision after intelligent information is reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the foregoing data processing is performed on the data, some general capabilities may further be formed based on the data processing results, for example, an algorithm or a general system such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Smart products and industry applications
Smart products and industry applications are products and applications of artificial intelligence systems in various fields, and are an encapsulation of an overall artificial intelligence solution that productizes intelligent information decision-making and implements landed applications. Application fields mainly include smart terminals, smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, and the like.
The embodiments of this application may be mainly applied to image processing scenarios in the foregoing application fields. As an example, in the field of autonomous driving, after a sensor on an autonomous vehicle collects an original image, the sensor transmits the original image to a processor of the autonomous vehicle, and the processor of the autonomous vehicle processes the transmitted image by using an image processing network. If the pixel values of the original image are not perturbed during transmission, the processor processes the original image; if the pixel values in the original image are perturbed during transmission, the processor processes an adversarial image. That is, the images processed by the processor of the autonomous vehicle may include both original images and adversarial images. As another example, in the field of smart terminals such as mobile phones, computers, and wearable devices, after a smart terminal collects an original image, if the pixel values of the original image are perturbed during operations such as supplementing light or adding a filter, the image processed by the smart terminal through a neural network may be the perturbed image; that is, in the smart terminal field, original images and adversarial images may also coexist. The difference between an adversarial image and the original image is imperceptible to the human eye, but the adversarial image greatly reduces the accuracy of the neural network. It should be understood that the examples here are merely intended to facilitate understanding of the application scenarios of the embodiments of this application, and are not an exhaustive list of those application scenarios. The embodiments of this application may also be applied to speech processing or text processing scenarios; in the embodiments of this application, application to an image processing scenario is merely used as an example for detailed description.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art may learn that, with the development of technologies and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
To facilitate understanding of this solution, the system architecture of the image processing system provided in the embodiments of this application is first described with reference to FIG. 2, which is a system architecture diagram of an image processing system according to an embodiment of this application. In FIG. 2, the image processing system 200 includes an execution device 210, a training device 220, a database 230, and a data storage system 240, and the execution device 210 includes a calculation module 211.
In the training phase, the database 230 stores a training data set, and the training data set includes a plurality of training images and an annotation classification of each training image. The training device 220 generates a target model/rule 201 for images, and iteratively trains the target model/rule 201 by using the training data set in the database to obtain a mature target model/rule 201. The target model/rule 201 may be specifically represented as an image processing network. The image processing network obtained by the training device 220 may be applied to different systems or devices.
In the inference phase, the execution device 210 may invoke data, code, and the like in the data storage system 240, and may also store data, instructions, and the like in the data storage system 240. The data storage system 240 may be placed inside the execution device 210, or the data storage system 240 may be an external memory relative to the execution device 210.
The calculation module 211 may process, through the image processing network, images collected by the execution device 210 to obtain a processing result. The specific form of the processing result is related to the function of the image processing network.
In some embodiments of this application, for example in FIG. 2, a "user" may directly interact with the execution device 210, that is, the execution device 210 and the client device are integrated in the same device. However, FIG. 2 is merely a schematic architectural diagram of the image processing system provided by this embodiment, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. In some other embodiments of this application, the execution device 210 and the client device may be separate, independent devices. In that case, the execution device 210 is configured with an input/output interface for data interaction with the client device; the "user" may input the collected image to the input/output interface through the client device, and the execution device 210 returns the processing result to the client device through the input/output interface.
With reference to the foregoing description, the embodiments of this application provide a neural network training method and an image processing method, which are applied to the training phase and the inference phase, respectively. Because the specific implementations of the training phase and the inference phase in the image processing system provided by the embodiments of this application are different, the specific implementation procedures of the training phase and the inference phase are described below.
1. Training phase
In the embodiments of this application, the training phase refers to the process in which the training device 220 in FIG. 2 performs a training operation by using training data. Refer to FIG. 3, which is a schematic flowchart of a neural network training method according to an embodiment of this application. The neural network training method provided in this embodiment of this application may include the following steps.
301. The training device obtains an original image and a third annotation category.
In this embodiment of this application, a training data set is configured on the training device, and the training data set may include an original image and a third annotation category corresponding to the original image. The original image is an image that has not undergone perturbation processing, or may be a directly collected image. The third annotation category is the correct classification corresponding to the original image, that is, the correct classification of one or more objects in the original image; it may include one or more classification categories and is used as supervision data in the training phase. As an example, if an image includes a panda, the corresponding third annotation category is panda; as another example, if an image includes a panda and a frog, the corresponding third annotation category is panda and frog. The examples here are merely intended to facilitate understanding of this solution and are not intended to limit this solution.
Perturbation processing means slightly adjusting the pixel values of pixels in the original image to obtain a perturbed image. The perturbed image may also be referred to as an adversarial image, and it is usually difficult for the human eye to distinguish the adversarial image from the original image. Specifically, one perturbation process may adjust the pixel value of every pixel in the original image, or may adjust the pixel values of only some pixels in the original image. The perturbation may be specifically represented as a two-dimensional matrix whose size is consistent with the size of the original image. For a more intuitive understanding of this solution, refer to FIG. 4, which is a schematic diagram of a perturbation operation in the neural network training method according to an embodiment of this application. A1 represents a natural image, A2 represents a perturbation, and A3 represents an adversarial image. The classification category obtained by inputting A1 into an image classification network is panda, while the classification category obtained by inputting A3 into the image classification network is gibbon. It should be understood that the example in FIG. 4 is merely intended to facilitate understanding of the concept of perturbation processing, and is not intended to limit this solution.
Further, the foregoing perturbation may be constrained, and the constraint on the perturbation may be expressed by the following formula:

S = {δ : ||δ||_p ≤ ε};

where S represents the constraint on the perturbation δ, δ represents the perturbation, ||δ||_p represents the p-norm of δ (which may also be referred to as the modulus length of δ), and p may be any integer greater than or equal to 1; as an example, p may be 2, or p may be infinity. ε is a fixed preset value; as an example, the value of ε may be 0.3 or another value. It should be understood that this example is merely intended to further explain the concept of the perturbation, and is not intended to limit this solution.
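As a minimal sketch of this constraint (assuming p = ∞ and pixel values normalized to [0, 1]; the function names below are illustrative assumptions, not part of this application):

```python
import torch

def project_perturbation(delta: torch.Tensor, eps: float = 0.3) -> torch.Tensor:
    # Constrain the perturbation to the l-infinity ball ||delta||_inf <= eps.
    return delta.clamp(-eps, eps)

def apply_perturbation(image: torch.Tensor, delta: torch.Tensor, eps: float = 0.3) -> torch.Tensor:
    # Superimpose the constrained perturbation on the image and keep pixel values valid.
    adv = image + project_perturbation(delta, eps)
    return adv.clamp(0.0, 1.0)  # assumes pixel values normalized to [0, 1]
```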
302. The training device inputs the original image into the first feature extraction network to obtain a second robust representation generated by the first feature extraction network.
In this embodiment of this application, after obtaining the original image, the training device inputs the original image into the first feature extraction network to obtain the second robust representation generated by the first feature extraction network.
The first feature extraction network is a convolutional neural network or a residual neural network. As an example, the first feature extraction network may be the feature extraction part of Wide Residual Networks 34 (WRNS34); as another example, it may be the feature extraction part of Pre-activated Residual Networks 18 (PRNS18). The first feature extraction network may also be another type of convolutional neural network or residual neural network, which is not limited here.
A robust representation refers to features, among the features extracted from an image, that are insensitive to perturbation. The classification category corresponding to the robust representation extracted from an original image is consistent with the classification category corresponding to the robust representation extracted from the perturbed image corresponding to that original image. The second robust representation includes the perturbation-insensitive features extracted from the original image. The second robust representation may be specifically represented as one-dimensional data, two-dimensional data, three-dimensional data, or higher-dimensional data, and its length may be 500, 800, 1000, or another length, which is not limited here.
303. The training device inputs the original image into the second feature extraction network to obtain a second non-robust representation generated by the second feature extraction network.
In some embodiments of this application, after obtaining the original image, the training device inputs the original image into the second feature extraction network to obtain the second non-robust representation generated by the second feature extraction network.
The second feature extraction network may also be a convolutional neural network or a residual neural network. Its function is similar to that of the first feature extraction network; the difference is that the trained second feature extraction network and the trained first feature extraction network have different weight parameters, so that the first feature extraction network extracts the robust representation in an image while the second feature extraction network extracts the non-robust representation in the image. For examples of the specific form of the second feature extraction network, refer to the foregoing examples of the first feature extraction network; details are not repeated here. This embodiment of this application provides two specific implementations of the first feature extraction network and the second feature extraction network, which improves the implementation flexibility of this solution.
A non-robust representation refers to features, among the features extracted from an image, that are sensitive to perturbation. The classification category corresponding to the non-robust representation extracted from an original image is inconsistent with the classification category corresponding to the non-robust representation extracted from the perturbed image corresponding to that original image. The second non-robust representation includes the perturbation-sensitive features extracted from the original image; its specific form and length are similar to those of the second robust representation described above and are not repeated here.
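As an illustrative sketch of the two feature extraction networks, the following uses a small convolutional extractor as a stand-in for WRNS34 or PRNS18 (the architecture, dimensions, and names are assumptions for illustration, not the networks actually claimed):

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """A small convolutional feature extractor standing in for WRNS34 / PRNS18."""
    def __init__(self, rep_dim: int = 512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, rep_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.body(x))

# Same architecture, independently trained weights: after training, one network
# extracts the robust representation and the other the non-robust representation.
robust_extractor = FeatureExtractor()      # first feature extraction network
non_robust_extractor = FeatureExtractor()  # second feature extraction network
```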
For a more intuitive understanding of robust and non-robust representations, refer to FIG. 5, which is a schematic diagram of a visualized robust representation and a visualized non-robust representation in the neural network training method according to an embodiment of this application. B1 and B2 correspond to the same original image (original image 1), and B3 and B4 correspond to the same original image (original image 2). The object in original image 1 is a squirrel; B1 is obtained by visualizing the robust representation extracted from original image 1, and B2 is obtained by visualizing the non-robust representation extracted from original image 1. When FIG. 5 is viewed by the human eye, B1 faintly shows the shape of a squirrel and also carries the color of the squirrel (not shown, because color cannot be presented in a patent document), whereas the human eye obtains no information from B2. The object in original image 2 is a ship; B3 is obtained by visualizing the robust representation extracted from original image 2, and B4 is obtained by visualizing the non-robust representation extracted from original image 2. When FIG. 5 is viewed by the human eye, B3 faintly shows the shape of a ship and also carries the color of the ship (not shown for the same reason), whereas the human eye obtains no information from B4. That is, the feature information included in a robust representation is similar to the features used by the human eye; in contrast, the feature information included in a non-robust representation cannot be understood by the human eye, to which the non-robust representation is noise. It should be understood that the example in FIG. 5 is merely intended to facilitate understanding of the concepts of robust and non-robust representations, and is not intended to limit this solution.
It should be noted that this embodiment of this application does not limit the execution order of steps 302 and 303: step 302 may be performed before step 303, step 303 may be performed before step 302, or steps 302 and 303 may be performed at the same time.
304. The training device combines the second robust representation and the second non-robust representation to obtain a combined first representation.
In this embodiment of this application, in one case, after obtaining the second robust representation and the second non-robust representation, the training device combines the second robust representation and the second non-robust representation to obtain the combined first representation. The manner of combination includes, but is not limited to, concatenation (concat), addition (add), fusion, and multiplication.
305. The training device inputs the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation, to obtain a third classification category output by the classification network.
In this embodiment of this application, after obtaining the combined first representation, the training device inputs the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation to obtain the third classification category output by the classification network. The processing manner of steps 304 and 305 may also be referred to as generating the third classification category through the standard path. The classification network may include at least one perceptron, where the perceptron includes at least two neural network layers and may specifically be a two-layer fully connected perceptron. The third classification category indicates the category of the object in the original image.
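A hedged sketch of this standard path (steps 304 and 305), assuming concatenation as the combination manner and a two-layer fully connected perceptron as the classification network; all dimensions and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """A two-layer fully connected perceptron, one possible classification network."""
    def __init__(self, in_dim: int = 1024, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, rep: torch.Tensor) -> torch.Tensor:
        return self.net(rep)

classifier = Classifier()  # in_dim = 2 * rep_dim when two 512-dim representations are concatenated

def standard_path(x, robust_extractor, non_robust_extractor, classifier):
    r = robust_extractor(x)                 # second robust representation
    nr = non_robust_extractor(x)            # second non-robust representation
    combined = torch.cat([r, nr], dim=-1)   # combination by concatenation (step 304)
    return classifier(combined)             # logits for the third classification category (step 305)
```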
306. The training device inputs the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation, to obtain a fourth classification category output by the classification network.
In this embodiment of this application, in another case, the training device may also combine the second robust representation with a first constant tensor (for example, an all-zero vector) to obtain a combined third representation, and input the combined third representation into the classification network, so that the classification network performs a classification operation according to the combined third representation to obtain the fourth classification category output by the classification network. Specifically, because the first constant tensor part of the combined third representation does not change and does not include feature information of the natural image, the classification network can perform the classification operation by using the feature information included in the second robust representation of the combined third representation, and output the fourth classification category, which is the classification category of the object in the natural image. The processing manner of step 306 may also be referred to as generating the fourth classification category through the robust path. Further, because the same classification network may be used in steps 306 and 305, the second robust representation is combined with the first constant tensor so that the format of the combined third representation is consistent with that of the combined first representation. If different classification networks are used in steps 306 and 305, the second robust representation may also be input directly into the classification network.
For the specific implementation of the combination, refer to the description in step 304, and for the specific form of the classification network, refer to the description in step 305; details are not repeated here. Step 306 may use the same classification network as step 305, or a different classification network. The format of the combined third representation may be the same as that of the combined first representation. The first constant tensor is a tensor whose values remain unchanged across multiple training iterations; it may be specifically represented as one-dimensional data, two-dimensional data, three-dimensional data, or higher-dimensional data, and the values of all constants in a first constant tensor may be the same or different. As an example, all constants in a first constant tensor may be 0, 1, 2, or another value; as another example, a first constant tensor may include different values such as 1, 2, 3, 5, 12, and 18. The length of the first constant tensor is the same as the length of the second non-robust representation. The relative positions of the second robust representation and the first constant tensor may correspond to the relative positions of the second robust representation and the second non-robust representation: if in step 304 the second robust representation comes first and the second non-robust representation comes second, then in step 306 the second robust representation comes first and the first constant tensor comes second; if in step 304 the second non-robust representation comes first and the second robust representation comes second, then in step 306 the first constant tensor comes first and the second robust representation comes second.
307. The training device inputs the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation, to obtain a fifth classification category output by the classification network.
In this embodiment of this application, in yet another case, the training device may combine the second non-robust representation with a second constant tensor to obtain a combined fourth representation, and input the combined fourth representation into the classification network, so that the classification network performs a classification operation according to the combined fourth representation to obtain the fifth classification category output by the classification network. Specifically, similar to step 306, the classification network performs the classification operation by using the feature information included in the second non-robust representation of the combined fourth representation, and outputs the fifth classification category, which is the classification category of the object in the natural image. The processing manner of step 307 may also be referred to as generating the fifth classification category through the non-robust path.
For the specific implementation of the combination, refer to the description in step 304, and for the specific form of the classification network, refer to the description in step 305; details are not repeated here. Step 307 may use the same classification network as steps 305 and 306, or a different classification network. If step 307 uses the same classification network as steps 305 and 306, then to ensure the consistency of the classification network during data processing, the second non-robust representation needs to be combined with the second constant tensor, and the format of the combined fourth representation needs to be consistent with the format of the combined first representation. For the meaning and specific form of the second constant tensor, refer to the description of the first constant tensor; the second constant tensor may be the same constant tensor as the first constant tensor or a different one, which is not limited here. The relative positions of the second non-robust representation and the second constant tensor may correspond to the relative positions of the second robust representation and the second non-robust representation: if in step 304 the second robust representation comes first and the second non-robust representation comes second, then in step 307 the second constant tensor comes first and the second non-robust representation comes second; if in step 304 the second non-robust representation comes first and the second robust representation comes second, then in step 307 the second non-robust representation comes first and the second constant tensor comes second.
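The robust path of step 306 and the non-robust path of step 307 can then be sketched as follows, assuming an all-zero constant tensor and the same concatenation order as the standard path above (an illustrative sketch under these assumptions, not the definitive implementation):

```python
import torch

def robust_path(x, robust_extractor, classifier):
    r = robust_extractor(x)
    const = torch.zeros_like(r)  # first constant tensor (here, all zeros)
    # Robust representation first, constant tensor second, mirroring step 304.
    return classifier(torch.cat([r, const], dim=-1))   # fourth classification category

def non_robust_path(x, non_robust_extractor, classifier):
    nr = non_robust_extractor(x)
    const = torch.zeros_like(nr)  # second constant tensor
    # Constant tensor first, non-robust representation second, preserving positions.
    return classifier(torch.cat([const, nr], dim=-1))  # fifth classification category
```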
It should be noted that this embodiment of this application does not limit the execution order among steps 304 and 305, step 306, and step 307. Steps 304 and 305 may be performed first, then step 306, and then step 307; step 306 may be performed first, then steps 304 and 305, and then step 307; step 306 may be performed first, then step 307, and then steps 304 and 305; or steps 304 and 305 may be performed first, then step 307, and then step 306. The order among steps 304 and 305, step 306, and step 307 may be arranged arbitrarily, and the possibilities are not exhaustively listed here. In addition, steps 304 and 305, step 306, and step 307 may also be performed at the same time.
308. The training device obtains an adversarial image and a first annotation category.
In this embodiment of this application, the training device obtains an adversarial image and a first annotation category corresponding to the adversarial image. The adversarial image is an image that has undergone perturbation processing, and may also be referred to as a perturbed image; for the specific meaning of the perturbation, refer to the description in step 301, which is not repeated here. The first annotation category is the correct category corresponding to the adversarial image, that is, the correct classification of one or more objects in the adversarial image; it may include one or more classification categories and is used as supervision data in the training phase. The meaning of the first annotation category is similar to that of the foregoing third annotation category; the difference is that the first annotation category is for the adversarial image while the third annotation category is for the natural image. For examples of the first annotation category, refer to the examples of the third annotation category in step 301, which are not repeated here.
Specifically, the training device may perform perturbation processing on the foregoing natural image to obtain the adversarial image. In one implementation, in each training iteration the training device may obtain the foregoing perturbation based on the gradient of the foregoing standard path, robust path, or non-robust path, and then obtain the perturbed image. In another implementation, the training device may obtain the foregoing perturbation without relying on the foregoing gradient. The manner of generating the adversarial image differs between these two cases, and the two implementations are described separately below.
(1) The adversarial image is generated based on a gradient
Further, the foregoing gradient may be a first gradient of the standard path, a second gradient of the robust path, or a third gradient of the non-robust path, which are introduced separately below.
A. Obtaining the perturbation based on the first gradient of the standard path
In this embodiment, step 308 may include: the training device generates a first gradient according to the function value of a second loss function, performs perturbation processing on the original image according to the first gradient to generate the adversarial image, and determines the third annotation category as the first annotation category. The second loss function indicates the similarity between the third classification category and the third annotation category, and may specifically be a cross-entropy loss function, a max-margin loss function, or another type of loss function, which is not limited here. In this embodiment of this application, the first gradient is generated according to the similarity between the third classification category and the third annotation category, and the original image is perturbed according to the first gradient. This makes the perturbation processing more targeted, helps accelerate the training process of the first feature extraction network and the second feature extraction network, and improves the efficiency of the training process.
Specifically, in one case, the training device may generate the function value of the second loss function according to the third classification category and the third annotation category, generate the first gradient according to the function value of the second loss function, substitute the first gradient into a preset function and then multiply by a preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image. The preset function may be a sign function, an identity function, or another function; the value of the preset coefficient may be 0.008, 0.007, 0.006, 0.005, or another coefficient value. The specific choice of the preset function and the value of the preset coefficient may be determined with reference to the actual application environment, and are not limited here.
To further understand this solution, one expression for generating the perturbation is shown (here the preset function is the sign function and the preset coefficient is 0.007):

η = 0.007 · sign(∇_x J(θ, x, y));

where η represents the perturbation, J(θ, x, y) represents the second loss function, θ represents the set of the weights of the neural network layers in the first feature extraction network and the second feature extraction network, x represents the natural image input into the first feature extraction network and the second feature extraction network, y represents the third annotation category, ∇_x represents taking the derivative of the second loss function with respect to x, and sign indicates that the preset function is the sign function. It should be understood that this formula is merely an example intended to facilitate understanding of this solution and is not intended to limit this solution; the preset function and the preset coefficient may also be replaced.
In another case, the training device may also generate the function value of the second loss function according to the third classification category and the third annotation category, generate the first gradient according to the function value of the second loss function, multiply the first gradient by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image.
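A minimal sketch of both cases of gradient-based perturbation, assuming cross-entropy as the loss function, a preset coefficient of 0.007, and pixel values in [0, 1]; `forward_fn` stands for whichever path (standard, robust, or non-robust) supplies the loss, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def generate_adversarial(x, y, forward_fn, coeff: float = 0.007, use_sign: bool = True):
    # Perturb x along the gradient of the loss of the chosen path.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(forward_fn(x), y)   # e.g., the second loss function
    grad, = torch.autograd.grad(loss, x)       # e.g., the first gradient
    # Case 1: sign function as the preset function; case 2: identity (raw gradient).
    delta = coeff * (grad.sign() if use_sign else grad)
    return (x + delta).clamp(0.0, 1.0).detach()

# Usage example, taking the standard path as the source of the gradient:
# adv = generate_adversarial(
#     x, y, lambda img: standard_path(img, robust_extractor, non_robust_extractor, classifier))
```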
B. Obtaining the perturbation based on the second gradient of the robust path
In this embodiment, step 308 may include: the training device generates a second gradient according to the function value of a third loss function, performs perturbation processing on the original image according to the second gradient to generate the adversarial image, and determines the third annotation category as the first annotation category. The third loss function indicates the similarity between the fourth classification category and the third annotation category; its type is similar to that of the second loss function, and examples are not repeated here. The difference between the second gradient and the first gradient is that the second gradient is obtained by taking the gradient of the function value of the third loss function, whereas the first gradient is obtained by taking the gradient of the function value of the second loss function. Further, in terms of the foregoing formula, when the perturbation is generated by using the first gradient, J(θ, x, y) in the formula represents the second loss function; when the perturbation is generated by using the second gradient, J(θ, x, y) in the formula represents the third loss function. In this embodiment of this application, the original image is perturbed according to the similarity between the fourth classification category, output by the classification network according to the second robust representation, and the third annotation category, which makes the perturbation processing more targeted with respect to the first feature extraction network and helps improve the capability of the first feature extraction network to extract robust representations.
Specifically, in one case, the training device may generate the function value of the third loss function according to the fourth classification category and the third annotation category, generate the second gradient according to the function value of the third loss function, substitute the second gradient into the preset function and then multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image. For the specific implementations of the preset function and the preset coefficient, refer to the description in case A above, which is not repeated here. In another case, the training device may also generate the function value of the third loss function according to the fourth classification category and the third annotation category, generate the second gradient according to the function value of the third loss function, multiply the second gradient by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image.
C. Obtaining the perturbation based on the third gradient of the non-robust path
In this embodiment, step 308 may include: the training device generates a third gradient according to the function value of a fourth loss function, performs perturbation processing on the original image according to the third gradient to generate the adversarial image, and determines the third annotation category as the first annotation category. The fourth loss function indicates the similarity between the fifth classification category and the third annotation category; its type is similar to that of the second loss function, and examples are not repeated here. The difference between the third gradient and the first gradient is that the third gradient is obtained by taking the gradient of the function value of the fourth loss function, whereas the first gradient is obtained by taking the gradient of the function value of the second loss function. Further, in terms of the foregoing formula, when the perturbation is generated by using the first gradient, J(θ, x, y) in the formula represents the second loss function; when the perturbation is generated by using the third gradient, J(θ, x, y) in the formula represents the fourth loss function. In this embodiment of this application, the original image is perturbed according to the similarity between the fifth classification category, output by the classification network according to the second non-robust representation, and the third annotation category, which makes the perturbation processing more targeted with respect to the second feature extraction network and helps improve the capability of the second feature extraction network to extract non-robust representations.
Specifically, in one case, the training device may generate the function value of the fourth loss function according to the fifth classification category and the third annotation category, generate the third gradient according to the function value of the fourth loss function, substitute the third gradient into the preset function and then multiply by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image. For the specific implementations of the preset function and the preset coefficient, refer to the description in case A above, which is not repeated here. In another case, the training device may also generate the function value of the fourth loss function according to the fifth classification category and the third annotation category, generate the third gradient according to the function value of the fourth loss function, multiply the third gradient by the preset coefficient to obtain the perturbation, and then superimpose the obtained perturbation on the original image to generate the adversarial image.
It should be noted that steps 301 to 307 in this embodiment of this application are optional. However, if the adversarial image is obtained based on the gradient of the foregoing standard path, robust path, or non-robust path, steps 301 to 307 are mandatory, and steps 301 to 307 are performed before step 308.
(2) The adversarial image is generated without relying on a gradient
In this embodiment, the training data set configured on the training device may be preconfigured with an adversarial image and a first annotation category corresponding to the adversarial image, and step 308 may include: the training device obtains, from the training data set, the adversarial image and the first annotation category corresponding to the adversarial image.
Further, regarding the manner of generating the adversarial images in the training data set: after obtaining a natural image, the training device generates a perturbation matrix in the form of a two-dimensional matrix according to the size of the two-dimensional matrix corresponding to the natural image, where the value of each parameter in the perturbation matrix satisfies the constraint in step 301. The value of each parameter in the perturbation matrix may be randomly generated, or the perturbation matrix may be generated in ascending order within the constraint range of step 301, or in descending order within the constraint range of step 301, or according to another rule, which is not limited here.
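A sketch of this gradient-free variant, assuming each entry of the perturbation matrix is drawn uniformly at random within the constraint (the name and the uniform distribution are illustrative assumptions):

```python
import torch

def random_perturbation(image: torch.Tensor, eps: float = 0.3) -> torch.Tensor:
    # Draw a random perturbation matrix with each entry within [-eps, eps],
    # then superimpose it on the natural image to obtain an adversarial image.
    delta = torch.empty_like(image).uniform_(-eps, eps)
    return (image + delta).clamp(0.0, 1.0)
```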
309. The training device inputs the adversarial image into the first feature extraction network to obtain a first robust representation generated by the first feature extraction network.
In this embodiment of this application, after obtaining the adversarial image, the training device inputs the adversarial image into the first feature extraction network to obtain the first robust representation generated by the first feature extraction network. The concept of a robust representation has been introduced in step 302 and is not repeated here. The difference between the first robust representation and the second robust representation is that the second robust representation is the feature information extracted from the original image, whereas the first robust representation is the feature information extracted from the adversarial image.
310. The training device inputs the adversarial image into the second feature extraction network to obtain a first non-robust representation generated by the second feature extraction network.
In this embodiment of this application, after obtaining the adversarial image, the training device inputs the adversarial image into the second feature extraction network to obtain the first non-robust representation generated by the second feature extraction network. The concept of a non-robust representation has been introduced in step 303 and is not repeated here. The difference between the first non-robust representation and the second non-robust representation is that the second non-robust representation is the feature information extracted from the original image, whereas the first non-robust representation is the feature information extracted from the adversarial image.
It should be noted that this embodiment of this application does not limit the execution order between step 309 and step 310: step 309 may be performed before step 310, step 310 may be performed before step 309, or steps 309 and 310 may be performed at the same time.
311. The training device combines the first robust representation and the first non-robust representation to obtain a combined second representation.
In this embodiment of this application, after obtaining the first robust representation and the first non-robust representation, the training device combines the first robust representation and the first non-robust representation to obtain the combined second representation. For the manner of combination, refer to the description in step 304, which is not repeated here.
312. The training device inputs the combined second representation into the classification network to obtain a sixth classification category output by the classification network.
In this embodiment of this application, after obtaining the combined second representation, the training device inputs the combined second representation into the classification network to obtain the sixth classification category output by the classification network. For the specific form of the classification network, refer to the description in step 305; the classification network used in step 312 may be the same classification network as the one used in step 305, or a different classification network. The meaning of the sixth classification category is similar to that of the third classification category; the difference is that the sixth classification category indicates the category of the object in the adversarial image.
313. The training device judges whether the sixth classification category is the same as the first annotation category; if they are different, step 314 is entered, and if they are the same, step 316 is entered.

In this embodiment of the present application, after obtaining the sixth classification category output by the classification network, the training device judges whether the sixth classification category is the same as the first annotation category, that is, whether the sixth classification category output by the classification network is the correct classification category corresponding to the adversarial image. If they are different, step 314 is entered; if they are the same, step 316 is entered.
Optionally, when the sixth classification category differs from the first annotation category, the training device determines the sixth classification category as a second annotation category. The second annotation category refers to the incorrect category corresponding to the adversarial image, that is, a misclassification of one or more objects in the adversarial image; it may include one or more classification categories and is likewise used as supervision data in the training phase. The meaning of the second annotation category is similar to that of the first annotation category, except that the second annotation category is the incorrect classification corresponding to the adversarial image, while the first annotation category is the correct classification corresponding to the adversarial image.

This embodiment of the present application thus provides a way to obtain the second annotation category that is simple to carry out and requires no additional steps, saving computing resources.
314. The training device inputs the first robust representation into the classification network to obtain a first classification category output by the classification network.

In this embodiment of the present application, the training device inputs the first robust representation into the classification network to obtain the first classification category output by the classification network. For the meaning of the classification network, refer to the description in step 305 above; step 314 may use the same classification network as step 305 or a different one. The first classification category is a classification category of an object in the adversarial image.
315. The training device inputs the first non-robust representation into the classification network to obtain a second classification category output by the classification network.

In this embodiment of the present application, the training device inputs the first non-robust representation into the classification network to obtain the second classification category output by the classification network. For the meaning of the classification network, refer to the description in step 305 above; step 315 may use the same classification network as step 305 or a different one. The second classification category is a classification category of an object in the adversarial image.
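To make steps 309 to 315 concrete, the following PyTorch sketch runs an adversarial image through both feature extraction networks and the classification heads. All module shapes and names (f_r, f_n, head_std, head_r, head_n) are hypothetical stand-ins, since this application does not fix the architectures, and the three heads may equally well be a single shared classification network.

```python
import torch
from torch import nn

f_r = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # first feature extraction network
f_n = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # second feature extraction network
head_std = nn.Linear(128, 10)  # classification network for the combined representation (step 312)
head_r = nn.Linear(64, 10)     # classification network used in step 314 (may equal head_std)
head_n = nn.Linear(64, 10)     # classification network used in step 315

x_adv = torch.rand(8, 3, 32, 32)  # batch of adversarial images (placeholder data)
y_1 = torch.randint(0, 10, (8,))  # first annotation category

r1 = f_r(x_adv)                          # first robust representation      (step 309)
n1 = f_n(x_adv)                          # first non-robust representation  (step 310)
second_rep = torch.cat([r1, n1], dim=1)  # combined second representation   (step 311)

sixth_cat = head_std(second_rep).argmax(dim=1)  # sixth classification category (step 312)
y_2 = sixth_cat  # second annotation category, used where sixth_cat != y_1  (step 313)

first_logits = head_r(r1)   # argmax gives the first classification category  (step 314)
second_logits = head_n(n1)  # argmax gives the second classification category (step 315)
```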
316. The training device iteratively trains the first feature extraction network and the second feature extraction network according to the loss function until a convergence condition is met.

In this embodiment of the present application, the training device may iteratively train the first feature extraction network and the second feature extraction network according to the loss function until the convergence condition is satisfied. Specifically, the training device generates a gradient value according to the function value of the loss function and uses the gradient value for back propagation to update the neuron weights of the first feature extraction network and the second feature extraction network, thereby completing one training pass over the two networks.
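A minimal sketch of one such update, reusing the hypothetical modules from the sketch above; the optimizer choice and learning rate are illustrative, not mandated by this application:

```python
optimizer = torch.optim.SGD(list(f_r.parameters()) + list(f_n.parameters()), lr=0.1)

def train_step(loss: torch.Tensor) -> None:
    """One training pass: back-propagate the loss value and update the
    neuron weights of both feature extraction networks."""
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```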
The convergence condition may be that the loss function converges, that the number of iterations reaches a preset number, or the like. The loss function is used to indicate the similarity between a classification category and an annotation category; this similarity can also be understood as the difference between the classification category and the annotation category. Since steps 301 to 307 and step 313 are all optional, steps 301 to 307 may all be executed, none of them may be executed, or some may be executed and others not; and if step 313 is executed, the training device may enter step 316 through step 315 or directly through step 313. The specific meaning of the loss function differs among these cases, which are described separately below.
In one case, if none of steps 301 to 307 is executed, step 313 is executed, and step 316 is entered through step 315 (that is, the sixth classification category differs from the first annotation category), step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to a first loss function. The first loss function is used to indicate the similarity between the first classification category and the first annotation category, and the similarity between the second classification category and the second annotation category. The first loss function may specifically be a cross-entropy loss function, a maximum-margin loss function, or another type of loss function, which is not limited here. For a more intuitive sense of the first loss function, one of its expressions is shown below:
L_AS(θ, x, y) = l(h_r(x_adv; θ_1), y_1) + l(h_n(x_adv; θ_2), ŷ);

where L_AS(θ, x, y) denotes the first loss function; l(h_r(x_adv; θ_1), y_1) denotes the similarity between the first classification category and the first annotation category; x_adv denotes the adversarial image; θ_1 denotes the weights in the first feature extraction network; y_1 denotes the first annotation category; l(h_n(x_adv; θ_2), ŷ) denotes the similarity between the second classification category and the second annotation category; θ_2 denotes the weights in the second feature extraction network; and ŷ denotes the second annotation category. It should be understood that this example of the first loss function is given only to facilitate understanding of the solution and is not intended to limit it.
Specifically, after obtaining the first classification category and the second classification category through steps 314 and 315, the training device generates the function value of the first loss function, obtains the gradient value corresponding to that function value, and uses this gradient value for back propagation to update the neuron weights of the first feature extraction network and the second feature extraction network, thereby completing one training pass over the two networks.
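Continuing the sketch above, the first loss function can be instantiated with cross entropy (one of the admissible choices named here) and used for one training pass; variable names carry over from the earlier hypothetical sketches:

```python
import torch.nn.functional as F

# L_AS = l(h_r(x_adv; θ1), y_1) + l(h_n(x_adv; θ2), ŷ), with cross entropy as l.
loss_as = F.cross_entropy(first_logits, y_1) + F.cross_entropy(second_logits, y_2)
train_step(loss_as)  # back propagation and weight update as described above
```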
In another case, if steps 301 to 305 are executed, steps 306 and 307 are not executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and a second loss function. The second loss function is used to indicate the similarity between the third classification category and a third annotation category, where the third annotation category is the correct category corresponding to the original image. In this embodiment of the present application, during training, not only are adversarial images used to train the feature extraction capabilities of the first and second feature extraction networks, but natural images are used as well, so as to further improve the accuracy of the trained first feature extraction network and the trained second feature extraction network when processing natural images.
Specifically, after obtaining the third annotation category through step 301, the third classification category through step 305, and the first and second classification categories through steps 314 and 315, the training device generates the function values of the first loss function and the second loss function. The training device may then generate a total function value from these two function values and use the total function value to perform one training pass over the first and second feature extraction networks. Specifically, the training device may add the two function values directly to obtain the total function value, or it may assign different weights to the two function values before adding them. For the specific steps of completing one training pass with a function value, refer to the description above, which is not repeated here.
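A small sketch of the two ways of forming the total function value (direct sum by default, or a weighted sum); the helper name and the example weights are hypothetical:

```python
def combine_losses(loss_values, weights=None):
    """Total function value: direct sum by default, weighted sum otherwise."""
    if weights is None:
        weights = [1.0] * len(loss_values)
    return sum(w * l for w, l in zip(weights, loss_values))

# e.g. total = combine_losses([loss_as, loss_2], weights=[1.0, 0.5])
```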
In another case, if steps 301 to 303 and step 306 are executed, steps 304, 305 and 307 are not executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and a third loss function. The third loss function is used to indicate the similarity between the fourth classification category and the third annotation category, where the third annotation category is the correct category corresponding to the original image. In this embodiment of the present application, adversarial images are used to train the feature extraction capabilities of both networks, and natural images are additionally used to train the first feature extraction network's ability to extract robust representations, so as to further improve the accuracy of the trained first feature extraction network.

Specifically, after obtaining the third annotation category through step 301, the fourth classification category through step 306, and the first and second classification categories through steps 314 and 315, the training device generates the function values of the first loss function and the third loss function and uses them to complete one training pass over the first and second feature extraction networks. For the specific implementation, refer to the description of training with the function values of the first and second loss functions, which is not repeated here.
In another case, if steps 301 to 303 and step 307 are executed, steps 304 to 306 are not executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and a fourth loss function. The fourth loss function is used to indicate the similarity between the fifth classification category and the third annotation category, where the third annotation category is the correct category corresponding to the original image. In this embodiment of the present application, adversarial images are used to train the feature extraction capabilities of both networks, and natural images are additionally used to train the second feature extraction network's ability to extract non-robust representations, so as to further improve the accuracy of the trained second feature extraction network.

Specifically, after obtaining the third annotation category through step 301, the fifth classification category through step 307, and the first and second classification categories through steps 314 and 315, the training device generates the function values of the first loss function and the fourth loss function and uses them to complete one training pass over the first and second feature extraction networks. For the specific implementation, refer to the description of training with the function values of the first and second loss functions, which is not repeated here.
In another case, if steps 301 to 306 are executed, step 307 is not executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function, the second loss function and the third loss function. Specifically, after generating the function values of the first, second and third loss functions, the training device may generate a total function value from them and use the total function value to train the two networks. For the specific implementation, refer to the description of training with the function values of the first and second loss functions, which is not repeated here.
In another case, if steps 301 to 305 and step 307 are executed, step 306 is not executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function, the second loss function and the fourth loss function. For the specific implementation, refer to the description above, which is not repeated here.
In another case, if steps 301 to 303 and steps 306 and 307 are executed, steps 304 and 305 are not executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the first loss function, the third loss function and the fourth loss function until the convergence condition is met. For the specific implementation, refer to the description above, which is not repeated here.
In another case, if steps 301 to 307 are all executed, step 313 is executed, and step 316 is entered through step 315, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the first loss function and a fifth loss function. The fifth loss function is used to indicate the similarity between the third classification category and the third annotation category, the similarity between the fourth classification category and the third annotation category, and the similarity between the fifth classification category and the third annotation category, where the third annotation category is the correct category corresponding to the original image. The fifth loss function may specifically be a cross-entropy loss function, a maximum-margin loss function, or another type of loss function, which is not limited here. For a more intuitive sense of the fifth loss function, one of its expressions is shown below:
L_total(θ, x, y) = L_AS(θ, x, y) + L_ST(θ, x, y);

L_ST(θ, x, y) = l(h_s(x; θ_3), y_2) + l(h_r(x; θ_1), y_2) + l(h_n(x; θ_2), y_2);
where L_total(θ, x, y) denotes the total loss function; L_AS(θ, x, y) denotes the first loss function, whose meaning is described above and not repeated here; L_ST(θ, x, y) denotes the fifth loss function; l(h_s(x; θ_3), y_2) denotes the similarity between the third classification category and the third annotation category; x denotes the original image; θ_3 denotes the weights of the first feature extraction network and the second feature extraction network; y_2 denotes the third annotation category; l(h_r(x; θ_1), y_2) denotes the similarity between the fourth classification category and the third annotation category; and l(h_n(x; θ_2), y_2) denotes the similarity between the fifth classification category and the third annotation category. For the other symbols in the above formulas, refer to the description of the first loss function above. It should be understood that this example of the fifth loss function is given only to facilitate understanding of the solution and is not intended to limit it.
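Continuing the earlier hypothetical sketches, L_ST can be evaluated on a natural image and added to L_AS; cross entropy again stands in for l, and the batch and labels are placeholders:

```python
x_nat = torch.rand(8, 3, 32, 32)    # original (natural) images
y_nat = torch.randint(0, 10, (8,))  # third annotation category y_2

r, n = f_r(x_nat), f_n(x_nat)
loss_st = (F.cross_entropy(head_std(torch.cat([r, n], dim=1)), y_nat)  # l(h_s(x; θ3), y2)
           + F.cross_entropy(head_r(r), y_nat)                         # l(h_r(x; θ1), y2)
           + F.cross_entropy(head_n(n), y_nat))                        # l(h_n(x; θ2), y2)

# Recompute L_AS on the adversarial batch so both terms share one graph.
loss_as = F.cross_entropy(head_r(f_r(x_adv)), y_1) + F.cross_entropy(head_n(f_n(x_adv)), y_2)
loss_total = loss_as + loss_st      # L_total = L_AS + L_ST
train_step(loss_total)
```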
In this embodiment of the present application, while the ability of the first and second feature extraction networks to process adversarial images is improved, their ability to process natural images is improved as well. In other words, whether the input is a natural image or an adversarial image, the trained first feature extraction network and the trained second feature extraction network can accurately extract the robust representation and the non-robust representation, which broadens the application scenarios of this solution.
In another case, if none of steps 301 to 307 is executed and step 316 is entered through step 313, the training device no longer trains the first feature extraction network and the second feature extraction network according to the loss function; instead, it returns to step 308 to obtain a new adversarial image and a new first annotation category, that is, it enters a new round of the training process.
In another case, if steps 301 to 305 are executed, steps 306 and 307 are not executed, and step 316 is entered through step 313, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the second loss function.

In another case, if steps 301 to 303 and step 306 are executed, steps 304, 305 and 307 are not executed, and step 316 is entered through step 313, step 316 may include: the training device trains the first feature extraction network according to the third loss function.

In another case, if steps 301 to 303 and step 307 are executed, steps 304 to 306 are not executed, and step 316 is entered through step 313, step 316 may include: the training device trains the second feature extraction network according to the fourth loss function.

In another case, if steps 301 to 306 are executed, step 307 is not executed, and step 316 is entered through step 313, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the second loss function and the third loss function.

In another case, if steps 301 to 305 and step 307 are executed, step 306 is not executed, and step 316 is entered through step 313, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the second loss function and the fourth loss function.

In another case, if steps 301 to 303 and steps 306 and 307 are executed, steps 304 and 305 are not executed, and step 316 is entered through step 313, step 316 may include: the training device trains the first feature extraction network and the second feature extraction network according to the third loss function and the fourth loss function.

In another case, if steps 301 to 307 are all executed and step 316 is entered through step 313, step 316 may include: the training device iteratively trains the first feature extraction network and the second feature extraction network according to the fifth loss function.
In another case, if step 313 is not executed, for the specific implementation of step 316, refer to the descriptions of the above cases in which step 313 is executed and step 316 is entered through step 315, which are not repeated here.
In this embodiment of the present application, if the sixth classification category is the same as the first annotation category, this shows that the perturbation of the perturbed image is too slight: to the first and second feature extraction networks, processing it differs little from processing a natural image. Since the purpose of training here is to strengthen the ability of the two networks to separate robust and non-robust representations from heavily perturbed images, the subsequent training operations are performed only when the sixth classification category differs from the first annotation category, which improves the efficiency of the training process.
317. The training device outputs the trained first feature extraction network and the trained second feature extraction network.

In this embodiment of the present application, after determining that the convergence condition is met, the training device outputs the trained first feature extraction network and the trained second feature extraction network. They can serve as the feature extraction part of various image processing networks; that is, they can be combined with higher-level feature processing networks to implement various functions. As one example, these functions may include one or more of the following: image classification, image recognition, image segmentation, or image detection. As another example, the function may also be image-category judgment, for example judging whether an image is a natural image or an adversarial image.
In this embodiment of the present application, it was found during research that adversarial training makes the neural network extract only the robust representation from the input image and discard the non-robust representation, which lowers the accuracy of the neural network when processing original images. In this embodiment, by contrast, the trained first feature extraction network and the trained second feature extraction network can extract the robust representation and the non-robust representation of the input image separately, which both avoids mixing the two (and the resulting loss of robustness) and retains both representations of the input image (avoiding the loss of accuracy), thereby improving the robustness and the accuracy of the neural network at the same time.
II. Inference phase
In this embodiment of the present application, the inference phase refers to the process in which the above execution device 210 uses the trained image processing network to process an input image. The embodiments corresponding to FIG. 3 yield the trained first feature extraction network and the trained second feature extraction network, and these, combined with various higher-level feature processing network layers, can implement a variety of functions, as already introduced in step 317. The two broad categories of image processing network mentioned in step 317 are introduced separately below.
The first category is image processing networks whose processing target is an object in the image, that is, the functions mentioned above: image classification, image recognition, image segmentation, or image detection. Referring to FIG. 6, which is a schematic flowchart of an image processing method provided by an embodiment of the present application, the image processing method may include:
601. The execution device acquires a first image.

In this embodiment of the present application, the execution device may capture the first image in real time, obtain it from a gallery stored on the execution device, or download it over a wireless or wired network. The first image may be an original image or an adversarial image. Since the execution device may specifically be a mobile phone, a computer, a wearable device, an autonomous vehicle, a smart home appliance, a chip, or the like, different forms of execution device may acquire the first image in different ways. As one example, if the execution device is a mobile phone, it may capture the first image with the phone's camera or download it with a browser. As another example, if the execution device is an autonomous vehicle, the vehicle may acquire the first image through its sensors, and so on. The specific way the execution device acquires the first image can be determined in light of the actual application scenario and product, which is not described further here.
602. The execution device inputs the first image into the first feature extraction network to obtain a third robust representation generated by the first feature extraction network.

In this embodiment of the present application, after acquiring the first image, the execution device inputs the first image into the first feature extraction network, so that the first feature extraction network generates, from the input first image, the third robust representation corresponding to it. For the specific form of the first feature extraction network and the meaning of a robust representation, refer to the description in the embodiment corresponding to FIG. 3, which is not repeated here.
603. The execution device inputs the first image into the second feature extraction network to obtain a third non-robust representation generated by the second feature extraction network.

In this embodiment of the present application, after acquiring the first image, the execution device inputs the first image into the second feature extraction network, so that the second feature extraction network generates, from the input first image, the third non-robust representation corresponding to it. For the specific form of the second feature extraction network and the meaning of a non-robust representation, refer to the description in the embodiment corresponding to FIG. 3, which is not repeated here.
604. In a first case, the execution device combines the third robust representation and the third non-robust representation to obtain a combined fourth representation.

In this embodiment of the present application, in the first case, the execution device combines the third robust representation and the third non-robust representation to obtain the combined fourth representation. For the manner of combination and the specific implementation of step 604, refer to the description of step 304 in the embodiment corresponding to FIG. 3. The first case may refer to a situation in which high accuracy is required of the image processing network's output; the specific situation can be determined in light of the actual application scenario and is not limited here.
605. The execution device outputs, through the feature processing network, a first processing result corresponding to the first image according to the combined fourth representation.

In this embodiment of the present application, after obtaining the combined fourth representation, the execution device inputs it into the feature processing network, so that the feature processing network outputs the first processing result corresponding to the first image according to the combined fourth representation. The specific implementation of the feature processing network and the specific form of the first processing result both depend on the function of the overall image processing network. As one example, if the function of the image processing network is image classification, the feature processing network may be a classification network, and the first processing result indicates the classification category of the whole image; further, the classification network may specifically be a neural network including at least one perceptron, for example a two-layer fully connected perceptron. As another example, if the function is image recognition, the feature processing network may be a recognition network, and the first processing result indicates the content recognized from the image, for example the text in the image. As a further example, if the function is image segmentation, the feature processing network may include a classification network that generates the classification category of each pixel in the image; the image is then segmented using these per-pixel categories, and the first processing result is the segmented image. As yet another example, if the function is image detection, the first processing result may specifically be a detection result indicating which objects are included in the first image, that is, the object type of each of at least one object in the first image; optionally, the detection result may also include position information of each of these objects, and so on, which can be determined according to actual product requirements and is not exhaustively listed here. This embodiment of the present application thus provides multiple specific implementations of the image processing network, which broadens the application scenarios of this solution and improves its implementation flexibility.
Specifically, if the feature processing network is a classification network, the classification network on the execution device may perform a classification operation according to the combined fourth representation and output the classification category corresponding to the first image. If the feature processing network is a recognition network, the recognition network on the execution device may perform a recognition operation according to the combined fourth representation and output the recognition result corresponding to the first image, and so on. Not all application scenarios are exhaustively listed here.
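As a minimal sketch of steps 601 to 605 with a classification head, carrying over the hypothetical modules from the training sketches above (the module names are assumptions, not fixed by this application):

```python
def standard_path(x: torch.Tensor) -> torch.Tensor:
    """Standard path: classify the first image from the combined representation."""
    r3 = f_r(x)                          # third robust representation      (step 602)
    n3 = f_n(x)                          # third non-robust representation  (step 603)
    fourth_rep = torch.cat([r3, n3], 1)  # combined fourth representation   (step 604)
    return head_std(fourth_rep).argmax(dim=1)  # first processing result    (step 605)
```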
606. In a second case, the execution device outputs, through the feature processing network, the first processing result corresponding to the first image according to the third robust representation.

In this embodiment of the present application, in the second case, the execution device may input the third robust representation into the feature processing network, so that the feature processing network outputs the first processing result corresponding to the first image according to the third robust representation. For the specific implementation, refer to the description of step 306 in the embodiment corresponding to FIG. 3. The first case and the second case are different situations: the second case may refer to a situation in which high robustness is required of the image processing network's output, or a situation in which the image processing network is in a high-risk state, that is, the probability that the input image is a perturbed image is high. The specific situation can be determined in light of the actual application scenario and is not limited here.
Specifically, if the feature processing network is a classification network, the classification network on the execution device may perform a classification operation according to the third robust representation and output the classification category corresponding to the first image. If the feature processing network is a recognition network, the recognition network on the execution device may perform a recognition operation according to the third robust representation and output the recognition result corresponding to the first image, and so on. Not all application scenarios are exhaustively listed here.

In this embodiment of the present application, the image processing network includes both a robust path and a standard path, and the user can flexibly choose which path to use according to the actual situation, which broadens the application scenarios of this solution and improves its implementation flexibility.
607. In a third case, the execution device outputs, through the feature processing network, the first processing result corresponding to the first image according to the third non-robust representation.

In some embodiments of the present application, in the third case, the execution device may also input the third non-robust representation into the feature processing network, so that the feature processing network outputs the first processing result corresponding to the first image according to the third non-robust representation, where the third case is different from the first case and the second case. For the specific implementation, refer to the description of step 307 in the embodiment corresponding to FIG. 3.

Specifically, if the feature processing network is a classification network, the classification network on the execution device may perform a classification operation according to the third non-robust representation and output the classification category corresponding to the first image. If the feature processing network is a recognition network, the recognition network on the execution device may perform a recognition operation according to the third non-robust representation and output the recognition result corresponding to the first image, and so on. Not all application scenarios are exhaustively listed here. In this embodiment of the present application, the provided image processing method is applied to the specific scenario of image classification, which improves the degree of integration with the application scenario.
It should be noted that step 607 is optional; if step 607 is not performed, execution may end after step 605 or after step 606. In addition, steps 605, 606 and 607 are presented above as alternatives, but in some embodiments they may also be performed together. As an example, steps 605 and 606 may both be performed, or steps 605 and 607, or steps 606 and 607, or steps 605 to 607 may all be performed, and so on. Which steps are performed can be determined in light of the specific application scenario and is not limited here.
For a further understanding of this solution, refer to FIG. 7, which is a schematic diagram of the image processing network in the image processing method provided by an embodiment of the present application. FIG. 7 takes the image processing network being an image classification network as an example; the image processing network includes the first feature extraction network, the second feature extraction network, and a classification network. The first image is input into the first feature extraction network and the second feature extraction network respectively, yielding the robust representation generated by the first feature extraction network and the non-robust representation generated by the second feature extraction network. The classification network in FIG. 7 includes three paths: a robust path, a standard path and a non-robust path. On the robust path, the classification network performs the classification operation according to the robust representation; on the standard path, the robust representation and the non-robust representation are combined and the classification network performs the classification operation according to the combined representation; on the non-robust path, the classification network performs the classification operation according to the non-robust representation. In FIG. 7, the robust path, the standard path and the non-robust path use the same classification network; it should be understood that the example in FIG. 7 is only for ease of understanding, and in other implementations the three paths may use three different classification networks.
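A sketch of FIG. 7's three paths as a single dispatch function, again with the hypothetical modules above; FIG. 7 itself uses one shared classification network, so the separate heads shown here are only one possible realization:

```python
def classify(x: torch.Tensor, path: str = "standard") -> torch.Tensor:
    """Select among the robust, standard and non-robust paths of FIG. 7."""
    r, n = f_r(x), f_n(x)
    if path == "robust":      # step 606: classify from the robust representation only
        return head_r(r).argmax(dim=1)
    if path == "non_robust":  # step 607: classify from the non-robust representation only
        return head_n(n).argmax(dim=1)
    return head_std(torch.cat([r, n], dim=1)).argmax(dim=1)  # steps 604 and 605
```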
The second category is image processing networks whose processing target is the image itself, that is, an image processing network used to determine whether the input is a natural image or an adversarial image.

Referring to FIG. 8, which is a schematic flowchart of an image processing method provided by an embodiment of the present application, the image processing method may include:
801. The execution device acquires a first image.

802. The execution device inputs the first image into the first feature extraction network to obtain a third robust representation generated by the first feature extraction network.

803. The execution device inputs the first image into the second feature extraction network to obtain a third non-robust representation generated by the second feature extraction network.

In this embodiment of the present application, for the specific implementation of steps 801 to 803, refer to the description of steps 601 to 603 in the embodiment corresponding to FIG. 6, which is not repeated here.
804. The execution device outputs, through the feature processing network, a first processing result corresponding to the first image according to the third robust representation and the third non-robust representation, where the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a perturbed image.

In this embodiment of the present application, after obtaining the third robust representation and the third non-robust representation, the execution device may output the first processing result through the feature processing network according to the third robust representation and the third non-robust representation. The feature processing network may include at least one perceptron; for the meaning of a perceptron, refer to the description of step 305 in the embodiment corresponding to FIG. 3.
Specifically, in one implementation, step 804 may include: after obtaining the third robust representation and the third non-robust representation, the execution device inputs them into the feature processing network, so that the feature processing network determines a seventh classification category corresponding to the first image according to the robust representation and an eighth classification category corresponding to the first image according to the non-robust representation. More specifically, in one case the feature processing network may include one classification network, and the execution device uses it to perform two classification operations in sequence, obtaining the seventh classification category and the eighth classification category respectively. In another case, the feature processing network may include two classification networks, and the execution device uses them to perform the two classification operations in parallel, obtaining the seventh classification category and the eighth classification category respectively.
The execution device then judges, through the feature processing network, whether the seventh classification category and the eighth classification category are consistent. If the seventh classification category is consistent with the eighth classification category, the first processing result output by the feature processing network indicates that the first image is an original image; if they are inconsistent, the first processing result output by the feature processing network indicates that the first image is a perturbed image. The first processing result may specifically take the form of text, for example "natural image" or "adversarial image". The first processing result may also take the form of characters. As one example, the first processing result may be "0 0.3 1 0.7", where 0 refers to a natural image and 1 refers to an adversarial image, that is, the probability of a natural image is 0.3 and the probability of an adversarial image is 0.7, so the first processing result indicates that the first image is an adversarial image. As another example, the first processing result may be "0.3 0.7", where 0.3 indicates the probability that the first image is a natural image and 0.7 indicates the probability that it is an adversarial image, so the first processing result again indicates that the first image is an adversarial image. It should be understood that these examples of the first processing result are only for ease of understanding and are not intended to limit this solution.
In this embodiment of the present application, whether the first image is an original image or an adversarial image is determined by judging whether the seventh classification category and the eighth classification category are consistent; the method is simple and highly practicable.
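A sketch of this consistency check, with the hypothetical modules as above; the two heads may equally be one classification network applied twice:

```python
def detect_adversarial(x: torch.Tensor) -> torch.Tensor:
    """First processing result of FIG. 8: True marks a perturbed image."""
    seventh_cat = head_r(f_r(x)).argmax(dim=1)  # seventh classification category
    eighth_cat = head_n(f_n(x)).argmax(dim=1)   # eighth classification category
    return seventh_cat != eighth_cat  # inconsistent -> perturbed; consistent -> original
```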
在另一种情况下,步骤804可以包括:将第三鲁棒表示和第三非鲁棒表示组合,并根据组合后的第五表示执行检测操作,以输出与第一图像对应的检测结果,检测结果为一种第一处理结果。其中,组合的方式以及组合后的第五表示的具体表现方式均可以参阅图3对应实施例中的描述。检测网络中可以包括至少一个感知机,感知机的含义可以参阅图3对应实施例中对步骤305的描述。检测结果的具体表现形式可以参阅上种情况中对第一处理结果的描述,此处均不做赘述。In another case, step 804 may include: combining the third robust representation and the third non-robust representation, and performing a detection operation according to the combined fifth representation to output a detection result corresponding to the first image, The detection result is a first processing result. For the combination mode and the specific expression mode of the fifth representation after the combination, please refer to the description in the embodiment corresponding to FIG. 3. The detection network may include at least one perceptron. For the meaning of the perceptron, refer to the description of step 305 in the embodiment corresponding to FIG. 3. For the specific manifestation of the detection result, please refer to the description of the first processing result in the previous case, which will not be repeated here.
This embodiment of this application provides another implementation of determining whether the first image is an original image or an adversarial image, which enhances the implementation flexibility of the solution.
In this embodiment of this application, the feature information extracted by the first feature extraction network and the second feature extraction network can be used not only to obtain a processing result corresponding to an object in the image, but also to obtain a processing result corresponding to the image as a whole, that is, to determine whether the image is an original image or a perturbed image, which expands the application scenarios of this solution.
In the embodiments of this application, the inventors found during research that extracting only the robust representation while discarding the non-robust representation reduces the accuracy of the neural network when it processes original images. In the embodiments of this application, the robust representation and the non-robust representation in the input image are extracted by the first feature extraction network and the second feature extraction network respectively, which avoids mixing the two (which would reduce robustness) while retaining both representations of the input image (which avoids the drop in accuracy), thereby improving the robustness and the accuracy of the neural network at the same time.
To provide a more intuitive understanding of the beneficial effects brought by the embodiments of this application, these beneficial effects are further described below with reference to the data in the following tables.
Table 1

                                     S      R      N
  Adversarial training               89.0   89.0   10.0
  Iterative optimization             86.8   79.9   81.9
  Embodiments of this application    94.8   91.8   93.8
Table 1 takes as an example the case where the first feature extraction network and the second feature extraction network are both the feature extraction part of WRNS34. S refers to the standard data set, which may include both natural images and adversarial images; R refers to the adversarial data set, which includes only adversarial images; N refers to the natural data set, which includes only natural images. Adversarial training (AT) and iterative optimization are two existing training approaches. As shown by the data in Table 1, when processing images from the S, R, and N data sets, the training method provided in the embodiments of this application achieves the highest accuracy in every case; that is, the embodiments of this application provide a training scheme that improves robustness and accuracy at the same time.
In addition, experiments were also conducted on a data set in which the ratio of natural samples to adversarial samples is one to one, that is, the image processing network corresponding to FIG. 8 is used to predict whether an input image is a natural image or an adversarial image, and an image processing network trained by iterative optimization is used to make the same prediction. The results are as follows:
Table 2

                        Iterative optimization    Embodiments of this application
  Detection accuracy    4.9                       64.8
Table 2 takes as an example the case where the first feature extraction network and the second feature extraction network are both the feature extraction part of WRNS34. Detection accuracy refers to the proportion of images whose prediction matches the actual situation among all input images. Clearly, the image processing network obtained through the training method provided in the embodiments of this application achieves a much higher detection accuracy than the network trained by iterative optimization.
On the basis of the embodiments corresponding to FIG. 1 to FIG. 8, in order to better implement the above solutions of the embodiments of this application, related devices for implementing the above solutions are further provided below. Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application. The neural network training apparatus 900 may include an input module 901 and a training module 902. The input module 901 is configured to input an adversarial image into a first feature extraction network and a second feature extraction network respectively, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network, where the adversarial image is an image obtained by performing perturbation processing on an original image, a robust representation refers to features that are insensitive to perturbation, and a non-robust representation refers to features that are sensitive to perturbation. The input module 901 is further configured to input the first robust representation into a classification network to obtain a first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain a second classification category output by the classification network. The training module 902 is configured to iteratively train the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and output the trained first feature extraction network and the trained second feature extraction network. The first loss function is used to represent the similarity between the first classification category and a first annotation category and the similarity between the second classification category and a second annotation category, where the first annotation category is the correct category corresponding to the adversarial image, and the second annotation category is the incorrect category corresponding to the adversarial image.
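For illustration only, the training objective just described can be sketched as follows, under the assumption that the first loss function is a sum of two cross-entropy terms (the application does not fix a concrete loss form, so this is an assumed instantiation):

```python
import torch.nn.functional as F

def first_loss(classifier, robust_repr, non_robust_repr,
               correct_label, wrong_label):
    """Assumed form of the first loss function: the robust branch should
    predict the correct category of the adversarial image, while the
    non-robust branch should predict the (incorrect) category induced
    by the perturbation. Representations have shape (N, D) and labels
    have shape (N,)."""
    loss_robust = F.cross_entropy(classifier(robust_repr), correct_label)
    loss_non_robust = F.cross_entropy(classifier(non_robust_repr), wrong_label)
    return loss_robust + loss_non_robust
```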
In this embodiment of this application, the inventors found during research that adversarial training causes the neural network to extract only the robust representation from the input image while discarding the non-robust representation, which reduces the accuracy of the neural network when it processes original images. In this embodiment of this application, the robust representation and the non-robust representation in the input image are extracted by the first feature extraction network and the second feature extraction network respectively, which avoids mixing the two (which would reduce robustness) while retaining both representations of the input image (which avoids the drop in accuracy), thereby improving the robustness and the accuracy of the neural network at the same time.
In a possible design, referring to FIG. 10, FIG. 10 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application. The input module 901 is further configured to input the original image into the first feature extraction network and the second feature extraction network respectively, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network. The apparatus 900 further includes: a combining module 903, configured to combine the second robust representation and the second non-robust representation to obtain a combined first representation. The input module 901 is further configured to input the combined first representation into the classification network, so that the classification network performs a classification operation based on the combined first representation, to obtain a third classification category output by the classification network. The training module 902 is specifically configured to iteratively train the first feature extraction network and the second feature extraction network according to the first loss function and a second loss function until the convergence condition is met, where the second loss function is used to represent the similarity between the third classification category and a third annotation category, and the third annotation category is the correct category corresponding to the original image.
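Continuing the assumed cross-entropy formulation from the sketch above, the possible design just described adds a term on the combined representation of the original image. Again, the concatenation, the loss form, and the equal weighting are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def second_loss(classifier, robust_repr_nat, non_robust_repr_nat, true_label):
    """Assumed form of the second loss: the category predicted from the
    combined representation of the original (natural) image should match
    the correct category of that image."""
    combined = torch.cat([robust_repr_nat, non_robust_repr_nat], dim=-1)
    return F.cross_entropy(classifier(combined), true_label)

# Joint objective for this design (the 1:1 weighting is an assumption):
# total_loss = first_loss(...) + second_loss(...)
```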
In this embodiment of this application, during training, the training module 902 uses not only adversarial images but also natural images to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, so as to further improve the accuracy of the trained first feature extraction network and the trained second feature extraction network when processing natural images.
In a possible design, the input module 901 is further configured to input the original image into the first feature extraction network to obtain the second robust representation generated by the first feature extraction network. The input module 901 is further configured to input the second robust representation into the classification network, so that the classification network performs a classification operation based on the second robust representation, to obtain a fourth classification category output by the classification network. The training module 902 is specifically configured to iteratively train the first feature extraction network and the second feature extraction network according to the first loss function and a third loss function until the convergence condition is met, where the third loss function is used to represent the similarity between the fourth classification category and the third annotation category, and the third annotation category is the correct category corresponding to the original image.
In this embodiment of this application, the training module 902 uses adversarial images to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, and additionally uses natural images to train the first feature extraction network's ability to extract robust representations, so as to further improve the accuracy of the trained first feature extraction network.
In a possible design, the input module 901 is further configured to input the original image into the second feature extraction network to obtain the second non-robust representation generated by the second feature extraction network. The input module 901 is further configured to input the second non-robust representation into the classification network, so that the classification network performs a classification operation based on the second non-robust representation, to obtain a fifth classification category output by the classification network. The training module 902 is specifically configured to iteratively train the first feature extraction network and the second feature extraction network according to the first loss function and a fourth loss function until the convergence condition is met, where the fourth loss function is used to represent the similarity between the fifth classification category and the third annotation category, and the third annotation category is the correct category corresponding to the original image.
In this embodiment of this application, the training module 902 uses adversarial images to train the feature extraction capabilities of the first feature extraction network and the second feature extraction network, and additionally uses natural images to train the second feature extraction network's ability to extract non-robust representations, so as to further improve the accuracy of the trained second feature extraction network.
In a possible design, the input module 901 is further configured to input the original image into the first feature extraction network and the second feature extraction network respectively, to obtain the second robust representation generated by the first feature extraction network and the second non-robust representation generated by the second feature extraction network. The input module 901 is further configured to combine the second robust representation and the second non-robust representation to obtain the combined first representation, and input the combined first representation into the classification network, so that the classification network performs a classification operation based on the combined first representation, to obtain the third classification category output by the classification network. The input module 901 is further configured to input the second robust representation into the classification network, so that the classification network performs a classification operation based on the second robust representation, to obtain the fourth classification category output by the classification network. The input module 901 is further configured to input the second non-robust representation into the classification network, so that the classification network performs a classification operation based on the second non-robust representation, to obtain the fifth classification category output by the classification network. The training module 902 is specifically configured to iteratively train the first feature extraction network and the second feature extraction network according to the first loss function and a fifth loss function until the convergence condition is met, where the fifth loss function is used to represent the similarity between the third classification category and the third annotation category, the similarity between the fourth classification category and the third annotation category, and the similarity between the fifth classification category and the third annotation category, and the third annotation category is the correct category corresponding to the original image.
In this embodiment of this application, while the processing capability of the first feature extraction network and the second feature extraction network for adversarial images is improved, their processing capability for natural images is also improved; that is, whether the input is a natural image or an adversarial image, the trained first feature extraction network and the trained second feature extraction network can both accurately extract the robust representation and the non-robust representation, which expands the application scenarios of this solution.
In a possible design, referring to FIG. 10, the apparatus further includes a generation module 904, specifically configured to: generate a first gradient according to the function value of the second loss function; perform perturbation processing on the original image according to the first gradient to generate the adversarial image; and determine the third annotation category as the first annotation category.
In this embodiment of this application, the generation module 904 generates the first gradient according to the similarity between the third classification category and the third annotation category, and perturbs the original image according to the first gradient, which makes the perturbation processing more targeted, helps accelerate the training process of the first feature extraction network and the second feature extraction network, and improves the efficiency of the training process.
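A common way to realize "perturb the original image according to a gradient of a loss" is a signed-gradient step in the style of FGSM. The sketch below illustrates this under the assumption that such a step is used here; the step size epsilon, the sign rule, and the pixel range are assumptions of the sketch, not values fixed by this application:

```python
import torch

def perturb_image(image: torch.Tensor, loss: torch.Tensor,
                  epsilon: float = 8 / 255) -> torch.Tensor:
    """Generate an adversarial image from the gradient of `loss`
    with respect to `image` (image.requires_grad must be True)."""
    grad, = torch.autograd.grad(loss, image)
    adversarial = image + epsilon * grad.sign()  # step along the gradient
    return adversarial.clamp(0.0, 1.0).detach()  # keep valid pixel range
```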
In a possible design, referring to FIG. 10, the apparatus further includes a generation module 904, specifically configured to: generate a second gradient according to the function value of the third loss function; perform perturbation processing on the original image according to the second gradient to generate the adversarial image; and determine the third annotation category as the first annotation category.
In this embodiment of this application, the generation module 904 perturbs the original image according to the similarity between the fourth classification category (output by the classification network based on the second robust representation) and the third annotation category, which makes the perturbation processing more targeted with respect to the first feature extraction network and helps improve the first feature extraction network's ability to extract robust representations.
In a possible design, referring to FIG. 10, the apparatus further includes a generation module 904, specifically configured to: generate a third gradient according to the function value of the fourth loss function; perform perturbation processing on the original image according to the third gradient to generate the adversarial image; and determine the third annotation category as the first annotation category.
In this embodiment of this application, the generation module 904 perturbs the original image according to the similarity between the fifth classification category (output by the classification network based on the second non-robust representation) and the third annotation category, which makes the perturbation processing more targeted with respect to the second feature extraction network and helps improve the second feature extraction network's ability to extract non-robust representations.
In a possible design, referring to FIG. 10, the apparatus 900 further includes: a combining module 903, configured to combine the first robust representation and the first non-robust representation to obtain a combined second representation. The input module 901 is further configured to input the combined second representation into the classification network to obtain a sixth classification category output by the classification network. The input module 901 is specifically configured to: when the sixth classification category is different from the first annotation category, input the first robust representation into the classification network to obtain the first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain the second classification category output by the classification network.
In this embodiment of this application, if the sixth classification category is the same as the first annotation category, it shows that the perturbation of the perturbed image is too slight, in which case the first feature extraction network and the second feature extraction network process the image in almost the same way as a natural image. Since the purpose of training here is to strengthen the ability of the first feature extraction network and the second feature extraction network to separate robust representations from non-robust representations in images with larger perturbations, the subsequent training operations are performed only when the sixth classification category is different from the first annotation category, so as to improve the efficiency of the training process.
In a possible design, referring to FIG. 10, the apparatus 900 further includes: a determining module 905, configured to determine the sixth classification category as the second annotation category when the sixth classification category is different from the first annotation category. This embodiment of this application provides a way of obtaining the second annotation category that is simple to operate, requires no additional steps, and saves computing resources.
In a possible design, the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network. This embodiment of this application provides two specific implementations for each of the first feature extraction network and the second feature extraction network, which improves the implementation flexibility of the solution.
It should be noted that the information exchange and execution processes among the modules/units in the neural network training apparatus 900 are based on the same concept as the method embodiments corresponding to FIG. 3 to FIG. 5 in this application. For details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
An embodiment of this application further provides an image processing network. Referring to FIG. 11, FIG. 11 is a schematic structural diagram of an image processing network provided by an embodiment of this application. The image processing network 1100 includes a first feature extraction network 1101, a second feature extraction network 1102, and a feature processing network 1103. The first feature extraction network 1101 is configured to receive an input first image and generate a robust representation corresponding to the first image, where a robust representation refers to features that are insensitive to perturbation. The second feature extraction network 1102 is configured to receive the input first image and generate a non-robust representation corresponding to the first image, where a non-robust representation refers to features that are sensitive to perturbation. The feature processing network 1103 is configured to obtain the robust representation and the non-robust representation, so as to output a first processing result corresponding to the first image.
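The overall data flow of such a network can be sketched as follows. The three sub-module classes are placeholders (any convolutional or residual backbone could stand in for the two extraction networks), so this is an illustrative composition rather than the concrete architecture of this application:

```python
import torch
import torch.nn as nn

class ImageProcessingNetwork(nn.Module):
    """Two parallel feature extractors feeding one feature processing head."""

    def __init__(self, robust_extractor: nn.Module,
                 non_robust_extractor: nn.Module,
                 feature_processor: nn.Module):
        super().__init__()
        self.robust_extractor = robust_extractor          # e.g. a ResNet trunk
        self.non_robust_extractor = non_robust_extractor  # e.g. a CNN trunk
        self.feature_processor = feature_processor        # classifier/detector

    def forward(self, first_image: torch.Tensor) -> torch.Tensor:
        robust_repr = self.robust_extractor(first_image)
        non_robust_repr = self.non_robust_extractor(first_image)
        # The feature processing network consumes both representations
        # and outputs the first processing result.
        return self.feature_processor(robust_repr, non_robust_repr)
```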
In this embodiment of this application, the robust representation and the non-robust representation in the input image are extracted by the first feature extraction network 1101 and the second feature extraction network 1102 respectively, which avoids mixing the two (which would reduce robustness) while retaining both representations of the input image (which avoids the drop in accuracy), thereby improving the robustness and the accuracy of the neural network at the same time.
In a possible design, the feature processing network 1103 is specifically configured to: in a first case, combine the robust representation and the non-robust representation and, based on the combined representation, output the first processing result corresponding to the first image; or, in a second case, output the first processing result corresponding to the first image based on the robust representation, where the first case and the second case are different cases; or output the first processing result corresponding to the first image based on the non-robust representation. In this embodiment of this application, the image processing network 1100 includes both a robust path and a standard path, and the user can flexibly choose which path to use according to the actual situation, which expands the application scenarios of this solution and improves its implementation flexibility.
In a possible design, the feature processing network 1103 is specifically configured to: perform a classification operation based on the combined representation and output a classification category corresponding to the first image; or perform a classification operation based on the robust representation and output a classification category corresponding to the first image; or perform a classification operation based on the non-robust representation and output a classification category corresponding to the first image. In this embodiment of this application, the provided image processing network 1100 is applied to the specific application scenario of image classification, which improves the degree of integration with the application scenario.
In a possible design, the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a perturbed image. In this embodiment of this application, the feature information extracted by the first feature extraction network 1101 and the second feature extraction network 1102 can be used not only to obtain a processing result corresponding to an object in the image, but also to obtain a processing result corresponding to the image as a whole, that is, to determine whether the image is an original image or a perturbed image, which expands the application scenarios of this solution.
In a possible design, the feature processing network 1103 is specifically configured to: determine a first classification category corresponding to the first image based on the robust representation; determine a second classification category corresponding to the first image based on the non-robust representation; when the first classification category is consistent with the second classification category, output a first processing result indicating that the first image is an original image; and when the first classification category is inconsistent with the second classification category, output a first processing result indicating that the first image is a perturbed image.
In this embodiment of this application, the feature processing network 1103 determines whether the first image is an original image or an adversarial image by checking whether the first classification category is consistent with the second classification category. The method is simple and highly operable.
In a possible design, the feature processing network 1103 is specifically configured to combine the robust representation and the non-robust representation and perform a detection operation based on the combined representation, to output a detection result corresponding to the first image, where the first processing result includes the detection result. This embodiment of this application provides another implementation of determining whether the first image is an original image or an adversarial image, which enhances the implementation flexibility of the solution.
In a possible design, the image processing network 1100 is one or more of the following: an image classification network, an image recognition network, an image segmentation network, or an image detection network. This embodiment of this application provides multiple specific implementations of the image processing network 1100, which expands the application scenarios of this solution and improves its implementation flexibility.
In a possible design, the feature processing network 1103 includes a perceptron.
In a possible design, the first feature extraction network 1101 is a convolutional neural network or a residual neural network, and the second feature extraction network 1102 is a convolutional neural network or a residual neural network.
An embodiment of this application further provides an execution device. Referring to FIG. 12, FIG. 12 is a schematic structural diagram of the execution device provided by an embodiment of this application. The execution device 1200 may specifically be a mobile phone, a computer, a wearable device, a self-driving vehicle, a smart home appliance, a chip, or another form of device, which is not limited here. The image processing network 1100 described in the embodiment corresponding to FIG. 11 may be deployed on the execution device 1200 to implement the functions of the execution device in the embodiments corresponding to FIG. 6 to FIG. 8. The execution device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the number of processors 1203 in the execution device 1200 may be one or more; one processor is taken as an example in FIG. 12), where the processor 1203 may include an application processor 12031 and a communication processor 12032. In some embodiments of this application, the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or in other ways.
The memory 1204 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1203. A part of the memory 1204 may further include a non-volatile random access memory (NVRAM). The memory 1204 stores operation instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
The processor 1203 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 1203 or implemented by the processor 1203. The processor 1203 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1203 or by instructions in the form of software. The processor 1203 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1203 may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1204, and the processor 1203 reads information from the memory 1204 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 1201 may be configured to receive input digital or character information and generate signal inputs related to the settings and function control of the execution device. The transmitter 1202 may be configured to output digital or character information through a first interface; the transmitter 1202 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group; and the transmitter 1202 may further include a display device such as a display screen.
In this embodiment of this application, the processor 1203 is configured to perform the image processing method performed by the execution device in the embodiments corresponding to FIG. 6 to FIG. 8. Specifically, the application processor 12031 is configured to perform the following steps: inputting a first image into a first feature extraction network to obtain a robust representation corresponding to the first image generated by the first feature extraction network, where a robust representation refers to features that are insensitive to perturbation; inputting the first image into a second feature extraction network to obtain a non-robust representation corresponding to the first image generated by the second feature extraction network, where a non-robust representation refers to features that are sensitive to perturbation; and outputting, through a feature processing network and based on the robust representation and the non-robust representation, a first processing result corresponding to the first image, where the first feature extraction network, the second feature extraction network, and the feature processing network belong to the same image processing network.
It should be noted that the application processor 12031 is further configured to perform other steps performed by the execution device in the method embodiments corresponding to FIG. 6 to FIG. 8. For the specific implementation of the image processing method performed by the application processor 12031 and the beneficial effects brought thereby, refer to the descriptions in the method embodiments corresponding to FIG. 2 to FIG. 8, which are not repeated here.
An embodiment of this application further provides a training device. Referring to FIG. 13, FIG. 13 is a schematic structural diagram of the training device provided by an embodiment of this application. The training apparatus 900 described in the embodiments corresponding to FIG. 9 and FIG. 10 may be deployed on the training device 1300 to implement the functions of the training device in the embodiments corresponding to FIG. 3 and FIG. 5. Specifically, the training device 1300 is implemented by one or more servers, and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPU) 1322 (for example, one or more processors), a memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) storing application programs 1342 or data 1344. The memory 1332 and the storage medium 1330 may be transient storage or persistent storage. The program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 1322 may be configured to communicate with the storage medium 1330 and execute, on the training device 1300, the series of instruction operations in the storage medium 1330.
The training device 1300 may further include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
In this embodiment of this application, the central processing unit 1322 is configured to perform the neural network training method performed by the training device in the embodiment corresponding to FIG. 3. Specifically, the central processing unit 1322 is configured to: input an adversarial image into a first feature extraction network and a second feature extraction network respectively, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network, where the adversarial image is an image that has undergone perturbation processing, a robust representation refers to features that are insensitive to perturbation, and a non-robust representation refers to features that are sensitive to perturbation; input the first robust representation into a classification network to obtain a first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain a second classification category output by the classification network; and iteratively train the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and output the trained first feature extraction network and the trained second feature extraction network. The first loss function is used to represent the similarity between the first classification category and a first annotation category and the similarity between the second classification category and a second annotation category, where the first annotation category is the correct category corresponding to the adversarial image, and the second annotation category is the incorrect category corresponding to the adversarial image.
It should be noted that the central processing unit 1322 is further configured to perform other steps performed by the training device in the embodiment corresponding to FIG. 3. For the specific implementation of the neural network training method performed by the central processing unit 1322 and the beneficial effects brought thereby, refer to the descriptions in the method embodiments corresponding to FIG. 3, which are not repeated here.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program that, when run on a computer, causes the computer to perform the steps performed by the training device in the methods described in the embodiments shown in FIG. 3 to FIG. 5.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program that, when run on a computer, causes the computer to perform the steps performed by the execution device in the methods described in the embodiments shown in FIG. 6 to FIG. 8.
An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the training device in the methods described in the embodiments shown in FIG. 3 to FIG. 5, or causes the computer to perform the steps performed by the execution device in the methods described in the embodiments shown in FIG. 6 to FIG. 8.
An embodiment of this application further provides a circuit system. The circuit system includes a processing circuit configured to perform the steps performed by the training device in the methods described in the embodiments shown in FIG. 3 to FIG. 5, or configured to perform the steps performed by the execution device in the methods described in the embodiments shown in FIG. 6 to FIG. 8.
The neural network training apparatus or the execution device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the training device performs the neural network training method described in the embodiments shown in FIG. 3 to FIG. 5, or so that a chip in the execution device performs the image processing method described in the embodiments shown in FIG. 6 to FIG. 8. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; alternatively, the storage unit may be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to FIG. 14, FIG. 14 is a schematic structural diagram of a chip provided by an embodiment of this application. The chip may be embodied as a neural-network processing unit NPU 140. The NPU 140 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core part of the NPU is an arithmetic circuit 1403, and a controller 1404 controls the arithmetic circuit 1403 to extract matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 1403 internally includes multiple processing engines (PE). In some implementations, the arithmetic circuit 1403 is a two-dimensional systolic array. The arithmetic circuit 1403 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1403 is a general-purpose matrix processor.
For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from a weight memory 1402 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from an input memory 1401, performs a matrix operation with matrix B, and stores partial or final results of the resulting matrix in an accumulator 1408.
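The accumulate-as-you-go computation described here can be pictured with the following sketch, which mimics in plain Python how partial products of A and B build up in an accumulator. It is purely illustrative and does not model the actual PE array or dataflow of the NPU:

```python
def matmul_with_accumulator(A, B):
    """Compute C = A x B, accumulating partial results in an accumulator
    buffer, in the spirit of the arithmetic circuit and accumulator 1408."""
    rows, inner, cols = len(A), len(B), len(B[0])
    accumulator = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):              # one "pass" per slice of A and B
        for i in range(rows):
            for j in range(cols):
                accumulator[i][j] += A[i][k] * B[k][j]  # partial result
    return accumulator                  # final result of the matrix product
```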
A unified memory 1406 is configured to store input data and output data. Weight data is transferred to the weight memory 1402 through a direct memory access controller (DMAC) 1405. Input data is also transferred to the unified memory 1406 through the DMAC.
The BIU is the bus interface unit 1410, which is used for interaction between the AXI bus and the DMAC and an instruction fetch buffer (IFB) 1409.
The bus interface unit 1410 (BIU) is used by the instruction fetch buffer 1409 to obtain instructions from an external memory, and is also used by the storage unit access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly configured to transfer input data from the external memory DDR to the unified memory 1406, to transfer weight data to the weight memory 1402, or to transfer input data to the input memory 1401.
A vector calculation unit 1407 includes multiple arithmetic processing units, and further processes the output of the arithmetic circuit when necessary, for example performing vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. It is mainly used for network computation of layers other than convolutional/fully connected layers in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1407 can store processed output vectors to the unified memory 1406. For example, the vector calculation unit 1407 may apply a linear function and/or a non-linear function to the output of the arithmetic circuit 1403, for example to linearly interpolate feature planes extracted by a convolutional layer, or, for another example, to a vector of accumulated values, so as to generate activation values. In some implementations, the vector calculation unit 1407 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1403, for example for use in a subsequent layer of the neural network.
An instruction fetch buffer 1409 connected to the controller 1404 is configured to store instructions used by the controller 1404.
The unified memory 1406, the input memory 1401, the weight memory 1402, and the instruction fetch buffer 1409 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The operations of the layers in a recurrent neural network may be performed by the arithmetic circuit 1403 or the vector calculation unit 1407.
The processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control program execution of the method of the foregoing first aspect.
It should additionally be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationship between modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application can be implemented by software plus necessary general-purpose hardware, and can certainly also be implemented by dedicated hardware including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on such an understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, it can be implemented in the form of a computer program product in whole or in part.
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center. Transmission to another website, computer, server or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a server or a data center integrated with one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Claims (44)

  1. A neural network training method, characterized in that the method comprises:
    inputting an adversarial image into a first feature extraction network and a second feature extraction network respectively, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network, wherein the adversarial image is an image obtained by performing perturbation processing on an original image, a robust representation refers to a feature that is insensitive to perturbation, and a non-robust representation refers to a feature that is sensitive to perturbation;
    inputting the first robust representation into a classification network to obtain a first classification category output by the classification network, and inputting the first non-robust representation into the classification network to obtain a second classification category output by the classification network;
    performing iterative training on the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and outputting the trained first feature extraction network and the trained second feature extraction network;
    wherein the first loss function is used to represent a similarity between the first classification category and a first annotation category and a similarity between the second classification category and a second annotation category, the first annotation category being the correct category corresponding to the adversarial image, and the second annotation category being an incorrect category corresponding to the adversarial image.
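For illustration only (not part of the claims), the training step recited in claim 1 can be sketched in PyTorch as follows. The module names (extractor_r, extractor_nr, classifier) and the use of cross-entropy as the similarity measure are assumptions; the claim does not fix a particular loss form.

```python
import torch
import torch.nn.functional as F

def training_step(extractor_r, extractor_nr, classifier, optimizer,
                  adv_image, correct_label, wrong_label):
    # First robust and first non-robust representations of the adversarial image.
    z_r = extractor_r(adv_image)   # first feature extraction network
    z_nr = extractor_nr(adv_image) # second feature extraction network

    # First and second classification categories from the shared classification network.
    logits_r = classifier(z_r)
    logits_nr = classifier(z_nr)

    # First loss function: similarity of the first classification category to the
    # correct (first annotation) category, plus similarity of the second
    # classification category to the incorrect (second annotation) category.
    loss = F.cross_entropy(logits_r, correct_label) + \
           F.cross_entropy(logits_nr, wrong_label)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The robust branch is pulled toward the correct category while the non-robust branch is deliberately pulled toward the wrong category, which is what separates the two kinds of features.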
  2. The method according to claim 1, characterized in that the method further comprises:
    inputting the original image into the first feature extraction network and the second feature extraction network respectively, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network;
    combining the second robust representation and the second non-robust representation to obtain a combined first representation;
    inputting the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation, to obtain a third classification category output by the classification network;
    wherein the performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met comprises:
    performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a second loss function until the convergence condition is met, wherein the second loss function is used to represent a similarity between the third classification category and a third annotation category, the third annotation category being the correct category corresponding to the original image.
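A possible form of the combination and the second loss of claim 2, again as a hypothetical sketch: the claim does not fix the combination operator, so element-wise addition is used here so that a single classification network can accept either an individual or a combined representation.

```python
import torch.nn.functional as F

def second_loss(extractor_r, extractor_nr, classifier, clean_image, correct_label):
    z_r = extractor_r(clean_image)    # second robust representation
    z_nr = extractor_nr(clean_image)  # second non-robust representation
    combined = z_r + z_nr             # combined first representation (operator assumed)
    logits = classifier(combined)     # third classification category
    # Similarity to the third annotation category (the correct clean-image label).
    return F.cross_entropy(logits, correct_label)
```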
  3. The method according to claim 1, characterized in that the method further comprises:
    inputting the original image into the first feature extraction network to obtain a second robust representation generated by the first feature extraction network;
    inputting the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation, to obtain a fourth classification category output by the classification network;
    wherein the performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met comprises:
    performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a third loss function until the convergence condition is met, wherein the third loss function is used to represent a similarity between the fourth classification category and a third annotation category, the third annotation category being the correct category corresponding to the original image.
  4. The method according to claim 1, characterized in that the method further comprises:
    inputting the original image into the second feature extraction network to obtain a second non-robust representation generated by the second feature extraction network;
    inputting the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation, to obtain a fifth classification category output by the classification network;
    wherein the performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met comprises:
    performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a fourth loss function until the convergence condition is met, wherein the fourth loss function is used to represent a similarity between the fifth classification category and a third annotation category, the third annotation category being the correct category corresponding to the original image.
  5. The method according to claim 1, characterized in that the method further comprises:
    inputting the original image into the first feature extraction network and the second feature extraction network respectively, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network;
    combining the second robust representation and the second non-robust representation to obtain a combined first representation, and inputting the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation, to obtain a third classification category output by the classification network;
    inputting the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation, to obtain a fourth classification category output by the classification network;
    inputting the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation, to obtain a fifth classification category output by the classification network;
    wherein the performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function until the convergence condition is met comprises:
    performing iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a fifth loss function until the convergence condition is met, wherein the fifth loss function is used to represent a similarity between the third classification category and a third annotation category, a similarity between the fourth classification category and the third annotation category, and a similarity between the fifth classification category and the third annotation category, the third annotation category being the correct category corresponding to the original image.
  6. The method according to claim 2, characterized in that the method further comprises:
    generating a first gradient according to a function value of the second loss function;
    performing perturbation processing on the original image according to the first gradient to generate the adversarial image, and determining the third annotation category as the first annotation category.
  7. The method according to claim 3, characterized in that the method further comprises:
    generating a second gradient according to a function value of the third loss function;
    performing perturbation processing on the original image according to the second gradient to generate the adversarial image, and determining the third annotation category as the first annotation category.
  8. The method according to claim 4, characterized in that the method further comprises:
    generating a third gradient according to a function value of the fourth loss function;
    performing perturbation processing on the original image according to the third gradient to generate the adversarial image, and determining the third annotation category as the first annotation category.
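Claims 6 to 8 all derive the adversarial image from the gradient of one of the losses with respect to the original image. A minimal FGSM-style sketch of this pattern; the step size eps, the sign operation, and the clamping range are assumptions not recited in the claims.

```python
import torch
import torch.nn.functional as F

def make_adversarial(extractor_r, extractor_nr, classifier, clean_image,
                     correct_label, eps=8 / 255):
    image = clean_image.clone().detach().requires_grad_(True)
    # A loss on the original image, e.g. the second loss of claim 2 used in claim 6.
    combined = extractor_r(image) + extractor_nr(image)
    loss = F.cross_entropy(classifier(combined), correct_label)
    grad = torch.autograd.grad(loss, image)[0]  # the first gradient
    # Perturb the original image along the gradient to obtain the adversarial image;
    # its first annotation category is the original (third) annotation category.
    return (image + eps * grad.sign()).clamp(0, 1).detach()
```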
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    combining the first robust representation and the first non-robust representation to obtain a combined second representation;
    inputting the combined second representation into the classification network to obtain a sixth classification category output by the classification network;
    wherein the inputting the first robust representation into the classification network to obtain the first classification category output by the classification network, and inputting the first non-robust representation into the classification network to obtain the second classification category output by the classification network comprises:
    in a case where the sixth classification category is different from the first annotation category, inputting the first robust representation into the classification network to obtain the first classification category output by the classification network, and inputting the first non-robust representation into the classification network to obtain the second classification category output by the classification network.
  10. The method according to claim 9, characterized in that the method further comprises:
    in a case where the sixth classification category is different from the first annotation category, determining the sixth classification category as the second annotation category.
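Claims 9 and 10 gate the first loss on whether the combined representation of the adversarial image is already misclassified, and reuse the wrong prediction as the second annotation category. A hypothetical sketch of that gating (batch handling is an assumption):

```python
import torch
import torch.nn.functional as F

def first_loss_when_fooled(extractor_r, extractor_nr, classifier,
                           adv_image, correct_label):
    z_r, z_nr = extractor_r(adv_image), extractor_nr(adv_image)
    combined = z_r + z_nr                       # combined second representation
    sixth = classifier(combined).argmax(dim=1)  # sixth classification category
    fooled = sixth != correct_label             # differs from first annotation category
    if not fooled.any():
        return None  # skip the first loss when nothing is misclassified
    wrong_label = sixth[fooled]                 # second annotation category (claim 10)
    return (F.cross_entropy(classifier(z_r[fooled]), correct_label[fooled]) +
            F.cross_entropy(classifier(z_nr[fooled]), wrong_label))
```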
  11. The method according to any one of claims 1 to 8, characterized in that the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
  12. An image processing network, characterized in that the image processing network comprises a first feature extraction network, a second feature extraction network, and a feature processing network;
    the first feature extraction network is configured to receive an input first image and generate a robust representation corresponding to the first image, the robust representation referring to a feature that is insensitive to perturbation;
    the second feature extraction network is configured to receive the input first image and generate a non-robust representation corresponding to the first image, the non-robust representation referring to a feature that is sensitive to perturbation;
    the feature processing network is configured to obtain the robust representation and the non-robust representation, to output a first processing result corresponding to the first image.
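A hypothetical PyTorch module mirroring the three-part structure of claim 12; all names are assumed, and the feature processing network is left abstract because claims 13 to 17 allow several behaviors.

```python
import torch.nn as nn

class ImageProcessingNetwork(nn.Module):
    def __init__(self, extractor_r, extractor_nr, feature_head):
        super().__init__()
        self.extractor_r = extractor_r    # first feature extraction network
        self.extractor_nr = extractor_nr  # second feature extraction network
        self.feature_head = feature_head  # feature processing network

    def forward(self, first_image):
        robust = self.extractor_r(first_image)       # robust representation
        non_robust = self.extractor_nr(first_image)  # non-robust representation
        # First processing result corresponding to the first image.
        return self.feature_head(robust, non_robust)
```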
  13. The network according to claim 12, characterized in that the feature processing network is specifically configured to:
    combine the robust representation and the non-robust representation, and output, according to the combined representation, a first processing result corresponding to the first image; or
    output, according to the robust representation, a first processing result corresponding to the first image; or
    output, according to the non-robust representation, a first processing result corresponding to the first image.
  14. The network according to claim 12 or 13, characterized in that the feature processing network is specifically configured to:
    perform a classification operation according to the combined representation, and output a classification category corresponding to the first image; or
    perform a classification operation according to the robust representation, and output a classification category corresponding to the first image; or
    perform a classification operation according to the non-robust representation, and output a classification category corresponding to the first image.
  15. The network according to claim 12, characterized in that the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a perturbed image.
  16. The network according to claim 15, characterized in that the feature processing network is specifically configured to:
    determine, according to the robust representation, a first classification category corresponding to the first image;
    determine, according to the non-robust representation, a second classification category corresponding to the first image;
    in a case where the first classification category is consistent with the second classification category, output a first processing result indicating that the first image is an original image; and
    in a case where the first classification category is inconsistent with the second classification category, output a first processing result indicating that the first image is a perturbed image.
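The detection behavior of claim 16 compares the categories predicted from the two representations. One way this could look, as a sketch; the batched boolean output is an assumption.

```python
def detect_perturbed(classifier, robust, non_robust):
    first_category = classifier(robust).argmax(dim=1)       # from the robust branch
    second_category = classifier(non_robust).argmax(dim=1)  # from the non-robust branch
    # Agreement indicates an original image; disagreement indicates a perturbed image.
    return first_category != second_category
```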
  17. The network according to claim 12, characterized in that the feature processing network is specifically configured to:
    combine the robust representation and the non-robust representation, and perform a detection operation according to the combined representation, to output a detection result corresponding to the first image, the first processing result comprising the detection result.
  18. The network according to claim 12 or 13, characterized in that:
    the image processing network is one or more of the following: an image classification network, an image recognition network, an image segmentation network, or an image detection network.
  19. The network according to any one of claims 12 to 13, characterized in that the feature processing network comprises a perceptron.
  20. The network according to any one of claims 12 to 13, characterized in that the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
  21. A neural network training apparatus, characterized in that the apparatus comprises:
    an input module, configured to input an adversarial image into a first feature extraction network and a second feature extraction network respectively, to obtain a first robust representation generated by the first feature extraction network and a first non-robust representation generated by the second feature extraction network, wherein the adversarial image is an image obtained by performing perturbation processing on an original image, a robust representation refers to a feature that is insensitive to perturbation, and a non-robust representation refers to a feature that is sensitive to perturbation;
    the input module is further configured to input the first robust representation into a classification network to obtain a first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain a second classification category output by the classification network;
    a training module, configured to perform iterative training on the first feature extraction network and the second feature extraction network according to a first loss function until a convergence condition is met, and output the trained first feature extraction network and the trained second feature extraction network;
    wherein the first loss function is used to represent a similarity between the first classification category and a first annotation category and a similarity between the second classification category and a second annotation category, the first annotation category being the correct category corresponding to the adversarial image, and the second annotation category being an incorrect category corresponding to the adversarial image.
  22. The apparatus according to claim 21, characterized in that:
    the input module is further configured to input the original image into the first feature extraction network and the second feature extraction network respectively, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network;
    the apparatus further comprises a combination module, configured to combine the second robust representation and the second non-robust representation to obtain a combined first representation;
    the input module is further configured to input the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation, to obtain a third classification category output by the classification network;
    the training module is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a second loss function until the convergence condition is met, wherein the second loss function is used to represent a similarity between the third classification category and a third annotation category, the third annotation category being the correct category corresponding to the original image.
  23. The apparatus according to claim 21 or 22, characterized in that:
    the input module is further configured to input the original image into the first feature extraction network to obtain a second robust representation generated by the first feature extraction network;
    the input module is further configured to input the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation, to obtain a fourth classification category output by the classification network;
    the training module is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a third loss function until the convergence condition is met, wherein the third loss function is used to represent a similarity between the fourth classification category and a third annotation category, the third annotation category being the correct category corresponding to the original image.
  24. The apparatus according to claim 21 or 22, characterized in that:
    the input module is further configured to input the original image into the second feature extraction network to obtain a second non-robust representation generated by the second feature extraction network;
    the input module is further configured to input the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation, to obtain a fifth classification category output by the classification network;
    the training module is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a fourth loss function until the convergence condition is met, wherein the fourth loss function is used to represent a similarity between the fifth classification category and a third annotation category, the third annotation category being the correct category corresponding to the original image.
  25. The apparatus according to claim 21, characterized in that:
    the input module is further configured to input the original image into the first feature extraction network and the second feature extraction network respectively, to obtain a second robust representation generated by the first feature extraction network and a second non-robust representation generated by the second feature extraction network;
    the input module is further configured to combine the second robust representation and the second non-robust representation to obtain a combined first representation, and input the combined first representation into the classification network, so that the classification network performs a classification operation according to the combined first representation, to obtain a third classification category output by the classification network;
    the input module is further configured to input the second robust representation into the classification network, so that the classification network performs a classification operation according to the second robust representation, to obtain a fourth classification category output by the classification network;
    the input module is further configured to input the second non-robust representation into the classification network, so that the classification network performs a classification operation according to the second non-robust representation, to obtain a fifth classification category output by the classification network;
    the training module is specifically configured to perform iterative training on the first feature extraction network and the second feature extraction network according to the first loss function and a fifth loss function until the convergence condition is met, wherein the fifth loss function is used to represent a similarity between the third classification category and a third annotation category, a similarity between the fourth classification category and the third annotation category, and a similarity between the fifth classification category and the third annotation category, the third annotation category being the correct category corresponding to the original image.
  26. The apparatus according to claim 22, characterized in that the apparatus further comprises a generation module, specifically configured to:
    generate a first gradient according to a function value of the second loss function; and
    perform perturbation processing on the original image according to the first gradient to generate the adversarial image, and determine the third annotation category as the first annotation category.
  27. The apparatus according to claim 23, characterized in that the apparatus further comprises a generation module, specifically configured to:
    generate a second gradient according to a function value of the third loss function; and
    perform perturbation processing on the original image according to the second gradient to generate the adversarial image, and determine the third annotation category as the first annotation category.
  28. The apparatus according to claim 24, characterized in that the apparatus further comprises a generation module, specifically configured to:
    generate a third gradient according to a function value of the fourth loss function; and
    perform perturbation processing on the original image according to the third gradient to generate the adversarial image, and determine the third annotation category as the first annotation category.
  29. The apparatus according to any one of claims 21 to 28, characterized in that:
    the apparatus further comprises a combination module, configured to combine the first robust representation and the first non-robust representation to obtain a combined second representation;
    the input module is further configured to input the combined second representation into the classification network to obtain a sixth classification category output by the classification network;
    the input module is specifically configured to: in a case where the sixth classification category is different from the first annotation category, input the first robust representation into the classification network to obtain the first classification category output by the classification network, and input the first non-robust representation into the classification network to obtain the second classification category output by the classification network.
  30. The apparatus according to claim 29, characterized in that the apparatus further comprises a determination module, configured to: in a case where the sixth classification category is different from the first annotation category, determine the sixth classification category as the second annotation category.
  31. The apparatus according to any one of claims 21 to 28, characterized in that the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
  32. An image processing method, characterized in that the method comprises:
    inputting a first image into a first feature extraction network to obtain a robust representation corresponding to the first image generated by the first feature extraction network, the robust representation referring to a feature that is insensitive to perturbation;
    inputting the first image into a second feature extraction network to obtain a non-robust representation corresponding to the first image generated by the second feature extraction network, the non-robust representation referring to a feature that is sensitive to perturbation;
    outputting, by a feature processing network according to the robust representation and the non-robust representation, a first processing result corresponding to the first image.
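Claim 32 is the inference-time counterpart of the training method. A hypothetical end-to-end instantiation, assuming two ResNet-18 backbones as the feature extraction networks (claim 39 allows convolutional or residual networks), a linear head as the feature processing network, and addition as the combination; none of these architecture choices is recited in the claim.

```python
import torch
from torchvision.models import resnet18

extractor_r = resnet18(num_classes=512)   # first feature extraction network (assumed)
extractor_nr = resnet18(num_classes=512)  # second feature extraction network (assumed)
feature_head = torch.nn.Linear(512, 10)   # feature processing network (assumed)

first_image = torch.rand(1, 3, 224, 224)  # stand-in first image
with torch.no_grad():
    robust = extractor_r(first_image)       # robust representation
    non_robust = extractor_nr(first_image)  # non-robust representation
    first_result = feature_head(robust + non_robust)  # first processing result
```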
  33. The method according to claim 32, characterized in that the outputting, by the feature processing network according to the robust representation and the non-robust representation, the first processing result corresponding to the first image comprises:
    combining the robust representation and the non-robust representation, and outputting, by the feature processing network according to the combined representation, a first processing result corresponding to the first image; or
    outputting, by the feature processing network according to the robust representation, a first processing result corresponding to the first image; or
    outputting, by the feature processing network according to the non-robust representation, a first processing result corresponding to the first image.
  34. The method according to claim 32 or 33, characterized in that:
    a classification operation is performed by the feature processing network according to the combined representation, and a classification category corresponding to the first image is output; or
    a classification operation is performed by the feature processing network according to the robust representation, and a classification category corresponding to the first image is output; or
    a classification operation is performed by the feature processing network according to the non-robust representation, and a classification category corresponding to the first image is output.
  35. The method according to claim 32, characterized in that the first processing result indicates that the first image is an original image, or the first processing result indicates that the first image is a perturbed image.
  36. The method according to claim 35, characterized in that the outputting, by the feature processing network according to the robust representation and the non-robust representation, the first processing result corresponding to the first image comprises:
    determining, by the feature processing network, a first classification category corresponding to the first image according to the robust representation, and a second classification category corresponding to the first image according to the non-robust representation;
    in a case where the first classification category is consistent with the second classification category, the first processing result output by the feature processing network indicates that the first image is an original image;
    in a case where the first classification category is inconsistent with the second classification category, the first processing result output by the feature processing network indicates that the first image is a perturbed image.
  37. The method according to claim 32, characterized in that the outputting, by the feature processing network according to the robust representation and the non-robust representation, the first processing result corresponding to the first image comprises:
    combining, by the feature processing network, the robust representation and the non-robust representation, and performing a detection operation according to the combined representation, to output a detection result corresponding to the first image, the first processing result comprising the detection result.
  38. The method according to claim 32 or 33, characterized in that the feature processing network comprises a perceptron.
  39. The method according to claim 32 or 33, characterized in that the first feature extraction network is a convolutional neural network or a residual neural network, and the second feature extraction network is a convolutional neural network or a residual neural network.
  40. A training device, characterized by comprising a processor, the processor being coupled to a memory, the memory storing program instructions, wherein the method according to any one of claims 1 to 11 is implemented when the program instructions stored in the memory are executed by the processor.
  41. An execution device, characterized in that an image processing network is configured in the execution device, the image processing network being the image processing network according to any one of claims 12 to 20.
  42. The execution device according to claim 41, characterized in that the execution device is one or more of the following: a mobile phone, a computer, a wearable device, an autonomous vehicle, a smart home appliance, or a chip.
  43. A computer-readable storage medium, characterized by comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 11, or causes the computer to perform the method according to any one of claims 32 to 39.
  44. A circuit system, characterized in that the circuit system comprises a processing circuit, the processing circuit being configured to perform the method according to any one of claims 1 to 11, or the processing circuit being configured to perform the method according to any one of claims 32 to 39.
PCT/CN2021/081238 2020-04-30 2021-03-17 Neural network for image processing and related device WO2021218471A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010362629.6 2020-04-30
CN202010362629.6A CN111695596A (en) 2020-04-30 2020-04-30 Neural network for image processing and related equipment

Publications (1)

Publication Number Publication Date
WO2021218471A1 (en)

Family

ID=72476927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081238 WO2021218471A1 (en) 2020-04-30 2021-03-17 Neural network for image processing and related device

Country Status (2)

Country Link
CN (1) CN111695596A (en)
WO (1) WO2021218471A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695596A (en) * 2020-04-30 2020-09-22 华为技术有限公司 Neural network for image processing and related equipment
US20220101116A1 (en) * 2020-09-28 2022-03-31 Robert Bosch Gmbh Method and system for probably robust classification with detection of adversarial examples
CN112465017A (en) * 2020-11-26 2021-03-09 平安科技(深圳)有限公司 Classification model training method and device, terminal and storage medium
JP7158515B2 (en) * 2021-02-18 2022-10-21 本田技研工業株式会社 LEARNING DEVICE, LEARNING METHOD AND PROGRAM
CN113569822B (en) * 2021-09-24 2021-12-21 腾讯科技(深圳)有限公司 Image segmentation method and device, computer equipment and storage medium
TWI810993B (en) * 2022-01-06 2023-08-01 鴻海精密工業股份有限公司 Model generating apparatus and method


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113599B2 (en) * 2017-06-22 2021-09-07 Adobe Inc. Image captioning utilizing semantic text modeling and adversarial learning
US11030485B2 (en) * 2018-03-30 2021-06-08 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for feature transformation, correction and regeneration for robust sensing, transmission, computer vision, recognition and classification
CN110097059B (en) * 2019-03-22 2021-04-02 中国科学院自动化研究所 Document image binarization method, system and device based on generation countermeasure network
CN110516745B (en) * 2019-08-28 2022-05-24 北京达佳互联信息技术有限公司 Training method and device of image recognition model and electronic equipment
CN110852363B (en) * 2019-10-31 2022-08-02 大连理工大学 Anti-sample defense method based on deception attacker

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074234A1 (en) * 2018-09-05 2020-03-05 Vanderbilt University Noise-robust neural networks and methods thereof
CN109784148A (en) * 2018-12-06 2019-05-21 北京飞搜科技有限公司 Biopsy method and device
CN110569916A (en) * 2019-09-16 2019-12-13 电子科技大学 Confrontation sample defense system and method for artificial intelligence classification
CN110717522A (en) * 2019-09-18 2020-01-21 平安科技(深圳)有限公司 Countermeasure defense method of image classification network and related device
CN111695596A (en) * 2020-04-30 2020-09-22 华为技术有限公司 Neural network for image processing and related equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116723058A (en) * 2023-08-10 2023-09-08 井芯微电子技术(天津)有限公司 Network attack detection and protection method and device
CN116723058B (en) * 2023-08-10 2023-12-01 井芯微电子技术(天津)有限公司 Network attack detection and protection method and device

Also Published As

Publication number Publication date
CN111695596A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
WO2021218471A1 (en) Neural network for image processing and related device
US20210012198A1 (en) Method for training deep neural network and apparatus
WO2022083536A1 (en) Neural network construction method and apparatus
WO2022017245A1 (en) Text recognition network, neural network training method, and related device
CN111401406B (en) Neural network training method, video frame processing method and related equipment
CN112183577A (en) Training method of semi-supervised learning model, image processing method and equipment
WO2022111617A1 (en) Model training method and apparatus
WO2021238333A1 (en) Text processing network, neural network training method, and related device
WO2022016556A1 (en) Neural network distillation method and apparatus
CN113095475A (en) Neural network training method, image processing method and related equipment
CN111414915B (en) Character recognition method and related equipment
WO2022012668A1 (en) Training set processing method and apparatus
WO2022111387A1 (en) Data processing method and related apparatus
CN113065997B (en) Image processing method, neural network training method and related equipment
CN113435520A (en) Neural network training method, device, equipment and computer readable storage medium
CN111566646A (en) Electronic device for obfuscating and decoding data and method for controlling the same
CN113162787B (en) Method for fault location in a telecommunication network, node classification method and related devices
WO2023179482A1 (en) Image processing method, neural network training method and related device
WO2023185925A1 (en) Data processing method and related apparatus
CN113869496A (en) Acquisition method of neural network, data processing method and related equipment
CN115081616A (en) Data denoising method and related equipment
WO2022100607A1 (en) Method for determining neural network structure and apparatus thereof
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN113449548A (en) Method and apparatus for updating object recognition model
WO2023231753A1 (en) Neural network training method, data processing method, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21797590

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21797590

Country of ref document: EP

Kind code of ref document: A1