CN111814776B - Image processing method, device, server and storage medium - Google Patents


Info

Publication number
CN111814776B
CN111814776B
Authority
CN
China
Prior art keywords
target
feature
image
target object
target image
Prior art date
Legal status
Active
Application number
CN202010949537.8A
Other languages
Chinese (zh)
Other versions
CN111814776A (en)
Inventor
刘彦宏
王洪斌
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202010949537.8A
Publication of CN111814776A
Application granted
Publication of CN111814776B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses an image processing method, device, server and storage medium, wherein the method comprises the following steps: acquiring a target image to be processed; inputting the target image into a target detection model for detection, so as to identify a target detection frame and a target category corresponding to the target object from the target image; inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image; determining a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and calculating a similarity coefficient between the first feature mask set and the second feature mask set; and correcting the target category of the target object in the target image according to the similarity coefficient. By classifying and correcting the target object in an attacked target image on the basis of robust features, the method increases the difficulty of cracking by adversarial attack and effectively improves the efficiency and accuracy of image processing.

Description

Image processing method, device, server and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a server, and a storage medium.
Background
In smart-city construction, intelligent monitoring systems built on surveillance cameras are used in many applications to detect and identify different types of target objects in the monitored images captured in a scene, for example to detect specific people and objects in community security, food supervision, environmental supervision, and traffic monitoring; in security and supervision applications, the robustness requirement on detection is high. At present, detection and identification of target objects in monitored images are realized through deep convolutional neural network technology: a target detection model is obtained by training on a predefined image data set, and the model is then used in the actual scene to predict images acquired online in real time.
However, when a deep neural network model processes an image subjected to an adversarial attack, its accuracy drops sharply. Norm-based adversarial defense methods can provide robustness only when the perturbation of each pixel is smaller than a certain threshold, and offer no effective solution for attacks whose perturbation range exceeds that threshold. Therefore, how to improve robustness in the image processing process is very important.
Disclosure of Invention
The embodiment of the invention provides an image processing method, device, server and storage medium. Classifying and correcting the target object in an attacked target image on the basis of robust features increases the difficulty of cracking by adversarial attack, since an attacker must change not only the prediction class of the model but also each robust feature. Meanwhile, the original deep neural network model can still be used to predict target images that have not been attacked, so the high prediction accuracy provided by non-robust features is retained and the efficiency and accuracy of image processing are effectively improved.
In a first aspect, an embodiment of the present invention provides an image processing method, including:
acquiring a target image to be processed, wherein the target image comprises a target object;
inputting the target image into a target detection model for detection so as to identify a target detection frame and a target category corresponding to the target object from the target image;
inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image;
determining a second feature mask set corresponding to the target category according to a preset corresponding relation between feature masks and categories, and calculating a similarity coefficient between the first feature mask set and the second feature mask set;
and correcting the target class of the target object in the target image according to the similarity coefficient.
Further, before inputting the target image into a target detection model for detection, the method further includes:
acquiring a sample image set, and determining a target object in each sample image in the sample image set;
adding a first class label and a detection frame to the target object in each sample image;
and inputting the sample images added with the first class labels and the detection frames into a deep neural network model for training to obtain the target detection model.
Further, before inputting the target image into the robust feature extraction model, the method further includes:
determining components of the target object in each sample image;
adding a second class label and a feature mask to each component of the target object in each sample image;
and inputting the sample images added with the second class labels and the feature masks into the deep neural network model for training to obtain the robust feature extraction model.
Further, the inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image includes:
inputting the target image into a robust feature extraction model to determine pixel coverage areas of all components of the target object in the target image;
and extracting the first feature mask set corresponding to the pixel coverage area of each component of the target object.
Further, the performing, according to the similarity coefficient, a correction process on the target class of the target object in the target image includes:
detecting whether the similarity coefficient is larger than a preset threshold;
if the detection result is that the similarity coefficient is larger than the preset threshold, determining that the target object in the target image has not been subjected to an adversarial attack, and not correcting the target class of the target object;
and if the detection result is that the similarity coefficient is smaller than or equal to the preset threshold, determining that the target object in the target image has been subjected to an adversarial attack, and correcting the target class of the target object in the target image.
Further, the performing rectification processing on the target class of the target object in the target image includes:
determining a feature mask corresponding to each category according to the corresponding relation between the preset feature masks and the categories;
calculating similarity coefficients of the first feature mask set and the feature masks corresponding to each category;
and determining the category corresponding to the maximum similarity coefficient as the target category of the target object.
Further, the calculating a similarity coefficient between the first set of feature masks and the second set of feature masks comprises:
acquiring an intersection feature mask of the first feature mask set and the second feature mask set;
acquiring a union feature mask of the first feature mask set and the second feature mask set;
and determining a similarity coefficient between the first feature mask set and the second feature mask set according to the absolute value of the ratio of the intersection feature mask to the union feature mask.
In a second aspect, an embodiment of the present invention provides an image processing device, including:
an acquisition unit, configured to acquire a target image to be processed, wherein the target image comprises a target object;
a detection unit, configured to input the target image into a target detection model for detection, so as to identify a target detection frame and a target category corresponding to the target object from the target image;
an extraction unit, configured to input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image;
a determination unit, configured to determine a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and to calculate a similarity coefficient between the first feature mask set and the second feature mask set;
and a correction unit, configured to correct the target class of the target object in the target image according to the similarity coefficient.
In a third aspect, an embodiment of the present invention provides a server, including a processor, an input device, an output device, and a memory, which are connected to each other, wherein the memory is configured to store a computer program supporting the image processing device in executing the above method, and the processor is configured to call the program to execute the method of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement the method of the first aspect.
In the embodiment of the present invention, a server may acquire a target image to be processed, input the target image into a target detection model for detection, input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, determine a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and calculate a similarity coefficient between the first feature mask set and the second feature mask set, so as to correct the target class of the target object in the target image according to the similarity coefficient. Classifying and correcting the target object in an attacked target image on the basis of robust features increases the difficulty of cracking by adversarial attack: an attacker must change not only the prediction class of the model but also each robust feature. Meanwhile, the original deep neural network model can still be used to predict target images that have not been attacked, so the high prediction accuracy provided by non-robust features is retained, and the efficiency and accuracy of image processing are effectively improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are merely some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram of an image processing method provided by an embodiment of the invention;
fig. 2 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present invention;
fig. 3 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The image processing method provided by the embodiment of the invention can be applied to an image processing device, and the image processing device can be arranged in a server.
An image processing method provided by the embodiment of the invention is schematically described below with reference to fig. 1.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention, and as shown in fig. 1, the method may be executed by an image processing apparatus, where the image processing apparatus is disposed in a server. Specifically, the method of the embodiment of the present invention includes the following steps.
S101: and acquiring a target image to be processed, wherein the target image comprises a target object.
In the embodiment of the invention, the image processing device may acquire a target image to be processed, wherein the target image comprises a target object. In some embodiments, the target image includes one or more target objects, and a target object may be any entity, such as a person or a thing.
In some embodiments, the target image may be captured by a monitoring device; the monitoring device may include, but is not limited to, a camera, a sensor, or similar equipment used to monitor a scene. In some embodiments, the image processing device may establish a communication connection with the monitoring device and acquire the target image captured by it.
S102: and inputting the target image into a target detection model for detection so as to identify a target detection frame and a target category corresponding to the target object from the target image.
In this embodiment of the present invention, the image processing device may input the target image into a target detection model for detection, so as to identify a target detection frame and a target category corresponding to the target object from the target image.
In an embodiment, before inputting the target image into a target detection model for detection, the image processing device may obtain a sample image set, determine a target object in each sample image in the sample image set, add a first class label and a detection frame to the target object in each sample image, and input each sample image added with the first class label and the detection frame into a deep neural network model for training to obtain the target detection model. In some embodiments, the first class label is used to indicate a class of each target object in each sample image. In some embodiments, the detection frame may be a closed frame composed of lines, wherein the closed frame composed of lines may have any shape, and in one example, the closed frame composed of lines may be a circular frame, a square frame, a polygonal frame, an irregular frame, or the like, which is not limited herein.
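As a minimal sketch of this training step, torchvision's Faster R-CNN stands in for the deep neural network model; the loader interface, function name, and hyperparameters below are assumptions for illustration, not details specified by the patent.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def train_target_detection_model(sample_loader, num_classes, epochs=10):
    """sample_loader yields (images, targets); each target carries the
    first class labels and detection frames added to a sample image."""
    model = fasterrcnn_resnet50_fpn(num_classes=num_classes)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in sample_loader:
            # images: list of CHW float tensors;
            # targets: [{"boxes": FloatTensor[N, 4], "labels": Int64Tensor[N]}, ...]
            loss_dict = model(images, targets)  # dict of detection losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```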
S103: and inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image.
In this embodiment of the present invention, the image processing device may input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image. In some embodiments, the first set of feature masks includes one or more feature masks.
In one embodiment, before inputting the target image into the robust feature extraction model, the image processing device may determine components of the target object in each sample image, add a second class label and a feature mask to each component of the target object in each sample image, and input each sample image added with the second class label and the feature mask into the deep neural network model for training, so as to obtain the robust feature extraction model. In some embodiments, the feature mask is composed of numbers to indicate robust features of components of the target object.
In one example, assuming that the target object in the sample image is an automobile, the components of the automobile include tires, windows, a frame, wipers, and the like.
In one embodiment, before inputting the target image into the robust feature extraction model, the image processing device may extract a portion of a subsample image including the target object from the sample image set, determine a component of the target object in each subsample image, add a second class label and a first feature mask to each component of the target object in each subsample image, and input each subsample image added with the second class label and the first feature mask into the deep neural network model for training, so as to obtain the robust feature extraction model.
In one embodiment, when the target image is input into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, the image processing apparatus may input the target image into the robust feature extraction model to determine a pixel coverage area of each component of the target object in the target image and extract the first feature mask set corresponding to the pixel coverage area of each component of the target object.
In one example, assuming that the target object is an automobile whose components include windows, the pixel coverage area of the windows may be represented by a first feature mask rendered in a color distinct from the other components.
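As a hedged sketch of this extraction step, an instance segmentation network (here torchvision's Mask R-CNN, assumed as the robust feature extraction model) predicts the pixel coverage area of each component, which is then binarized into a feature mask; the score and binarization thresholds are illustrative assumptions.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

@torch.no_grad()
def extract_first_feature_mask_set(model, image, score_threshold=0.5):
    """Return {component_label: binary mask} covering the pixel area of
    each component of the target object in one target image."""
    model.eval()
    output = model([image])[0]  # keys: boxes, labels, scores, masks
    feature_masks = {}
    for label, score, mask in zip(output["labels"], output["scores"], output["masks"]):
        if score < score_threshold:
            continue
        # mask is [1, H, W] with soft values; binarize it into the
        # feature mask for this component's pixel coverage area.
        feature_masks[int(label)] = (mask[0] > 0.5).to(torch.uint8)
    return feature_masks

# Example usage (the component-class count is an assumption):
# model = maskrcnn_resnet50_fpn(num_classes=10)  # trained as described above
# masks = extract_first_feature_mask_set(model, image_tensor)
```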
S104: and determining a second feature mask set corresponding to the target category according to the preset corresponding relation between the feature masks and the categories, and calculating a similarity coefficient between the first feature mask set and the second feature mask set.
In this embodiment of the present invention, the image processing device may determine, according to a preset correspondence between feature masks and categories, a second feature mask set corresponding to the target category, and calculate a similarity coefficient between the first feature mask set and the second feature mask set. In some embodiments, the second set of feature masks includes one or more feature masks.
In one embodiment, when calculating the similarity coefficient between the first feature mask set and the second feature mask set, the image processing apparatus may obtain an intersection feature mask of the first feature mask set and the second feature mask set, obtain a union feature mask of the first feature mask set and the second feature mask set, and determine the similarity coefficient between the first feature mask set and the second feature mask set according to an absolute value of a ratio of the intersection feature mask to the union feature mask.
In some embodiments, the correspondence between the preset feature mask and the category may be represented in the form of a matrix, and the matrix is established according to the preset feature mask and the category.
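One illustrative way to hold such a matrix follows; the category names, feature names, and entries are assumptions, not values from the patent.

```python
import numpy as np

feature_names = ["tire", "window", "frame", "wiper", "door"]
categories = ["car", "bus"]
# correspondence[i, j] == 1 means category i is expected to exhibit
# robust feature j according to the preset table.
correspondence = np.array([
    [1, 1, 1, 1, 0],  # car: tires, windows, frame, wipers
    [1, 1, 1, 0, 1],  # bus: tires, windows, frame, doors
], dtype=np.uint8)
```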
In one example, assuming that the first feature mask set is $f_{ri}$ and the second feature mask set is $f_{ei}$, the similarity coefficient $J(f_{ri}, f_{ei})$ between them may be calculated according to the following equation (1):

$$J(f_{ri}, f_{ei}) = \frac{\left| f_{ri} \cap f_{ei} \right|}{\left| f_{ri} \cup f_{ei} \right|} \tag{1}$$
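A minimal sketch of equation (1) follows: the similarity coefficient is the Jaccard index, the size of the intersection over the size of the union. Representing each feature mask set by the union of its binary masks is an interpretation choice, and the helper name is an assumption.

```python
import numpy as np

def similarity_coefficient(first_masks, second_masks):
    """Jaccard similarity J = |intersection| / |union| between two
    feature mask sets, each given as a list of binary HxW arrays."""
    # Collapse each set into a single pixel coverage mask.
    fr = np.any(np.stack([np.asarray(m, dtype=bool) for m in first_masks]), axis=0)
    fe = np.any(np.stack([np.asarray(m, dtype=bool) for m in second_masks]), axis=0)
    intersection = np.logical_and(fr, fe).sum()
    union = np.logical_or(fr, fe).sum()
    # The patent takes the absolute value of the ratio; for binary
    # masks the ratio is already non-negative.
    return float(abs(intersection / union)) if union else 0.0
```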
S105: and correcting the target class of the target object in the target image according to the similarity coefficient.
In this embodiment of the present invention, the image processing device may perform a correction process on the target class of the target object in the target image according to the similarity coefficient.
In one embodiment, when correcting the target class of the target object in the target image according to the similarity coefficient, the image processing device may detect whether the similarity coefficient is greater than a preset threshold. If the similarity coefficient is greater than the preset threshold, it may determine that the target object in the target image has not been subjected to an adversarial attack and leave the target class of the target object uncorrected; if the similarity coefficient is less than or equal to the preset threshold, it may determine that the target object in the target image has been subjected to an adversarial attack and correct the target class of the target object in the target image.
For example, if the image processing device detects that the similarity coefficient J is greater than the preset threshold t, it may determine that the target object bi in the target image has not been subjected to an adversarial attack, leave the target class uncorrected, and take the first class label of the currently identified target object as its final target category; if it detects that the similarity coefficient J is less than or equal to the preset threshold t, it may determine that the target object in the target image has been subjected to an adversarial attack and that the target class of the target object in the target image needs to be corrected.
In an embodiment, when performing the rectification processing on the target category of the target object in the target image, the image processing apparatus may determine, according to the preset correspondence between the feature masks and the categories, a feature mask corresponding to each category, calculate a similarity coefficient between the first feature mask set and the feature mask corresponding to each category, and determine the category corresponding to the maximum similarity coefficient as the target category of the target object.
In an embodiment, when determining that the category corresponding to the maximum similarity coefficient is the target category of the target object, the image processing device may obtain a first category tag corresponding to the maximum similarity coefficient, and add the first category tag to the target object to determine that the category corresponding to the maximum similarity coefficient is the target category of the target object.
For example, the feature mask corresponding to each category is determined according to the preset correspondence between feature masks and categories, the similarity coefficient between the first feature mask set and the feature mask corresponding to each category is calculated, and the category of the first class label cj corresponding to the maximum similarity coefficient is determined as the target category of the target object.
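A hedged sketch of this correction step follows. It reuses the similarity_coefficient helper from the sketch above and assumes a class_to_masks mapping from each category label to its preset feature mask set; the threshold value is illustrative.

```python
def correct_target_class(first_masks, predicted_class, class_to_masks, threshold=0.5):
    """Return the (possibly corrected) target class of the target object."""
    # S105: compare the extracted masks against the preset masks of the
    # class predicted by the detection model.
    j = similarity_coefficient(first_masks, class_to_masks[predicted_class])
    if j > threshold:
        # Not adversarially attacked: keep the detector's prediction.
        return predicted_class
    # Attacked: re-assign the category whose preset feature masks have
    # the maximum similarity coefficient with the extracted masks.
    return max(class_to_masks,
               key=lambda c: similarity_coefficient(first_masks, class_to_masks[c]))
```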
In the embodiment of the present invention, the image processing apparatus may acquire a target image to be processed, input the target image into a target detection model for detection, input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, determine a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and calculate a similarity coefficient between the first feature mask set and the second feature mask set, so as to correct the target class of the target object in the target image according to the similarity coefficient. Classifying and correcting the target object in an attacked target image on the basis of robust features increases the difficulty of cracking by adversarial attack: an attacker must change not only the prediction class of the model but also each robust feature. Meanwhile, the original deep neural network model can still be used to predict target images that have not been attacked, so the high prediction accuracy provided by non-robust features is retained, and the efficiency and accuracy of image processing are effectively improved.
The embodiment of the invention also provides an image processing device, which is configured to execute the units of the method described in the foregoing embodiments. Specifically, referring to fig. 2, fig. 2 is a schematic block diagram of an image processing apparatus according to an embodiment of the present invention. The image processing apparatus of the present embodiment includes: an acquisition unit 201, a detection unit 202, an extraction unit 203, a determination unit 204, and a correction unit 205.
An obtaining unit 201, configured to obtain a target image to be processed, where the target image includes a target object;
a detection unit 202, configured to input the target image into a target detection model for detection, so as to identify a target detection frame and a target category corresponding to the target object from the target image;
an extracting unit 203, configured to input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image;
a determining unit 204, configured to determine, according to a preset correspondence between feature masks and categories, a second feature mask set corresponding to the target category, and calculate a similarity coefficient between the first feature mask set and the second feature mask set;
a correcting unit 205, configured to perform correction processing on the target class of the target object in the target image according to the similarity coefficient.
Further, before the detection unit 202 inputs the target image into a target detection model for detection, the detection unit is further configured to:
acquiring a sample image set, and determining a target object in each sample image in the sample image set;
adding a first class label and a detection frame to the target object in each sample image;
and inputting the sample images added with the first class labels and the detection frames into a deep neural network model for training to obtain the target detection model.
Further, before the extracting unit 203 inputs the target image into the robust feature extraction model, it is further configured to:
determining components of the target object in each sample image;
adding a second class label and a feature mask to each component of the target object in each sample image;
and inputting the sample images added with the second class labels and the feature masks into the deep neural network model for training to obtain the robust feature extraction model.
Further, when the extracting unit 203 inputs the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, specifically, the extracting unit is configured to:
inputting the target image into a robust feature extraction model to determine pixel coverage areas of all components of the target object in the target image;
and extracting the first feature mask set corresponding to the pixel coverage area of each component of the target object.
Further, when the correcting unit 205 performs the correction processing on the target class of the target object in the target image according to the similarity coefficient, specifically, the correcting unit is configured to:
detecting whether the similarity coefficient is larger than a preset threshold;
if the detection result is that the similarity coefficient is larger than the preset threshold, determining that the target object in the target image has not been subjected to an adversarial attack, and not correcting the target class of the target object;
and if the detection result is that the similarity coefficient is smaller than or equal to the preset threshold, determining that the target object in the target image has been subjected to an adversarial attack, and correcting the target class of the target object in the target image.
Further, when the rectification unit 205 performs the rectification processing on the target class of the target object in the target image, it is specifically configured to:
determining a feature mask corresponding to each category according to the corresponding relation between the preset feature masks and the categories;
calculating similarity coefficients of the first feature mask set and the feature masks corresponding to each category;
and determining the category corresponding to the maximum similarity coefficient as the target category of the target object.
Further, when the determining unit 204 calculates the similarity coefficient between the first feature mask set and the second feature mask set, it is specifically configured to:
acquiring an intersection feature mask of the first feature mask set and the second feature mask set;
acquiring a union feature mask of the first feature mask set and the second feature mask set;
and determining a similarity coefficient between the first feature mask set and the second feature mask set according to the absolute value of the ratio of the intersection feature mask to the union feature mask.
In the embodiment of the present invention, the image processing apparatus may acquire a target image to be processed, input the target image into a target detection model for detection, input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, determine a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and calculate a similarity coefficient between the first feature mask set and the second feature mask set, so as to correct the target class of the target object in the target image according to the similarity coefficient. Classifying and correcting the target object in an attacked target image on the basis of robust features increases the difficulty of cracking by adversarial attack: an attacker must change not only the prediction class of the model but also each robust feature. Meanwhile, the original deep neural network model can still be used to predict target images that have not been attacked, so the high prediction accuracy provided by non-robust features is retained, and the efficiency and accuracy of image processing are effectively improved.
Referring to fig. 3, fig. 3 is a schematic block diagram of a server according to an embodiment of the present invention. The server in this embodiment may include: one or more processors 301, one or more input devices 302, one or more output devices 303, and a memory 304. The processor 301, the input device 302, the output device 303, and the memory 304 are connected by a bus 305. The memory 304 is configured to store a computer program, and the processor 301 is configured to execute the program stored in the memory 304. Specifically, the processor 301 is configured to call the program to perform:
acquiring a target image to be processed, wherein the target image comprises a target object;
inputting the target image into a target detection model for detection so as to identify a target detection frame and a target category corresponding to the target object from the target image;
inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image;
determining a second feature mask set corresponding to the target category according to a preset corresponding relation between feature masks and categories, and calculating a similarity coefficient between the first feature mask set and the second feature mask set;
and correcting the target class of the target object in the target image according to the similarity coefficient.
Further, before the processor 301 inputs the target image into a target detection model for detection, the processor is further configured to:
acquiring a sample image set, and determining a target object in each sample image in the sample image set;
adding a first class label and a detection frame to the target object in each sample image;
and inputting the sample images added with the first class labels and the detection frames into a deep neural network model for training to obtain the target detection model.
Further, before the processor 301 inputs the target image into the robust feature extraction model, it is further configured to:
determining components of the target object in each sample image;
adding a second class label and a feature mask to each component of the target object in each sample image;
and inputting the sample images added with the second class labels and the feature masks into the deep neural network model for training to obtain the robust feature extraction model.
Further, when the processor 301 inputs the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, specifically, the processor is configured to:
inputting the target image into a robust feature extraction model to determine pixel coverage areas of all components of the target object in the target image;
and extracting the first feature mask set corresponding to the pixel coverage area of each component of the target object.
Further, when the processor 301 performs the rectification processing on the target class of the target object in the target image according to the similarity coefficient, specifically, the processor is configured to:
detecting whether the similarity coefficient is larger than a preset threshold;
if the detection result is that the similarity coefficient is larger than the preset threshold, determining that the target object in the target image has not been subjected to an adversarial attack, and not correcting the target class of the target object;
and if the detection result is that the similarity coefficient is smaller than or equal to the preset threshold, determining that the target object in the target image has been subjected to an adversarial attack, and correcting the target class of the target object in the target image.
Further, when the processor 301 performs the rectification processing on the target class of the target object in the target image, specifically, the processor is configured to:
determining a feature mask corresponding to each category according to the corresponding relation between the preset feature masks and the categories;
calculating similarity coefficients of the first feature mask set and the feature masks corresponding to each category;
and determining the category corresponding to the maximum similarity coefficient as the target category of the target object.
Further, when the processor 301 calculates the similarity coefficient between the first feature mask set and the second feature mask set, it is specifically configured to:
acquiring an intersection feature mask of the first feature mask set and the second feature mask set;
acquiring a union feature mask of the first feature mask set and the second feature mask set;
and determining a similarity coefficient between the first feature mask set and the second feature mask set according to the absolute value of the ratio of the intersection feature mask to the union feature mask.
In the embodiment of the present invention, a server may acquire a target image to be processed, input the target image into a target detection model for detection, input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, determine a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and calculate a similarity coefficient between the first feature mask set and the second feature mask set, so as to correct the target class of the target object in the target image according to the similarity coefficient. Classifying and correcting the target object in an attacked target image on the basis of robust features increases the difficulty of cracking by adversarial attack: an attacker must change not only the prediction class of the model but also each robust feature. Meanwhile, the original deep neural network model can still be used to predict target images that have not been attacked, so the high prediction accuracy provided by non-robust features is retained, and the efficiency and accuracy of image processing are effectively improved.
It should be understood that, in the embodiment of the present invention, the Processor 301 may be a Central Processing Unit (CPU), and the Processor may also be other general processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable gate arrays (FPGAs) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The input device 302 may include a touch pad, a microphone, etc., and the output device 303 may include a display (LCD, etc.), a speaker, etc.
The memory 304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 301. A portion of the memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store device type information.
In a specific implementation, the processor 301, the input device 302, and the output device 303 described in this embodiment of the present invention may execute the implementation described in the method embodiment shown in fig. 1 provided in this embodiment of the present invention, and may also execute the implementation of the image processing device described in fig. 2 in this embodiment of the present invention, which is not described again here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored; when the computer program is executed by a processor, the image processing method described in the embodiment corresponding to fig. 1 is implemented, and the image processing apparatus according to the embodiment corresponding to fig. 2 of the present invention may also be implemented, which is not described herein again.
The computer readable storage medium may be an internal storage unit of the image processing apparatus according to any of the foregoing embodiments, for example, a hard disk or a memory of the image processing apparatus. The computer-readable storage medium may also be an external storage device of the image processing apparatus, such as a plug-in hard disk provided on the image processing apparatus, a Smart Memory Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the image processing apparatus. The computer-readable storage medium is used for storing the computer program and other programs and data required by the image processing apparatus. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a computer-readable storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned computer-readable storage media comprise: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a part of the embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (9)

1. An image processing method, comprising:
acquiring a target image to be processed, wherein the target image comprises a target object;
inputting the target image into a target detection model for detection so as to identify a target detection frame and a target category corresponding to the target object from the target image;
inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, wherein the first feature mask set comprises one or more feature masks, and each feature mask is composed of numbers and is used for indicating robust features of each component of the target object;
the inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image includes:
inputting the target image into a robust feature extraction model to determine pixel coverage areas of all components of the target object in the target image;
extracting the first feature mask set corresponding to the pixel coverage area of each component of the target object;
determining a second feature mask set corresponding to the target category according to a preset corresponding relation between feature masks and categories, and calculating a similarity coefficient between the first feature mask set and the second feature mask set, wherein the second feature mask set comprises one or more feature masks;
and correcting the target class of the target object in the target image according to the similarity coefficient.
2. The method of claim 1, wherein before inputting the target image into a target detection model for detection, the method comprises:
acquiring a sample image set, and determining a target object in each sample image in the sample image set;
adding a first class label and a detection frame to the target object in each sample image;
and inputting the sample images added with the first class labels and the detection frames into a deep neural network model for training to obtain the target detection model.
3. The method of claim 2, wherein before inputting the target image into a robust feature extraction model, further comprising:
determining components of the target object in each sample image;
adding a second class label and a feature mask to each component of the target object in each sample image;
and inputting the sample images added with the second class labels and the feature masks into the deep neural network model for training to obtain the robust feature extraction model.
4. The method according to claim 1, wherein the performing the rectification processing on the target class of the target object in the target image according to the similarity coefficient comprises:
detecting whether the similarity coefficient is larger than a preset threshold;
if the detection result is that the similarity coefficient is larger than the preset threshold, determining that the target object in the target image has not been subjected to an adversarial attack, and not correcting the target class of the target object;
and if the detection result is that the similarity coefficient is smaller than or equal to the preset threshold, determining that the target object in the target image has been subjected to an adversarial attack, and correcting the target class of the target object in the target image.
5. The method according to claim 4, wherein the performing of the rectification processing on the target class of the target object in the target image comprises:
determining a feature mask corresponding to each category according to the corresponding relation between the preset feature masks and the categories;
calculating similarity coefficients of the first feature mask set and the feature masks corresponding to each category;
and determining the category corresponding to the maximum similarity coefficient as the target category of the target object.
6. The method of claim 1, wherein the calculating a similarity coefficient between the first set of feature masks and the second set of feature masks comprises:
acquiring an intersection feature mask of the first feature mask set and the second feature mask set;
acquiring a union feature mask of the first feature mask set and the second feature mask set;
and determining a similarity coefficient between the first feature mask set and the second feature mask set according to the absolute value of the ratio of the intersection feature mask to the union feature mask.
7. An image processing apparatus characterized by comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a target image to be processed, and the target image comprises a target object;
the detection unit is used for inputting the target image into a target detection model for detection so as to identify a target detection frame and a target category corresponding to the target object from the target image;
an extracting unit, configured to input the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, where the first feature mask set includes one or more feature masks, and each feature mask is composed of numbers and is used to indicate a robust feature of each component of the target object;
the extraction unit is configured to, when inputting the target image into a robust feature extraction model to extract a first feature mask set corresponding to each component of the target object in the target image, specifically, input the target image into the robust feature extraction model to determine a pixel coverage area of each component of the target object in the target image; extracting the first feature mask set corresponding to the pixel coverage area of each component of the target object;
a determining unit, configured to determine a second feature mask set corresponding to the target category according to a preset correspondence between feature masks and categories, and calculate a similarity coefficient between the first feature mask set and the second feature mask set, where the second feature mask set includes one or more feature masks;
and the correcting unit is used for correcting the target class of the target object in the target image according to the similarity coefficient.
8. A server, comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program, the computer program comprising a program, the processor being configured to invoke the program to perform the method of any of claims 1-6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1-6.
CN202010949537.8A 2020-09-10 2020-09-10 Image processing method, device, server and storage medium Active CN111814776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010949537.8A CN111814776B (en) 2020-09-10 2020-09-10 Image processing method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010949537.8A CN111814776B (en) 2020-09-10 2020-09-10 Image processing method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN111814776A CN111814776A (en) 2020-10-23
CN111814776B (en) 2020-12-15

Family

ID=72860174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010949537.8A Active CN111814776B (en) 2020-09-10 2020-09-10 Image processing method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN111814776B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465691A (en) * 2020-11-25 2021-03-09 北京旷视科技有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN112364843A (en) * 2021-01-11 2021-02-12 中国科学院自动化研究所 Plug-in aerial image target positioning detection method, system and equipment
CN113033334A (en) * 2021-03-05 2021-06-25 北京字跳网络技术有限公司 Image processing method, apparatus, electronic device, medium, and computer program product

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728188A (en) * 2019-09-11 2020-01-24 北京迈格威科技有限公司 Image processing method, device, system and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015171450A1 (en) * 2014-05-09 2015-11-12 Graphiclead LLC System and method for embedding of a two dimensional code with an image
CN109670429B (en) * 2018-12-10 2021-03-19 广东技术师范大学 Method and system for detecting multiple targets of human faces of surveillance videos based on instance segmentation
CN109815874A (en) * 2019-01-17 2019-05-28 苏州科达科技股份有限公司 A kind of personnel identity recognition methods, device, equipment and readable storage medium storing program for executing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728188A (en) * 2019-09-11 2020-01-24 北京迈格威科技有限公司 Image processing method, device, system and storage medium

Also Published As

Publication number Publication date
CN111814776A (en) 2020-10-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant