CN111652824A

CN111652824A - Image processing method and device and network training method

Info

Publication number: CN111652824A
Application number: CN202010623842.8A
Authority: CN
Inventors: 张发恩; 张建伟
Original assignee: Ainnovation Nanjing Technology Co ltd
Current assignee: Ainnovation Nanjing Technology Co ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-09-11

Abstract

The application provides an image processing method and device and a network training method. The method comprises the following steps: acquiring an original image of an object to be identified; processing the original image by using a preset interference shielding network to filter out an interference background which is irrelevant to the object to be identified in the original image, and obtaining a filtered image; and processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified. Because the interference background irrelevant to the object to be identified is filtered out by the interference shielding network placed ahead of the object identification network, the identification is not disturbed by the background when the object identification network processes the filtered image, and the identification accuracy is greatly improved.

Description

Image processing method and device and network training method

Technical Field

The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing device, and a network training method.

Background

Among the applications of artificial intelligence technology in various fields, the identification of objects in images by artificial intelligence technology is the most common and widespread application. However, in the process of recognition, background elements irrelevant to the object to be recognized often exist in the image, and the background elements interfere with the recognition of the object, so that the accuracy of the recognition result is not high.

Disclosure of Invention

An object of the embodiments of the present application is to provide an image processing method, an image processing apparatus, and a network training method, so as to avoid interference of a background on object recognition and improve recognition accuracy.

In a first aspect, an embodiment of the present application provides an image processing method, where the method includes: acquiring an original image of an object to be identified; processing the original image by using a preset interference shielding network to filter an interference background which is irrelevant to the object to be identified in the original image, and obtaining a filtered image; and processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified.

In the embodiment of the application, before the object to be recognized in the image is recognized, the interference background irrelevant to the object to be recognized is filtered out by an interference shielding network placed ahead of the object recognition network, so that the recognition is not disturbed by the background when the object recognition network processes the filtered image, and the recognition accuracy is greatly improved.

With reference to the first aspect, in a first possible implementation manner, processing the original image by using a preset interference shielding network to filter an interference background, which is irrelevant to the object to be identified, in the original image, so as to obtain a filtered image, where the processing includes: performing convolution processing on the original image for multiple times by using the interference shielding network to obtain a characteristic image for distinguishing the object to be identified and the interference background; and filtering the interference background in the characteristic image to obtain the filtered image.

In the embodiment of the application, which features in the image belong to the object to be recognized and which belong to the interference background are clearly distinguished through convolution processing, so that the interference background is comprehensively filtered to obtain the image only with the object to be recognized left.

With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, performing convolution processing on the original image for multiple times by using the interference shielding network to obtain a feature image for distinguishing the object to be identified and the interference background includes: sequentially performing convolution and activation processing with reduced characteristic scale on the original image by using the interference shielding network to obtain initial convolution characteristics; sequentially performing convolution and activation processing with unchanged characteristic scale on the initial convolution characteristic by using the interference shielding network to obtain a first convolution characteristic; sequentially carrying out feature scale reduction convolution and activation processing on the initial convolution features by using the interference shielding network to obtain reduced convolution features; the interference shielding network is further utilized to carry out up-sampling processing on the reduced convolution characteristics to obtain second convolution characteristics with the characteristic scale being the same as that of the first convolution characteristics; and fusing the first convolution characteristic and the second convolution characteristic by using the interference shielding network to obtain the characteristic image.

In the embodiment of the application, the first convolution feature and the second convolution feature, which have different feature structures, are obtained through forward convolution at different scales, which enlarges the receptive field of the features; fusing the first convolution feature and the second convolution feature therefore allows the interference background to be extracted more accurately. In addition, activation processing is applied after the convolution, which makes the processing nonlinear and further improves the accuracy with which the interference background is extracted.
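The two-branch structure of the second implementation manner can be sketched as follows. This is a minimal single-channel NumPy illustration, not the patented network: the helper names (`conv2d`, `upsample_nearest`, `two_branch_features`), the zero padding, nearest-neighbor up-sampling, and fusion by element-wise addition are all assumptions, since the text does not fix these details. The sketch also assumes the image height and width are divisible by 4 so that the two branches line up.

```python
import numpy as np

def conv2d(x, kernel, stride=1):
    """Single-channel 2D convolution with zero padding of k // 2 (assumed)."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)  # constant zero padding on all sides
    out_h = (x.shape[0] + 2 * pad - k) // stride + 1
    out_w = (x.shape[1] + 2 * pad - k) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def upsample_nearest(x, factor):
    """Nearest-neighbor up-sampling (an assumed choice of up-sampling)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def two_branch_features(image, kernel):
    initial = relu(conv2d(image, kernel, stride=2))    # scale-reducing conv + activation
    first = relu(conv2d(initial, kernel, stride=1))    # scale-preserving branch
    reduced = relu(conv2d(initial, kernel, stride=2))  # further scale-reducing branch
    second = upsample_nearest(reduced, 2)              # back to the scale of `first`
    return first + second                              # fusion by addition (assumption)

image = np.arange(256, dtype=float).reshape(16, 16)
kernel = np.ones((3, 3)) / 9.0
features = two_branch_features(image, kernel)
```

In a trained network each convolution would have its own learned multi-channel kernels; a single shared averaging kernel is used here only to keep the sketch short.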

With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner, the fusing the first convolution feature and the second convolution feature by using the interference shielding network to obtain the feature image includes: fusing the first convolution characteristic and the second convolution characteristic by using the interference shielding network to obtain a fused characteristic; and performing feature scale invariant convolution and classification processing on the fusion features by using the interference shielding network to obtain the feature images.

In the embodiment of the application, performing a further forward convolution after fusion allows the feature data belonging to the interference background to be extracted further, so that the interference background can be extracted more accurately.

With reference to the first possible implementation manner of the first aspect, in a fourth possible implementation manner, performing convolution processing on the original image for multiple times by using the interference shielding network to obtain a feature image for distinguishing the object to be identified and the interference background includes: sequentially performing convolution and activation processing with reduced characteristic scale on the original image by using the interference shielding network to obtain initial convolution characteristics; sequentially performing convolution and activation processing with unchanged characteristic scale on the initial convolution characteristic by using the interference shielding network to obtain a first convolution characteristic; sequentially carrying out feature scale reduction convolution and activation processing on the initial convolution features by using the interference shielding network to obtain reduced convolution features; sequentially carrying out convolution and activation processing with unchanged characteristic scale on the reduced convolution characteristics by utilizing the interference shielding network to obtain second convolution characteristics; sequentially carrying out feature scale reduction convolution and activation processing on the reduced convolution features by utilizing the interference shielding network to obtain re-reduced convolution features; sequentially performing convolution, activation processing and up-sampling processing with unchanged characteristic scale on the convolution features which are reduced again by using the interference shielding network to obtain third convolution features with the same characteristic scale as the second convolution features; fusing the second convolution characteristic and the third convolution characteristic by using the interference shielding network to obtain a fused characteristic; then, the interference shielding network is utilized to carry out up-sampling processing on the fused features, and fused features with the feature scale the same as that of the first convolution features are obtained; and the interference shielding network fuses the fusion feature and the first convolution feature to obtain the feature image.

In the embodiment of the application, the first convolution feature, the second convolution feature and the third convolution feature, which have different feature structures, are obtained through forward convolution at different scales, which enlarges the receptive field of the features; fusing the first, second and third convolution features over multiple fusion steps therefore allows the interference background to be extracted more accurately. In addition, activation processing is applied after the convolution, which makes the processing nonlinear and further improves the accuracy with which the interference background is extracted.

With reference to the first possible implementation manner of the first aspect, in a fifth possible implementation manner, performing convolution processing on the original image for multiple times by using the interference shielding network to obtain a feature image for distinguishing the object to be identified and the interference background includes: sequentially performing convolution and activation processing with reduced characteristic scale on the original image by using the interference shielding network to obtain initial convolution characteristics; and sequentially performing convolution and classification processing with unchanged characteristic scale on the initial convolution characteristics by using the interference shielding network to obtain the characteristic image.

In the embodiment of the application, the feature image can be obtained quickly through sequential convolution processing along a single path.

With reference to the first possible implementation manner of the first aspect, in a sixth possible implementation manner, the filtering the interference background in the feature image to obtain a filtered image includes: utilizing the interference shielding network to perform up-sampling processing on the characteristic image to obtain an amplified image with the same scale as the original image; and adjusting the pixel value of the pixel point belonging to the interference background in the amplified image to a preset value to obtain the filtered image.

In the embodiment of the application, the interference background and the object to be identified can be rapidly distinguished by adjusting the pixel value of the pixel point of the interference background to the preset value, so that the interference background is rapidly filtered.
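One possible reading of this filtering step is sketched below in NumPy. The function name `filter_background`, nearest-neighbor up-sampling, a preset value of 0, and the convention that the feature image holds 1 for object pixels and 0 for background pixels are assumptions for illustration; the text does not specify them. The sketch further assumes the original size is an integer multiple of the feature image size.

```python
import numpy as np

def filter_background(original, feature_mask, preset_value=0):
    """Up-sample a coarse object/background mask and overwrite background pixels."""
    factor_h = original.shape[0] // feature_mask.shape[0]
    factor_w = original.shape[1] // feature_mask.shape[1]
    # Nearest-neighbor up-sampling back to the scale of the original image.
    enlarged = np.repeat(np.repeat(feature_mask, factor_h, axis=0), factor_w, axis=1)
    filtered = original.copy()
    filtered[enlarged == 0] = preset_value  # background pixels get the preset value
    return filtered

original = np.full((4, 4), 7)
feature_mask = np.array([[1, 0],
                         [0, 1]])
filtered = filter_background(original, feature_mask)
```

Here the top-left and bottom-right 2 × 2 blocks keep their original value, while the other two blocks are set to the preset value.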

With reference to the first possible implementation manner of the first aspect, in a seventh possible implementation manner, the filtering the interference background in the feature image to obtain a filtered image includes: utilizing the interference shielding network to perform up-sampling processing on the characteristic image to obtain an amplified image with the same scale as the original image; extracting the image region of the object to be recognized from the amplified image; and adding the extracted region to a preset white board image to obtain the filtered image.

In the embodiment of the application, the image of the object to be identified is extracted and placed into the white board image, so that the filtered image has no interference background, and the interference background is thoroughly filtered.
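The white board variant can be illustrated with a short NumPy sketch. The name `paste_on_whiteboard`, the boolean mask format, and a white value of 255 (8-bit images) are assumptions, not details given in the text.

```python
import numpy as np

def paste_on_whiteboard(original, object_mask, white=255):
    """Copy only the object pixels onto an all-white image of the same shape."""
    board = np.full_like(original, white)       # the preset white board image
    board[object_mask] = original[object_mask]  # paste the extracted object region
    return board

original = np.array([[10, 20],
                     [30, 40]], dtype=np.uint8)
object_mask = np.array([[True, False],
                        [False, True]])
result = paste_on_whiteboard(original, object_mask)
```

Compared with overwriting background pixels in place, this approach starts from a clean image, so no background pixel can survive unless the mask marks it as part of the object.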

In a second aspect, an embodiment of the present application provides a method for training a network, where the method includes:

acquiring a first training image set and a second training image set, wherein an object to be identified and an interference background in each training image in the first training image set are labeled; each training image in the second training image set is marked with information of an object to be identified in the training image; training a preset interference shielding network by using the first training image set to obtain a preliminarily trained interference shielding network, wherein the interference shielding network is used for filtering an interference background in the image except for the object to be identified; and training the preliminarily trained interference shielding network and a preset object recognition network by utilizing a second training image set so as to obtain the trained interference shielding network and the trained object recognition network, wherein the object recognition network is used for recognizing the information of the object to be recognized in the image.

In the embodiment of the application, the interference shielding network is lightly pre-trained with the first training image set, so that the networks converge more easily and remove interference more stably when they are trained with the second training image set.

In a third aspect, an embodiment of the present application provides an apparatus for processing an image, where the apparatus includes: the image acquisition module is used for acquiring an image of an object to be identified; the image processing module is used for processing the image by utilizing a preset interference shielding network so as to filter an interference background which is irrelevant to the object to be identified in the image and obtain a filtered image; and processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified.

In a fourth aspect, the present application provides a computer-readable storage medium having a computer-executable non-volatile program code, where the program code causes the computer to execute the image processing method according to the first aspect and any one of the possible implementation manners of the first aspect, or execute the network training method according to the second aspect.

In a fifth aspect, an embodiment of the present application provides an electronic device, including: the device comprises a communication interface, a memory and a processor connected with the communication interface and the memory; the memory is used for storing programs; the processor is configured to call and run the program to perform the image processing method according to the first aspect and any one of the possible implementation manners of the first aspect, or to perform the network training method according to the second aspect.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

Fig. 1 is a flowchart of a network training method according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of an image processing method according to an embodiment of the present disclosure;

Fig. 3 is a first network structure diagram of an interference shielding network in an image processing method according to an embodiment of the present application;

Fig. 4 is a second network structure diagram of an interference shielding network in an image processing method according to an embodiment of the present application;

Fig. 5 is a third network structure diagram of an interference shielding network in an image processing method according to an embodiment of the present application;

Fig. 6 is a block diagram of an electronic device according to an embodiment of the present disclosure;

Fig. 7 is a block diagram of an image processing apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Referring to fig. 1, an embodiment of the present application provides a network training method, where the network training method may be executed by an electronic device, where the electronic device may be a terminal or a server, and a flow of the network training method may include:

Step S100: acquiring a first training image set and a second training image set, wherein the object to be identified and the interference background in each training image in the first training image set are labeled, and each training image in the second training image set is labeled with information of the object to be identified in that training image.

Step S200: training a preset interference shielding network by using the first training image set to obtain a preliminarily trained interference shielding network, wherein the interference shielding network is used for filtering out the interference background in the image other than the object to be identified.

Step S300: training the preliminarily trained interference shielding network and a preset object recognition network by using the second training image set, so as to obtain the trained interference shielding network and the trained object recognition network, wherein the object recognition network is used for recognizing the information of the object to be recognized in the image.

The above training process will be described in detail with reference to the application scenario.

Since the network of this embodiment includes the interference shielding network and the object recognition network of the main body, the electronic device needs to prepare two training sets in advance for training.

In this embodiment, since the first training image set is used only for preliminary pre-training of the interference shielding network, the number of training images in the first training image set may be relatively small. For example, the first training image set may have 1000 to 2000 training images. In addition, in order to achieve a better pre-training effect, the object to be recognized and the interference background in each training image in the first training image set are labeled. For example, in an application scenario of vehicle recognition, the vehicle in a training image is the object to be recognized, and the elements other than the vehicle in the training image are the interference background. Therefore, the pixel points in the region where the vehicle is located in the training image can be labeled as 1 to label the object to be recognized, and the pixel points in the region outside the vehicle can be labeled as 0 to label the interference background.
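As a small illustration of this 1/0 labeling, the sketch below builds a label mask in NumPy. The function name `make_label_mask` and the use of a rectangular box for the vehicle region are assumptions made to keep the example short; the text only requires that vehicle pixels be labeled 1 and all other pixels 0, and a real annotation would typically follow the vehicle outline.

```python
import numpy as np

def make_label_mask(height, width, box):
    """Label pixels inside `box` (top, left, bottom, right) as 1, the rest as 0."""
    top, left, bottom, right = box
    mask = np.zeros((height, width), dtype=np.uint8)  # 0 = interference background
    mask[top:bottom, left:right] = 1                  # 1 = object to be recognized
    return mask

mask = make_label_mask(4, 6, (1, 2, 3, 5))
```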

In this embodiment, the second training image set is used for comprehensive training of the interference shielding network and the object recognition network, so the number of training images in the second training image set may be larger than that in the first training image set. For example, the second training image set may have 5000 to 10000 training images. In addition, in order to achieve a better training effect, each training image in the second training image set is labeled with information of the object to be recognized in that training image. Of course, the information of the object to be recognized differs according to the application scenario. For example, if the application scenario is to identify the type of a vehicle, the labeled information of the object to be identified is the type of the vehicle, and different vehicle types are represented by different labeled numerical values; if the application scenario is to identify the license plate of a vehicle, the labeled information of the object to be identified is the license plate number of the vehicle; if the application scenario is to recognize the expression of a person, the labeled information of the object to be recognized is the expression type of the person, and different expression types are represented by different labeled numerical values.

After obtaining the first training image set, the electronic device may first pre-train the interference shielding network preset by the electronic device using the first training image set.

It can be understood that the process of the electronic device using the first training image set to train the interference shielding network is: the electronic equipment processes each training image in the first training image set by using the interference shielding network, and then optimizes the interference shielding network according to the difference between the processing result of each training image and the corresponding label. In other words, the pre-training of the interference shielding network is to perform image processing + optimization iteratively. To facilitate understanding of the principle of pre-training, the present embodiment is described by taking an example in which the electronic device processes a certain training image in the first training image set by using the interference shielding network, and then optimizes the interference shielding network by using the processing result of the training image.

Specifically, the electronic device inputs the training image into the interference shielding network. By processing the training image, the interference shielding network determines, for each pixel point in the training image, the probability that the pixel point belongs to the interference background or to the object to be identified. The probability can be represented by a numerical value between 0 and 1: the closer the value is to 1, the higher the probability that the corresponding pixel point belongs to the object to be identified, and the closer the value is to 0, the higher the probability that it belongs to the interference background. The interference shielding network outputs these per-pixel values as its processing result. After obtaining the processing result, the electronic device can determine the difference between the processing result of the training image and the label of the training image, that is, the Loss between the value of each pixel point in the training image and the labeled value of that pixel point. Finally, the electronic device computes the Loss with a loss function such as the cross-entropy function and performs back propagation based on the Loss of each pixel point in the training image to optimize the convolution parameters in the interference shielding network, thereby realizing the training optimization of the interference shielding network.
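The per-pixel Loss described above can be sketched with binary cross entropy, which is one common form of the cross-entropy function for 0/1 labels. The function name `pixel_cross_entropy`, averaging over all pixels, and the clipping constant `eps` are assumptions for illustration; the text names cross entropy only as an example and does not specify the reduction.

```python
import numpy as np

def pixel_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross entropy between per-pixel predictions and 0/1 labels."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(label * np.log(pred) + (1 - label) * np.log(1.0 - pred))
    return float(loss.mean())

label = np.array([[1, 0]])            # 1 = object, 0 = interference background
good = np.array([[0.9, 0.1]])         # predictions close to the labels
bad = np.array([[0.2, 0.7]])          # predictions far from the labels
good_loss = pixel_cross_entropy(good, label)
bad_loss = pixel_cross_entropy(bad, label)
```

The loss shrinks as the predicted values approach the labeled values, which is the quantity back propagation would then minimize.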

As the interference shielding network is optimized through continued iterative training, the value it determines for each pixel point in the training image becomes closer and closer to the labeled value of that pixel point, so that the interference shielding network extracts the interference background with increasing accuracy.

After the preliminary pre-training of the interference shielding network is completed, the preliminarily trained interference shielding network already has a certain accuracy. The electronic device then trains the preliminarily trained interference shielding network and the object recognition network by using the second training image set, so that rapid convergence can be achieved.

It can be understood that the process in which the electronic device uses the second training image set to train the preliminarily trained interference shielding network and the object recognition network is as follows: the electronic device processes each training image in the second training image set by using the preliminarily trained interference shielding network and the object recognition network, and optimizes the two networks according to the difference between the processing result of each training image and the corresponding label until the two networks converge. In other words, the training of the preliminarily trained interference shielding network and the object recognition network is also an iteration of image processing + optimization. Likewise, to facilitate understanding of the principle of this training, the embodiment takes as an example that the electronic device processes one training image in the second training image set by using the preliminarily trained interference shielding network and the object recognition network, and then optimizes the two networks by using the processing result of that training image.

Specifically, the electronic device inputs the training image into the preliminarily trained interference shielding network, and after the processing of the preliminarily trained interference shielding network, the electronic device obtains an image with the interference background filtered. Then, the electronic device inputs the image after the interference background is filtered into an object identification network for processing, so as to obtain an identification result of the object to be identified in the image after the interference background is filtered. And finally, the electronic equipment determines the Loss between the recognition result of the object to be recognized and the pre-marked information of the object to be recognized, and optimizes the convolution parameters of the preliminarily trained interference shielding network and the convolution parameters in the object recognition network by utilizing the Loss back propagation, thereby realizing the training optimization of the two networks.

As the two networks are optimized through continued iterative training, the probability that the information of the object to be recognized, as recognized by the object recognition network, is the same as the labeled information becomes higher and higher. When this probability exceeds a threshold value, for example 98%, the two networks are considered to have converged, and the training process ends.
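The stopping criterion can be sketched as a simple accuracy check. The function name `has_converged` and the interpretation of "probability" as the fraction of correctly recognized images in an evaluation batch are assumptions; the text gives only the example threshold of 98%.

```python
def has_converged(predictions, labels, threshold=0.98):
    """Return True when the fraction of correct recognitions exceeds the threshold."""
    if len(labels) == 0 or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels) > threshold

labels = ["sedan"] * 100
nearly_all_right = ["sedan"] * 99 + ["truck"]       # 99% correct
at_threshold = ["sedan"] * 98 + ["truck"] * 2       # exactly 98% correct
```

Because the text says the probability must exceed the threshold, the comparison is strict: 99% correct stops training, while exactly 98% does not.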

After the training of the network is finished, the network can be put into practical application to process the image.

Referring to fig. 2, based on the two networks trained as described above, an embodiment of the present application provides an image processing method, where the image processing method may also be executed by an electronic device, and a flow of the image processing method may include:

Step S101: acquiring an original image of the object to be identified.

Step S201: processing the original image by using a preset interference shielding network, so as to filter out an interference background which is irrelevant to the object to be identified in the original image and obtain a filtered image.

Step S301: processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified.
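The three steps compose into a simple two-stage pipeline, sketched below. The function `process_image` and the stand-ins `toy_shielding` and `toy_recognizer` are hypothetical placeholders for the trained networks, used only to show the order of the calls; they do not reflect how the actual networks behave.

```python
import numpy as np

def process_image(original, shielding_network, recognition_network):
    """Filter the interference background, then recognize the object."""
    filtered = shielding_network(original)   # step S201
    return recognition_network(filtered)     # step S301

# Hypothetical stand-ins for the two trained networks (assumptions).
def toy_shielding(image):
    # Treat bright pixels as the object and zero out everything else.
    return np.where(image > 128, image, 0)

def toy_recognizer(filtered):
    # Report whether any object pixels survived the filtering.
    return "object" if filtered.any() else "nothing"

original = np.array([[10, 200],
                     [30, 220]])             # step S101: the acquired image
result = process_image(original, toy_shielding, toy_recognizer)
```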

The above-described process flow will be described in detail below.

Step S101: acquiring an original image of the object to be identified.

The electronic device may obtain the original image in different ways according to different application scenarios. For example, in a local area network scenario, the electronic device may obtain an original image in a user input manner; for another example, in an internet scenario, the electronic device may crawl the original image directly from the internet.

In this embodiment, the network structure of the interference shielding network is different according to different actual requirements. For example, if the interference background filtering needs to be performed quickly and efficiently, the interference shielding network may be a single-channel network structure; for example, if more accurate interference background filtering is required, the interference shielding network may be a multi-channel network structure. Therefore, the electronic equipment can perform convolution processing on the original image for multiple times by using a single-channel or multi-channel network structure of the interference shielding network so as to obtain a characteristic image for distinguishing the object to be identified and the interference background.

As one mode, as shown in fig. 3, if the interference shielding network adopts a single-channel network structure, the electronic device may sequentially perform convolution processing on the original image for a plurality of times by using the single-channel network structure of the interference shielding network.

Specifically, the electronic device may sequentially perform convolution with reduced feature scale and activation processing on the original image by using the interference shielding network to obtain an initial convolution feature. For example, the electronic device convolves the original image with a convolution kernel of 3 × 3 in the interference shielding network by a step size of 2, to obtain the feature after the first convolution. And the electronic equipment utilizes the ReLu activation function in the interference shielding network to activate the features after the first convolution, so as to obtain the features after the first activation. Then, the electronic device performs convolution on the feature after the first activation processing by using a convolution kernel of 3 × 3 in the interference shielding network with the step length of 2 to obtain a feature after the second convolution. And finally, the electronic equipment activates the features after the second convolution by using a ReLu activation function in the interference shielding network so as to obtain the features after the second activation, wherein the features after the second activation are the initial convolution features.
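The effect of the two 3 × 3, stride-2 convolutions on the feature scale can be checked with the standard convolution output-size formula. The padding of 1 and the 224 × 224 input size below are assumptions, since the text specifies neither; the function name `conv_output_size` is likewise illustrative.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Standard output-size formula: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

after_first = conv_output_size(224)           # first 3x3 convolution, stride 2
after_second = conv_output_size(after_first)  # second 3x3 convolution, stride 2
```

Under these assumptions each stride-2 convolution halves the side length, so the initial convolution feature is one quarter of the original side length; activation does not change the size.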

After the initial convolution feature is obtained, the electronic device continues to use the interference shielding network to sequentially perform feature-scale-invariant convolution and classification processing on the initial convolution feature, so as to obtain the feature image. For example, the electronic device convolves the initial convolution feature with a 3 × 3 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the third convolution. The electronic device then activates the feature after the third convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the third activation. Next, the electronic device convolves the feature after the third activation with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the fourth convolution. The electronic device then classifies the feature after the fourth convolution by using a Softmax classifier of the interference shielding network, so as to obtain, for each pixel point in the image, a numerical value indicating whether the pixel point belongs to the object to be identified or to the interference background. Finally, by using a preset numerical value such as 0.8, the electronic device classifies the pixel points whose numerical values are greater than the preset numerical value as the object to be identified, and classifies the pixel points whose numerical values are less than or equal to the preset numerical value as the interference background, so as to obtain the feature image that distinguishes the object to be identified from the interference background.
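The classification and thresholding step above can be sketched as follows. This is an illustrative example only: it assumes the feature after the fourth convolution carries two scores per pixel point (object, background), and the function names `softmax` and `classify_pixels` are hypothetical, not part of the embodiment.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pixels(score_map, threshold=0.8):
    """Return 1 for object pixels and 0 for interference-background pixels.

    score_map[y][x] is an assumed (object_score, background_score) pair.
    """
    mask = []
    for row in score_map:
        mask_row = []
        for object_score, background_score in row:
            p_object = softmax([object_score, background_score])[0]
            # Strictly greater than the preset value -> object to be identified.
            mask_row.append(1 if p_object > threshold else 0)
        mask.append(mask_row)
    return mask


scores = [[(3.0, 0.0), (0.5, 0.0)],
          [(0.0, 2.0), (2.0, 0.0)]]
mask = classify_pixels(scores)
```

Here the preset value 0.8 is the example value from the description; pixel points whose object probability does not exceed it are treated as interference background.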

As another mode, as shown in fig. 4, if the interference shielding network adopts a two-channel network structure, the electronic device may perform convolution processing on the original image multiple times in parallel by using the two-channel network structure of the interference shielding network.

Specifically, the electronic device likewise sequentially performs feature-scale-reducing convolution and activation processing on the original image by using the interference shielding network to obtain an initial convolution feature. For example, the electronic device convolves the original image with a 3 × 3 convolution kernel of the interference shielding network at a step size of 2 to obtain the feature after the first convolution. The electronic device then activates the feature after the first convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the first activation. Next, the electronic device convolves the feature after the first activation with a 3 × 3 convolution kernel of the interference shielding network at a step size of 2 to obtain the feature after the second convolution. Finally, the electronic device activates the feature after the second convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the second activation, which is the initial convolution feature.

After the initial convolution feature is obtained, the electronic device may process the initial convolution feature in parallel by using the two channels of the interference shielding network, and then fuse the processed features.

Illustratively, on the one hand, the electronic device may sequentially perform feature-scale-invariant convolution and activation processing on the initial convolution feature by using the interference shielding network to obtain a first convolution feature. For example, the electronic device convolves the initial convolution feature with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the third convolution. The electronic device then activates the feature after the third convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the third activation, which is the first convolution feature.

On the other hand, the electronic device sequentially performs feature-scale-reducing convolution and activation processing on the initial convolution feature by using the interference shielding network to obtain a reduced convolution feature. For example, the electronic device convolves the initial convolution feature with a 3 × 3 convolution kernel of the interference shielding network at a step size of 2 to obtain the feature after the fourth convolution. The electronic device then activates the feature after the fourth convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the fourth activation. Next, the electronic device convolves the feature after the fourth activation with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the fifth convolution. The electronic device then activates the feature after the fifth convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the fifth activation, which is the reduced convolution feature. After the reduced convolution feature is obtained, the electronic device performs up-sampling processing on the reduced convolution feature by using the interference shielding network, for example by an interpolation method, so as to obtain a second convolution feature with the same feature scale as the first convolution feature.
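The up-sampling of the reduced convolution feature can be illustrated with the sketch below. The embodiment only states that an interpolation method is used; nearest-neighbour interpolation and the function name `upsample_nearest` are assumptions made here for simplicity.

```python
def upsample_nearest(feature, factor):
    """Enlarge a 2-D feature map by an integer factor.

    Nearest-neighbour interpolation is assumed for illustration; the
    embodiment does not fix a particular interpolation method.
    """
    out = []
    for row in feature:
        # Repeat each value horizontally, then repeat the row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# A 2 x 2 reduced convolution feature enlarged to 4 x 4 so that it matches
# the feature scale of the first convolution feature.
reduced = [[1, 2],
           [3, 4]]
second = upsample_nearest(reduced, 2)
```

Other interpolation methods, such as bilinear interpolation, would change the enlarged values but not the resulting feature scale.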

In this embodiment, after obtaining the first convolution feature and the second convolution feature, the electronic device may fuse the first convolution feature and the second convolution feature by using the interference shielding network, so as to obtain the feature image.

As an exemplary manner of obtaining the feature image through fusion, if the feature image needs to be generated efficiently, the electronic device may fuse the first convolution feature and the second convolution feature by using a Sum function of the interference shielding network, and then directly classify the fused feature by using a classifier of the interference shielding network, such as a Softmax classifier, so as to obtain the feature image.
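A minimal sketch of the Sum fusion is given below. It assumes that the Sum function denotes element-wise addition of two features with the same feature scale, which the description does not state explicitly; the function name `fuse_sum` is hypothetical.

```python
def fuse_sum(a, b):
    """Fuse two equally sized 2-D features by element-wise addition (assumed)."""
    same_rows = len(a) == len(b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("features must share the same scale before fusion")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


first_feature = [[1, 2],
                 [3, 4]]
second_feature = [[10, 20],
                  [30, 40]]
fused = fuse_sum(first_feature, second_feature)
```

The scale check reflects why the second convolution feature is up-sampled before fusion: addition is only defined when both features have the same feature scale.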

As another exemplary way to obtain the feature image through fusion, if the interference background needs to be filtered more accurately, after the electronic device fuses the first convolution feature and the second convolution feature by using the interference shielding network, the electronic device may further perform feature-scale-invariant convolution processing on the fused feature by using the interference shielding network. For example, the electronic device convolves the fused feature with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the sixth convolution. Then, the electronic device classifies the feature after the sixth convolution by using a classifier of the interference shielding network, such as a Softmax classifier, so as to obtain the feature image.

As yet another mode, as shown in fig. 5, if the interference shielding network adopts a three-channel network structure, the electronic device may likewise perform convolution processing on the original image multiple times in parallel by using the three-channel network structure of the interference shielding network.

After the initial convolution feature is obtained, the electronic device may process the initial convolution feature in parallel by using the three channels of the interference shielding network, and then fuse the processed features.

Illustratively, in a first aspect, the electronic device may sequentially perform feature-scale-invariant convolution and activation processing on the initial convolution feature by using the interference shielding network to obtain a first convolution feature. For example, the electronic device convolves the initial convolution feature with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the third convolution. The electronic device then activates the feature after the third convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the third activation, which is the first convolution feature.

In a second aspect, the electronic device sequentially performs feature-scale-reducing convolution and activation processing on the initial convolution feature by using the interference shielding network to obtain a reduced convolution feature. For example, the electronic device convolves the initial convolution feature with a 3 × 3 convolution kernel of the interference shielding network at a step size of 2 to obtain the feature after the fourth convolution. The electronic device then activates the feature after the fourth convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the fourth activation, which is the reduced convolution feature. Then, the electronic device continues to perform feature-scale-invariant convolution and activation processing on the reduced convolution feature by using the interference shielding network, so as to obtain a second convolution feature. For example, the electronic device convolves the reduced convolution feature with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the fifth convolution. The electronic device then activates the feature after the fifth convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the fifth activation, which is the second convolution feature.

In a third aspect, after obtaining the reduced convolution feature, the electronic device further sequentially performs feature-scale-reducing convolution and activation processing on the reduced convolution feature by using the interference shielding network, so as to obtain a re-reduced convolution feature. For example, the electronic device convolves the reduced convolution feature with a 3 × 3 convolution kernel of the interference shielding network at a step size of 2 to obtain the feature after the sixth convolution. The electronic device then activates the feature after the sixth convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the sixth activation, which is the re-reduced convolution feature. Then, the electronic device sequentially performs feature-scale-invariant convolution, activation processing and up-sampling processing on the re-reduced convolution feature by using the interference shielding network, so as to obtain a third convolution feature with the same feature scale as the second convolution feature. For example, the electronic device convolves the re-reduced convolution feature with a 1 × 1 convolution kernel of the interference shielding network at a step size of 1 to obtain the feature after the seventh convolution. The electronic device then activates the feature after the seventh convolution by using the ReLU activation function of the interference shielding network to obtain the feature after the seventh activation. Finally, the electronic device performs interpolation up-sampling processing on the feature after the seventh activation by using the interference shielding network to obtain the third convolution feature.
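To make the feature scales of the three channels concrete, the sketch below tracks the side length of each feature with the standard convolution output-size formula. The 56-pixel initial side, the padding of 1 for 3 × 3 kernels, the padding of 0 for 1 × 1 kernels, and the up-sampling factor of 2 are all assumptions for illustration and are not specified in this embodiment.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


initial = 56  # assumed side length of the initial convolution feature

# First channel: 1 x 1 convolution, step size 1 (scale unchanged).
first = conv_out_size(initial, kernel=1, stride=1, padding=0)

# Second channel: 3 x 3 step-size-2 convolution, then 1 x 1 step-size-1.
reduced = conv_out_size(initial, kernel=3, stride=2, padding=1)
second = conv_out_size(reduced, kernel=1, stride=1, padding=0)

# Third channel: a further 3 x 3 step-size-2 convolution, a 1 x 1
# step-size-1 convolution, then up-sampling by an assumed factor of 2.
re_reduced = conv_out_size(reduced, kernel=3, stride=2, padding=1)
third = conv_out_size(re_reduced, kernel=1, stride=1, padding=0) * 2
```

Under these assumptions the third convolution feature ends at the same feature scale as the second convolution feature, and both are half the side length of the first convolution feature.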

In this embodiment, after obtaining the first convolution feature, the second convolution feature and the third convolution feature, the electronic device first fuses the second convolution feature and the third convolution feature by using the Sum function of the interference shielding network, so as to obtain a fused feature. Then, the electronic device performs interpolation up-sampling processing on the fused feature by using the interference shielding network, so as to obtain a fusion feature with the same feature scale as the first convolution feature. Finally, the electronic device fuses the fusion feature and the first convolution feature by using the Sum function of the interference shielding network, so as to obtain the feature image.
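The fusion order described above can be sketched as follows. The sketch assumes that the Sum function is element-wise addition and that the interpolation is nearest-neighbour; the helper names `upsample_nearest`, `fuse_sum` and `fuse_three_channels` are hypothetical.

```python
def upsample_nearest(feature, factor):
    """Enlarge a 2-D feature by an integer factor (nearest neighbour, assumed)."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse_sum(a, b):
    """Element-wise addition of two equally sized 2-D features (assumed Sum)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_three_channels(first, second, third):
    """Fuse second and third, enlarge to the first's scale, then fuse with first."""
    fused = fuse_sum(second, third)
    factor = len(first) // len(fused)
    fusion_feature = upsample_nearest(fused, factor)
    return fuse_sum(fusion_feature, first)


first = [[1] * 4 for _ in range(4)]   # 4 x 4 first convolution feature
second = [[10, 20], [30, 40]]         # 2 x 2 second convolution feature
third = [[100, 100], [100, 100]]      # 2 x 2 third convolution feature
feature = fuse_three_channels(first, second, third)
```

Fusing the two smaller features before up-sampling means that only one enlargement is needed to reach the feature scale of the first convolution feature.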

In this embodiment, as an exemplary manner of obtaining the feature image through fusion, if the feature image needs to be generated efficiently, after the electronic device fuses the first convolution feature and the fusion feature by using the Sum function of the interference shielding network, the electronic device directly classifies the finally fused feature by using a classifier of the interference shielding network, such as a Softmax classifier, so as to obtain the feature image.

As another exemplary way to obtain the feature image through fusion, if the interference background needs to be filtered more accurately, after the electronic device fuses the first convolution feature and the fusion feature by using the interference shielding network, the electronic device also performs convolution processing with unchanged feature scale on the finally fused feature by using the interference shielding network to obtain the feature after the eighth convolution. Then, the electronic device performs classification processing on the features after the eighth convolution by using a classifier of the interference shielding network, such as a Softmax classifier, so as to obtain a feature image.

In this embodiment, the feature image only distinguishes the interference background from the object to be identified, so the interference background is still contained in the feature image. Therefore, after obtaining the feature image, the electronic device needs to filter out the interference background in the feature image, so as to obtain the filtered image.

As an exemplary way to filter out the interfering background, the electronic device filters out the interfering background directly in the feature image.

Specifically, because convolution processing is performed multiple times, the scale of the finally obtained feature image is smaller than that of the original image, for example 1/4 of the original image. Therefore, to facilitate processing by the subsequent object identification network, when filtering out the interference background the electronic device first performs interpolation up-sampling processing on the feature image by using the interference shielding network, so as to obtain an enlarged image with the same scale as the original image. After the enlarged image is obtained, the electronic device adjusts the pixel values of the pixel points belonging to the interference background in the enlarged image to a preset value, such as 0, so as to filter out the interference background and obtain the filtered image.
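One possible reading of this direct filtering step is sketched below. It assumes that the feature image acts as a binary mask (1 for the object to be identified, 0 for the interference background), that the mask is enlarged by nearest-neighbour interpolation, and that the mask is then applied to the pixel values of the original image; the function names are hypothetical.

```python
def upsample_nearest(mask, factor):
    """Enlarge a 2-D mask by an integer factor (nearest neighbour, assumed)."""
    out = []
    for row in mask:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def filter_background(image, feature_mask, preset_value=0):
    """Set pixels marked as interference background to the preset value."""
    factor = len(image) // len(feature_mask)
    enlarged = upsample_nearest(feature_mask, factor)
    return [
        [pixel if keep else preset_value for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, enlarged)
    ]


# 4 x 4 single-channel image with pixel values 1..16 and a 2 x 2 mask.
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
feature_mask = [[1, 0],
                [0, 1]]
filtered = filter_background(image, feature_mask)
```

The preset value 0 is the example value given in the description; any other preset value could be passed as `preset_value`.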

As another exemplary way to filter out the interference background, the electronic device may filter out the interference background by way of matting.

Specifically, because convolution processing is performed multiple times, the scale of the finally obtained feature image is smaller than that of the original image, for example 1/4 of the original image. Therefore, to facilitate processing by the subsequent object identification network, before matting the electronic device performs interpolation up-sampling processing on the feature image by using the interference shielding network to obtain an enlarged image with the same scale as the original image. After obtaining the enlarged image, the electronic device extracts the graphic of the object to be identified from the enlarged image. Finally, the electronic device adds the graphic of the object to be identified to a preset whiteboard image with the same scale as the original image, so that the interference background is filtered out and the filtered image is obtained.
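The matting variant can be sketched in the same spirit. The sketch assumes a single-channel image, a full-scale binary object mask already obtained by up-sampling, and a whiteboard value of 255; `matte_onto_whiteboard` is a hypothetical name.

```python
WHITE = 255  # assumed pixel value of the preset whiteboard image


def matte_onto_whiteboard(image, object_mask, white=WHITE):
    """Copy object pixels onto a blank whiteboard of the same scale."""
    board = [[white] * len(row) for row in image]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if object_mask[y][x]:
                board[y][x] = pixel
    return board


image = [[10, 20],
         [30, 40]]
object_mask = [[1, 0],
               [0, 1]]
filtered = matte_onto_whiteboard(image, object_mask)
```

Compared with setting background pixels to a preset value in place, this variant leaves the original image untouched and builds the filtered image on a separate whiteboard.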

In this embodiment, the object identification network may be a conventional neural network such as a CNN, an R-CNN or a U-Net. After the filtered image is obtained, the object identification network identifies the information of the object to be identified in the filtered image and outputs that information, so that the electronic device obtains the identification result.

Referring to fig. 6, based on the same inventive concept, the present embodiment provides an electronic device 10, and the electronic device 10 may include a communication interface 11 connected to a network, one or more processors 12 for executing program instructions, a bus 13, and a memory 14 in different forms, such as a disk, a ROM, or a RAM, or any combination thereof. Illustratively, the computer platform may also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof.

The memory 14 is used for storing programs, and the processor 12 is used for calling and running the programs in the memory 14 to execute the aforementioned training method of the network or the processing method of the image.

Referring to fig. 7, based on the same inventive concept, an embodiment of the present application further provides an apparatus 100 for processing an image, where the apparatus 100 for processing an image is applied to an electronic device, and the apparatus 100 for processing an image includes:

the image acquisition module 110 is configured to acquire an original image of an object to be identified;

the image processing module 120 is configured to process the original image by using a preset interference shielding network, so as to filter an interference background in the original image, where the interference background is irrelevant to the object to be identified, and obtain a filtered image; and processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified.

It should be noted that, as those skilled in the art can clearly understand, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Some embodiments of the present application further provide a computer-readable storage medium storing computer-executable non-volatile program code. The storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk, and when the program code stored on it is executed by a computer, the steps of the network training method or the image processing method according to any of the above embodiments are performed.

The program code product of the network training method or the image processing method provided in the embodiment of the present application includes a computer-readable storage medium storing program codes, and instructions included in the program codes may be used to execute the methods in the foregoing method embodiments, and specific implementations may refer to the method embodiments and are not described herein again.

In summary, before the object to be identified in the image is identified, the interference background irrelevant to the object to be identified is filtered out of the image by an upstream interference shielding network, so that when the filtered image is processed by the object identification network, the identification is not interfered with by the background and the identification accuracy is improved.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method of processing an image, the method comprising:

acquiring an original image of an object to be identified;

processing the original image by using a preset interference shielding network to filter an interference background which is irrelevant to the object to be identified in the original image, and obtaining a filtered image;

and processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified.

2. The method for processing the image according to claim 1, wherein the processing of the original image by using a preset interference shielding network to filter an interference background in the original image, which is not related to the object to be recognized, to obtain a filtered image comprises:

performing convolution processing on the original image for multiple times by using the interference shielding network to obtain a characteristic image for distinguishing the object to be identified and the interference background;

and filtering the interference background in the characteristic image to obtain the filtered image.

3. The method according to claim 2, wherein performing convolution processing on the original image for a plurality of times by using the interference shielding network to obtain a feature image for distinguishing the object to be recognized from the interference background comprises:

sequentially performing convolution and activation processing with reduced characteristic scale on the original image by using the interference shielding network to obtain initial convolution characteristics;

sequentially performing convolution and activation processing with unchanged characteristic scale on the initial convolution characteristic by using the interference shielding network to obtain a first convolution characteristic; sequentially carrying out feature scale reduction convolution and activation processing on the initial convolution features by using the interference shielding network to obtain reduced convolution features; the interference shielding network is further utilized to carry out up-sampling processing on the reduced convolution characteristics to obtain second convolution characteristics with the characteristic scale being the same as that of the first convolution characteristics;

and fusing the first convolution characteristic and the second convolution characteristic by using the interference shielding network to obtain the characteristic image.

4. The method for processing the image according to claim 3, wherein the fusing the first convolution feature and the second convolution feature by using the interference shielding network to obtain the feature image comprises:

fusing the first convolution characteristic and the second convolution characteristic by using the interference shielding network to obtain a fused characteristic;

and performing feature scale invariant convolution and classification processing on the fusion features by using the interference shielding network to obtain the feature images.

5. The method according to claim 2, wherein performing convolution processing on the original image for a plurality of times by using the interference shielding network to obtain a feature image for distinguishing the object to be recognized from the interference background comprises:

sequentially performing convolution and activation processing with unchanged characteristic scale on the initial convolution characteristic by using the interference shielding network to obtain a first convolution characteristic; sequentially carrying out feature scale reduction convolution and activation processing on the initial convolution features by using the interference shielding network to obtain reduced convolution features; sequentially carrying out convolution and activation processing with unchanged characteristic scale on the reduced convolution characteristics by utilizing the interference shielding network to obtain second convolution characteristics; sequentially carrying out feature scale reduction convolution and activation processing on the reduced convolution features by utilizing the interference shielding network to obtain re-reduced convolution features; sequentially performing convolution, activation processing and up-sampling processing with unchanged characteristic scale on the convolution features which are reduced again by using the interference shielding network to obtain third convolution features with the same characteristic scale as the second convolution features;

fusing the second convolution characteristic and the third convolution characteristic by using the interference shielding network to obtain a fused characteristic; then, the interference shielding network is utilized to carry out up-sampling processing on the fused features, and fused features with the feature scale the same as that of the first convolution features are obtained; and the interference shielding network fuses the fusion feature and the first convolution feature to obtain the feature image.

6. The method according to claim 2, wherein performing convolution processing on the original image for a plurality of times by using the interference shielding network to obtain a feature image for distinguishing the object to be recognized from the interference background comprises:

and sequentially performing convolution and classification processing with unchanged characteristic scale on the initial convolution characteristics by using the interference shielding network to obtain the characteristic image.

7. The method for processing the image according to claim 2, wherein filtering the interference background in the feature image to obtain the filtered image comprises:

utilizing the interference shielding network to perform up-sampling processing on the characteristic image to obtain an amplified image with the same scale as the original image;

and adjusting the pixel value of the pixel point belonging to the interference background in the amplified image to a preset value to obtain the filtered image.

8. The method for processing the image according to claim 2, wherein filtering the interference background in the feature image to obtain the filtered image comprises:

extracting a graph of the object to be recognized from the amplified image;

and adding the graph into a preset white board image to obtain the filtered image.

9. A method of training a network, the method comprising:

acquiring a first training image set and a second training image set, wherein an object to be identified and an interference background in each training image in the first training image set are labeled; each training image in the second training image set is marked with information of an object to be identified in the training image;

training a preset interference shielding network by using the first training image set to obtain a preliminarily trained interference shielding network, wherein the interference shielding network is used for filtering an interference background in the image except for the object to be identified;

and training the preliminarily trained interference shielding network and a preset object recognition network by utilizing a second training image set so as to obtain the trained interference shielding network and the trained object recognition network, wherein the object recognition network is used for recognizing the information of the object to be recognized in the image.

10. An apparatus for processing an image, the apparatus comprising:

the image acquisition module is used for acquiring an original image of an object to be identified;

the image processing module is used for processing the original image by utilizing a preset interference shielding network so as to filter an interference background which is irrelevant to the object to be identified in the original image and obtain a filtered image; and processing the filtered image by using a preset object identification network to obtain an identification result of the object to be identified.