CN112200226B - Image processing method based on reinforcement learning, image processing method and related device


Info

Publication number: CN112200226B
Application number: CN202011034575.7A
Authority: CN (China)
Prior art keywords: layer, characteristic, feature, network, mask
Legal status: Active (granted)
Inventors: 杨幸潮, 章佳杰, 郑云飞, 于冰
Assignee: Beijing Dajia Internet Information Technology Co Ltd
Other languages: Chinese (zh)
Other versions: CN112200226A (application publication)

Classifications

    • G06F18/253 Fusion techniques of extracted features (pattern recognition)
    • G06N3/045 Combinations of networks (neural network architecture)
    • G06N3/08 Learning methods (neural networks)
    • G06V10/40 Extraction of image or video features


Abstract

The present disclosure relates to the field of image processing technologies, and in particular to an image processing method based on reinforcement learning, an image processing method, and a related apparatus. The method comprises the following steps: performing feature extraction on a target image to obtain a feature map of the target image; inputting the feature map of the target image into a first network for reinforcement learning to obtain a target processing mode for each pixel in the target image, the first network being obtained by training a target network model based on a reinforcement learning method, where, in the target network model, each image channel of the feature map extracted by at least one designated neural network layer is weighted; and processing the target image according to the target processing mode. By combining a target network model trained with a reinforcement learning method and per-channel weighting, the first network obtained from this training processes the target image more comprehensively and accurately.

Description

Image processing method based on reinforcement learning, image processing method and related device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular to an image processing method based on reinforcement learning, an image processing method, and a related apparatus.
Background
With the popularization of smartphones and the continuous upgrading of photographing equipment, users can easily take photos to record the wonderful moments in life. However, to remain easy to carry, current mobile phones inevitably make compromises in camera hardware, so the quality of images shot by a smartphone differs greatly from that of advanced devices such as digital single-lens reflex (DSLR) cameras. The most common hardware limitation of a smartphone camera is its small sensor with small photodiodes and the lack of a large-aperture lens, which leaves mobile-phone images with insufficient light intake, washed-out color, heavy noise, and poor contrast compared with images shot by a DSLR or similar advanced device. Most newly released products on the smartphone market advertise improvements to the photographing function and invest heavily in hardware, for example adding more and more cameras with different functions to make up for the shortcomings of mobile-phone photographing hardware; this raises the cost of the phone and can hurt its appearance and portability. To overcome these inherent hardware limitations, it is therefore more efficient and economical to process the captured images in software, remedying the hardware deficiency from the software side.
The inventors observed that users often publish their photos through social media and network platforms. Because of the limits of mobile-phone photographing hardware, the images may be unsatisfactory, so before publishing, users often use image processing software to correct defects (such as under-exposure or poor contrast), and sometimes exaggerate color, brightness, and tone to achieve a striking visual effect. The editing of a typical high-quality image is usually done by an experienced artist with a great deal of manual labor. Some specialized interactive image processing software on the market can help users retouch images, but such software takes considerable time and skill to learn and master. Moreover, the retouching result is very sensitive to its parameters, and a user without professional skill may have no idea how to adjust them, making the final result unsatisfactory.
At present, image processing work based on reinforcement learning in the related art uses a simple fully convolutional network, but this approach cannot process images accurately, so the processing effect is poor. An image processing method that automatically enhances image effects and lets users obtain satisfactory results without professional skill is therefore a technical problem of great significance.
Disclosure of Invention
The embodiment of the disclosure provides an image processing method based on reinforcement learning, an image processing method and a related device, which are used for solving the problem that in the prior art, the processing effect is poor due to the fact that images cannot be accurately processed.
In a first aspect, an embodiment of the present disclosure provides an image processing method based on reinforcement learning, where the method includes:
carrying out feature extraction on a target image to obtain a feature map of the target image;
inputting the feature map of the target image into a first network for reinforcement learning to obtain a target processing mode for each pixel in the target image; the first network is obtained by training a target network model based on a reinforcement learning method; in the target network model, each image channel of the feature map extracted by at least one designated neural network layer is weighted;
and processing the target image according to the target processing mode.
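As an illustration only, the following minimal sketch shows how these three steps might be wired together in PyTorch. The feature extractor and policy network are placeholders, and the per-pixel action set (identity and small brightness shifts) is hypothetical; the disclosure does not enumerate the concrete per-pixel operations.

```python
import torch

def enhance(image, feature_extractor, policy_net, actions):
    """image: (1, 3, H, W) tensor with values in [0, 1]."""
    feats = feature_extractor(image)       # feature map of the target image
    logits = policy_net(feats)             # (1, L, H, W); L = number of actions
    action_idx = logits.argmax(dim=1)      # target processing mode per pixel
    out = image.clone()
    for i, op in enumerate(actions):       # apply each pixel's chosen operation
        mask = (action_idx == i).unsqueeze(1)   # (1, 1, H, W) bool
        out = torch.where(mask, op(image), out)
    return out

# Hypothetical per-pixel action set: identity and small brightness shifts.
actions = [
    lambda x: x,
    lambda x: (x + 0.05).clamp(0.0, 1.0),
    lambda x: (x - 0.05).clamp(0.0, 1.0),
]
```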
In some embodiments, the target network model comprises: a down-sampling network for down-sampling, the first network, a second network for outputting expected accumulated feedback information on image quality, and a discriminator network, the down-sampling network including a plurality of down-sampling layers;
the first network and the second network each comprise a plurality of up-sampling layers; the designated neural network layer includes the down-sampling layer and/or the up-sampling layer.
In some embodiments, the feature maps extracted by the down-sampling layers and the up-sampling layers comprise a plurality of sub-feature maps, one sub-feature map per channel;
the weighting of each image channel of the feature map extracted by at least one designated neural network layer comprises the following steps:
performing global max pooling and global average pooling on each sub-feature map to obtain a max-pooled feature and an average-pooled feature for each channel;
inputting the max-pooled features of the channels into a first fully connected network to obtain a maximum mask, and inputting the average-pooled features of the channels into a second fully connected network to obtain an average mask;
weighting the feature map extracted by the designated neural network layer with the maximum mask and with the average mask, respectively, to obtain a maximum-mask feature map corresponding to the maximum mask and an average-mask feature map corresponding to the average mask;
and fusing the maximum-mask feature map and the average-mask feature map, and outputting the processed feature map.
In some embodiments, the first fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
inputting the max-pooled features of the channels into the first fully connected network to obtain the maximum mask includes:
inputting the max-pooled features of the channels into the first-class fully connected layer to obtain a first feature;
inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the maximum mask.
In some embodiments, the second fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
inputting the average-pooled features of the channels into the second fully connected network to obtain the average mask includes:
inputting the average-pooled features of the channels into the first-class fully connected layer to obtain a first feature;
inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
In some embodiments, extracting a feature map with the designated neural network layer comprises:
performing convolution processing multiple times in sequence on the feature map input to the designated neural network layer to obtain convolution features;
performing batch normalization on the convolution features to obtain normalized features;
and processing the normalized features with an activation function to obtain activation features.
In a second aspect, an embodiment of the present disclosure provides an image processing method, where image data comprises a plurality of sub-image data, one sub-image datum per channel, the method comprising:
performing global max pooling and global average pooling on each sub-image datum to obtain a max-pooled feature and an average-pooled feature for each channel;
inputting the max-pooled features of the channels into a first fully connected network to obtain a maximum mask, and inputting the average-pooled features of the channels into a second fully connected network to obtain an average mask;
weighting the image data with the maximum mask and with the average mask, respectively, to obtain maximum-mask image data corresponding to the maximum mask and average-mask image data corresponding to the average mask;
and fusing the maximum-mask image data and the average-mask image data, and outputting the processed image data.
In some embodiments, the first fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
inputting the max-pooled features of the channels into the first fully connected network to obtain the maximum mask includes:
inputting the max-pooled image data of the channels into the first-class fully connected layer to obtain a first feature;
inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the maximum mask.
In some embodiments, the second fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
inputting the average-pooled features of the channels into the second fully connected network to obtain the average mask includes:
inputting the average-pooled image data of the channels into the first-class fully connected layer to obtain a first feature;
inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
In a third aspect, an embodiment of the present disclosure provides an image processing apparatus based on reinforcement learning, the apparatus comprising:
a feature extraction module configured to perform feature extraction on a target image to obtain a feature map of the target image;
a target-processing-mode acquisition module configured to input the feature map of the target image into a first network for reinforcement learning to obtain a target processing mode for each pixel in the target image, where the first network is obtained by a training module training a target network model based on a reinforcement learning method and, in the target network model, each image channel of the feature map extracted by at least one designated neural network layer is weighted;
and a target image processing module configured to process the target image according to the target processing mode.
In some embodiments, the target network model comprises: a down-sampling network for down-sampling, the first network, a second network for outputting expected accumulated feedback information on image quality, and a discriminator network, the down-sampling network including a plurality of down-sampling layers;
the first network and the second network each comprise a plurality of up-sampling layers; the designated neural network layer includes the down-sampling layer and/or the up-sampling layer.
In some embodiments, the feature maps extracted by the down-sampling layers and the up-sampling layers comprise a plurality of sub-feature maps, one sub-feature map per channel;
when weighting each image channel of the feature map extracted by the at least one designated neural network layer, the target-processing-mode acquisition module is specifically configured to:
perform global max pooling and global average pooling on each sub-feature map to obtain a max-pooled feature and an average-pooled feature for each channel;
input the max-pooled features of the channels into a first fully connected network to obtain a maximum mask, and input the average-pooled features of the channels into a second fully connected network to obtain an average mask;
weight the feature map extracted by the designated neural network layer with the maximum mask and with the average mask, respectively, to obtain a maximum-mask feature map corresponding to the maximum mask and an average-mask feature map corresponding to the average mask;
and fuse the maximum-mask feature map and the average-mask feature map, and output the processed feature map.
In some embodiments, the first fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
when inputting the max-pooled features of the channels into the first fully connected network to obtain the maximum mask, the target-processing-mode acquisition module is specifically configured to:
input the max-pooled features of the channels into the first-class fully connected layer to obtain a first feature;
input the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
input the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and input the second feature into the activation layer to obtain the maximum mask.
In some embodiments, the second fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
when inputting the average-pooled features of the channels into the second fully connected network to obtain the average mask, the target-processing-mode acquisition module is specifically configured to:
input the average-pooled features of the channels into the first-class fully connected layer to obtain a first feature;
input the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
input the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and input the second feature into the activation layer to obtain the average mask.
In some embodiments, when extracting a feature map with the designated neural network layer, the feature extraction module is specifically configured to:
perform convolution processing multiple times in sequence on the feature map input to the designated neural network layer to obtain convolution features;
perform batch normalization on the convolution features to obtain normalized features;
and process the normalized features with an activation function to obtain activation features.
In a fourth aspect, an embodiment of the present disclosure provides an image processing apparatus, where image data comprises a plurality of sub-image data, one sub-image datum per channel, the apparatus comprising:
a feature acquisition module configured to perform global max pooling and global average pooling on each sub-image datum to obtain a max-pooled feature and an average-pooled feature for each channel;
a mask acquisition module configured to input the max-pooled features of the channels into a first fully connected network to obtain a maximum mask, and to input the average-pooled features of the channels into a second fully connected network to obtain an average mask;
a sub-image data acquisition module configured to weight the image data with the maximum mask and with the average mask, respectively, to obtain maximum-mask image data corresponding to the maximum mask and average-mask image data corresponding to the average mask;
and a fusion module configured to fuse the maximum-mask image data and the average-mask image data and output the processed image data.
In some embodiments, the first fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
when inputting the max-pooled features of the channels into the first fully connected network to obtain the maximum mask, the mask acquisition module is specifically configured to:
input the max-pooled image data of the channels into the first-class fully connected layer to obtain a first feature;
input the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
input the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and input the second feature into the activation layer to obtain the maximum mask.
In some embodiments, the second fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer;
when inputting the average-pooled features of the channels into the second fully connected network to obtain the average mask, the mask acquisition module is specifically configured to:
input the average-pooled image data of the channels into the first-class fully connected layer to obtain a first feature;
input the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
input the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
and input the second feature into the activation layer to obtain the average mask.
In a fifth aspect, another embodiment of the present disclosure also provides an electronic device, including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods provided by the embodiments of the first and second aspects of the present disclosure.
In a sixth aspect, another embodiment of the present disclosure further provides a computer storage medium, where the computer storage medium stores a computer program, and the computer program is used to make a computer execute any one of the methods provided by the embodiments of the first aspect and the second aspect of the present disclosure.
In the embodiments of the present disclosure, feature extraction is first performed on a target image to obtain its feature map; the feature map is input into a first network for reinforcement learning to obtain a target processing mode for each pixel in the target image, where the first network is obtained by training a target network model based on a reinforcement learning method and, in the target network model, each image channel of the feature map extracted by at least one designated neural network layer is weighted; and the target image is processed according to the target processing mode. The target processing mode is thus obtained by weighting each image channel of the feature map, i.e., by a channel-domain attention mechanism that takes the feature information of different channels into account, strengthening the feature information of important channels and weakening that of unimportant ones, so the result of processing the target image according to the target processing mode is more accurate and comprehensive.
Drawings
Fig. 1 is a schematic structural diagram of training a target network model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating an attention mechanism of a channel domain according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of an image processing method based on reinforcement learning according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 5 is a comparative illustration of an ablation experiment provided by embodiments of the present disclosure;
fig. 6 is a schematic structural diagram of an image processing apparatus based on reinforcement learning according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to facilitate understanding of the technical solutions provided by the embodiments of the present disclosure, the embodiments of the present disclosure are described in further detail below with reference to the drawings of the specification. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that such descriptions are interchangeable under appropriate circumstances such that the embodiments of the disclosure can be practiced in sequences other than those illustrated or described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Automatically implementing image effect enhancement is a difficult task because it is difficult to generate enhanced images with good effect and robustness at the same time.
Currently, in order to enhance the visual effect of an image, it is necessary to add a function of local processing in a reinforcement learning framework. Therefore, the related art methods are all dedicated to realizing the local processing capability of reinforcement learning. One of these methods is implemented by first using semantic segmentation on the input image and then performing global operations on each segmented block. However, this method makes the image effect enhancement result very dependent on the semantic segmentation result, which cannot be segmented well for complex scenes, resulting in poor results. The other method in the method is realized by realizing a complete convolution network on the basis of a reinforcement learning frame so as to obtain simple image operation of each pixel, and realizing different tasks of image noise reduction, image restoration, color enhancement and the like, so that the problem of inaccurate image feature extraction result is solved.
In view of this, to better help the neural network understand and extract image features and thereby ensure that enhanced images with good effect and robustness can be generated, the embodiments of the present disclosure provide an image processing method based on reinforcement learning. Its design concept is to perform feature extraction on a target image to obtain a feature map, input the feature map into a first network for reinforcement learning to obtain a target processing mode for each pixel in the target image, and finally process the target image according to the obtained target processing mode; the first network may be a policy network.
The first network adopted in this design is obtained in advance by training a target network model based on a reinforcement learning method. To improve the rationality of the obtained target processing mode, each image channel of the feature map extracted by at least one designated neural network layer in the target network model is weighted. For example, a channel-domain attention mechanism is added to at least one neural network layer in a second network (such as a value network), the first network (such as a policy network), or the feature extraction network, so that the effect of important features is strengthened along the channel dimension and the influence of secondary features is weakened. The finally trained first network can then give a reasonable target processing mode that meets high-quality visual requirements, yielding a higher-quality image.
In addition, regarding the related-art problem that enhancement results depend on semantic segmentation and degrade when complex scenes cannot be segmented well, the target network model trained by the reinforcement-learning-based method of the present disclosure is trained on cropped image blocks and performs image effect enhancement at the pixel level, so the target network model adopted in the present disclosure does not depend on semantic segmentation results. Regarding the related-art drawback that image processing with a fully convolutional network needs a large number of paired images, the present disclosure combines a network architecture based on down-sampling and up-sampling with the second network and the first network, so the extracted features and the obtained target processing mode are more accurate; on this basis, a channel-domain attention mechanism makes the extracted image features more comprehensive and reasonable.
To facilitate understanding of the image processing method based on reinforcement learning provided by the embodiments of the present disclosure, first, a structure of a target network model used in training is described and explained below.
Structure and training process of the target network model
Referring to fig. 1, a schematic structural diagram of training the target network model provided by the embodiment of the present disclosure includes: a down-sampling network 101 for down-sampling, the first network (e.g., a policy network) 102, a second network (e.g., a value network) 103 for outputting expected accumulated feedback information on image quality, and a discriminator network 104, wherein:
(1) the down-sampling network 101 includes a plurality of down-sampling layers for performing feature extraction on the target image.
First, it should be noted that the target network model in fig. 1 is only a schematic diagram of one possible embodiment, and is not intended to limit the disclosure.
For example, fig. 1 shows an embodiment in which the down-sampling network includes 4 down-sampling layers; each down-sampling layer receives an image as input and outputs a feature map after feature extraction, and that output feature map serves as the input feature map of the next down-sampling layer. In one possible embodiment, the present disclosure trains the target network model on randomly cropped 96 × 96 patches of the image input to the down-sampling network 101, so that a corresponding 96 × 96 output is produced after the input passes through the down-sampling network and then the first network and the second network, respectively.
The channel dimension output by the first network is L, which is the number of candidate actions predicted for each pixel; the network outputs the probability distribution over the actions each pixel may execute, from which the target processing mode of each pixel in the target image is obtained.
The channel dimension output by the second network is 1, representing the predicted expected accumulated feedback information for each pixel. The expected accumulated feedback information is the expectation of the sum of predicted future reward values for each pixel: the actions come from the action space in reinforcement learning, and for each state the agent uses the instantaneous feedback of the previous state to decide which action to execute. Actions are executed so as to maximize this expectation (referred to in the present disclosure as the expected accumulated feedback information of each pixel) until the algorithm converges, and the obtained policy is the sequence of these actions.
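As a concrete illustration of these output shapes, the sketch below interprets raw scores from the two heads; the value of L is hypothetical, since the text only names it "L".

```python
import torch
import torch.nn.functional as F

B, L, H, W = 1, 9, 96, 96              # L is hypothetical; the text only names it "L"
policy_out = torch.randn(B, L, H, W)   # first network: one score per action per pixel
value_out = torch.randn(B, 1, H, W)    # second network: expected accumulated feedback per pixel

probs = F.softmax(policy_out, dim=1)   # probability distribution over actions, per pixel
action = probs.argmax(dim=1)           # greedy target processing mode, shape (B, H, W)
```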
In the above embodiment, the image input to the down-sampling network 101 may have 3 channels; for example, in the RGB color mode, the 3 channels are red, green, and blue. As the input image passes through each down-sampling layer in the down-sampling network 101, the number of channels is gradually expanded; the feature-map dimensions output by each down-sampling layer are shown in fig. 1. For example, after the first down-sampling layer, the output feature map is 48 × 48 and the number of channels becomes 32. Extracting features from the input image with the down-sampling network 101 increases the size of the receptive field.
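A sketch of one down-sampling layer consistent with the dimensions above (96 × 96 × 3 in, 48 × 48 × 32 out); the stride-2 convolution used for the spatial halving is an assumption, as the disclosure does not state how the halving is performed.

```python
import torch
import torch.nn as nn

down1 = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # halves H and W, expands channels
    nn.BatchNorm2d(32),
    nn.LeakyReLU(),
)
x = torch.randn(1, 3, 96, 96)
print(down1(x).shape)  # torch.Size([1, 32, 48, 48])
```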
Next, the feature map extracted by the last down-sampling layer of the down-sampling network 101 is input to the first network 102 and the second network 103 respectively; that is, as shown in fig. 1, the feature map extracted by the down-sampling network is input to the first network 102 to obtain the target processing mode of the input image, and to the second network 103 to obtain the expected accumulated feedback information on output image quality.
(2) The first network 102 and the second network 103 each include a plurality of up-sampling layers therein.
The target network model adopted by the present disclosure is intended to obtain a more reasonable target processing mode, so that the target image is processed accordingly; a reinforcement learning model built on a down-sampling and up-sampling network structure serves as the backbone of the first network and the second network. Such a structure, for example a U-Net architecture, has a larger receptive field and is therefore better at extracting detailed information such as textures. The down-sampling network gradually exposes detail feature information of the image by down-sampling the input, and the up-sampling process combines the information of the down-sampling layers with the input of the up-sampling network to restore detail, gradually recovering the precision of the input image.
In implementation, as shown in fig. 1, the first network 102 and the second network 103 each include 4 up-sampling layers, matching the number of down-sampling layers in the down-sampling network 101. When each up-sampling layer performs feature extraction, it works on the feature map output by the preceding up-sampling layer together with the feature information output by the corresponding down-sampling layer; this is shown by C1 in fig. 1, i.e., each down-sampling layer and the left half of each up-sampling layer, and a sketch of the skip connection follows.
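A sketch of that skip connection, assuming bilinear up-sampling and channel-wise concatenation (the disclosure only states that the two sources are combined); the channel counts are illustrative.

```python
import torch
import torch.nn.functional as F

up_feats = torch.randn(1, 64, 24, 24)    # output of the preceding up-sampling layer
skip_feats = torch.randn(1, 64, 48, 48)  # output of the corresponding down-sampling layer

up = F.interpolate(up_feats, scale_factor=2, mode="bilinear", align_corners=False)
merged = torch.cat([up, skip_feats], dim=1)  # (1, 128, 48, 48); fed to the layer's convolutions
```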
In the above embodiment, the step of extracting the feature map for each downsampling layer and each upsampling layer includes:
step A1: and carrying out convolution processing on the input feature graph for multiple times in sequence to obtain convolution features.
The convolution process is performed by two convolution layers of 3 × 3, for example.
Step A2: and carrying out batch normalization processing on the convolution characteristics to obtain normalized characteristics.
Step A3: and processing the normalized features by using an activation function to obtain activation features.
(3) The discriminator network 104 is configured to identify the processed target image, obtain a quality score of the target image, and evaluate the probability that the processed target image is a real image according to the quality score.
In addition, to improve the rationality of the obtained target processing mode, each image channel of the feature map extracted by at least one designated neural network layer in the target network model is weighted; one possible implementation adjusts the extracted features with a channel-domain attention mechanism. The designated neural network layer comprises the down-sampling layer and/or the up-sampling layer.
In one possible implementation, each of the 4 down-sampling layers in fig. 1, the 4 up-sampling layers in the first network, and the 4 up-sampling layers in the second network is a designated neural network layer; that is, the image channels of the feature map each of them extracts are weighted. One way to do this is to adjust each layer's extracted features with the channel-domain attention mechanism; for example, the right side of each sampling layer in fig. 1 is connected to a channel-domain attention process shown as the C2 layer (the channel-domain attention layer), i.e., each down-sampling layer and the right half of each up-sampling layer.
Referring to fig. 2, a schematic structural diagram of the channel-domain attention mechanism provided by the present disclosure: the feature maps extracted by the down-sampling layers and the up-sampling layers comprise a plurality of sub-feature maps, one sub-feature map per channel. Weighting each image channel of the feature map extracted by the designated neural network layer comprises the following steps:
step B1: and respectively carrying out global maximum pooling and global average pooling on each sub-feature graph to obtain the maximum pooling feature of each channel and the average pooling feature of each channel.
Assume the input sub-feature maps have size C × H × W, where C denotes the number of channels and H × W the height and width of the feature map. In implementation, global max pooling is performed over the multi-channel feature map to obtain the maximum on each channel, giving a C × 1 max-pooled feature; similarly, global average pooling is performed to obtain the average of the pixels on each channel, giving a C × 1 average-pooled feature.
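A sketch of step B1 in PyTorch: global max and average pooling over a C × H × W input, giving one max-pooled and one average-pooled value per channel (the channel count 32 is illustrative).

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 32, 48, 48)          # input sub-feature maps, C = 32
max_feat = F.adaptive_max_pool2d(q, 1)  # (1, 32, 1, 1): maximum per channel
avg_feat = F.adaptive_avg_pool2d(q, 1)  # (1, 32, 1, 1): average per channel
```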
Step B2: inputting the max-pooled features of the channels into a first fully connected network to obtain a maximum mask; and inputting the average-pooled features of the channels into a second fully connected network to obtain an average mask.
The first fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer; in implementation, obtaining the maximum mask from the first fully connected network includes the following steps:
Step B21max: inputting the max-pooled features of the channels into the first-class fully connected layer to obtain a first feature;
Step B22max: inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
To reduce computational complexity, the channel count of the second-class fully connected layer may be set below the input channel count C, for example to half of it; the maximum mask can still be obtained in this case.
Step B23max: inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
step B24 max: inputting the second feature into the active layer to obtain the maximum mask. Suppose employing ATmaxThe generation process is determined as the following formula 1:
ATmax=σ(FC1(MaxPoint (Q)) formula 1
Wherein Q represents an input sub-feature image; MaxPool represents that the input sub-feature image Q is subjected to maximum pooling processing to obtain maximum pooling features MaxPool (Q); FC1Representing the maximum pooling characteristic through a first full-connection network, and sigma representing the characteristic after passing through the first full-connection network is input into an activation layer to obtain a final maximum mask ATmax
Similarly, the second fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer; obtaining the average mask from the second fully connected network includes:
Step B21mean: inputting the average-pooled features of the channels into the first-class fully connected layer to obtain a first feature;
Step B22mean: inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature;
Step B23mean: inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature;
step B24 mean: and inputting the second characteristic into the activation layer to obtain the average mask. Suppose employing ATmeanThe generation process is determined as the following formula 2:
ATmean=σ(FC2(AvgPool (Q)) (formula 2)
Wherein Q represents an input sub-feature image; AvgPool represents that the average pooling processing is carried out on the input sub-feature image Q to obtain average pooling features AvgPool (Q); FC2Representing the average pooled feature across the second fully-connected network, and sigma representing the feature input to the active layer after the second fully-connected network, to obtain the final average mask ATmean
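A sketch of the two mask branches of formulas 1 and 2 as independent fully connected networks. The halved middle width follows the note on the second-class layer above; the first layer keeping width C and the sigmoid as the activation are assumptions, since only the layer classes are specified.

```python
import torch.nn as nn

def mask_net(c: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(c, c),       # first-class fully connected layer
        nn.Linear(c, c // 2),  # second-class layer: feature aggregation (halved width)
        nn.Linear(c // 2, c),  # third-class layer: up-samples back to C
        nn.Sigmoid(),          # activation layer, the sigma in formulas 1 and 2
    )

fc1 = mask_net(32)  # produces AT_max from the max-pooled features
fc2 = mask_net(32)  # produces AT_mean from the average-pooled features
```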
Step B3: and respectively carrying out weighting processing on the feature maps extracted by the specified neural network layer by using the maximum mask and the average mask to obtain a maximum mask feature map corresponding to the maximum mask and an average mask feature map corresponding to the average mask.
In implementation, the obtained maximum mask is fused with the input sub-feature maps by pixel-wise multiplication to obtain the maximum-mask feature map; similarly, the obtained average mask is fused with the input sub-feature maps by pixel-wise multiplication to obtain the average-mask feature map.
Step B4: and fusing the maximum mask feature map and the average mask feature map, and outputting the processed feature map.
In practice, if the input sub-feature maps are Q, the output sub-feature maps Q̃ can be determined according to formula 3:

Q̃ = ATmax ⊗ Q + ATmean ⊗ Q  (formula 3)

where ⊗ denotes pixel-wise multiplication and the two mask feature maps are fused by pixel-wise addition.
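Putting formulas 1 to 3 together as one module; a sketch under the assumptions noted above (layer widths, sigmoid activation, and fusion by pixel-wise addition).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        def mask_net():
            return nn.Sequential(nn.Linear(c, c), nn.Linear(c, c // 2),
                                 nn.Linear(c // 2, c), nn.Sigmoid())
        self.fc1 = mask_net()  # max-pooling branch
        self.fc2 = mask_net()  # average-pooling branch

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = q.shape
        at_max = self.fc1(F.adaptive_max_pool2d(q, 1).flatten(1))   # formula 1
        at_mean = self.fc2(F.adaptive_avg_pool2d(q, 1).flatten(1))  # formula 2
        at_max = at_max.view(b, c, 1, 1)
        at_mean = at_mean.view(b, c, 1, 1)
        # formula 3: weight the input with each mask, then fuse by addition
        return at_max * q + at_mean * q

out = ChannelAttention(32)(torch.randn(1, 32, 48, 48))  # same shape as the input
```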
in another possible implementation, the extracted features may be adjusted only for the down-sampling layer or only for the attention mechanism in the up-sampling layer using the channel domain, and the present disclosure is not limited to a specific neural network layer using the attention mechanism in the channel domain.
To understand the structure of the first network and the second network in the target network model provided by the present disclosure more intuitively, one possible embodiment is illustrated by table 1 below:
TABLE 1 first and second networks
(Table 1 appears as an image in the original publication; its contents are not reproduced here.)
Here DoubleConv denotes two consecutive convolution layers, implementing the repeated convolution processing that yields the convolution features. I, O, K, P, and S are convolution-layer parameters: I (input) is the input channel dimension of the feature map, O (output) is the output channel dimension, K (kernel) is the convolution kernel size, P (padding) is the padding added around the feature map, and S (stride) is the convolution stride. BN (BatchNorm) denotes a batch normalization layer.
Each BN layer is followed by a LeakyReLU activation. Attention refers to the channel-domain attention mechanism introduced earlier in this disclosure and is applied after the LeakyReLU activation layer; its mask values are limited to (0, 1) by the activation in the attention branch. This embodiment applies the channel-domain attention mechanism at all down-sampling layers and all up-sampling layers.
Further, the structure of the discriminator network is described below by table 2:
TABLE 2 Discriminator network
Layer number Discriminator
L1 Conv(I3,O16,K4,P1,S2),LeakyRelu
L2 Conv(I16,O32,K4,P1,S2),LeakyRelu
L3 Conv(I32,O64,K4,P1,S2),LeakyRelu
L4 Conv(I64,O128,K4,P1,S2),LeakyRelu
L5 Conv(I128,O128,K4,P1,S2),LeakyRelu
L6 FC(I1152,O512),LeakyRelu
L7 FC(I512,O256),LeakyRelu
L8 FC(I256,O1)
Here FC denotes a fully connected layer; three fully connected layers are appended at the end of the discriminator to progressively shrink the feature dimension, as shown by the three fully connected layers pointed to by C4 in discriminator 104 in fig. 1.
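A direct reading of Table 2 as a PyTorch module (a sketch). The FC input width 1152 = 128 × 3 × 3 is consistent with the 96 × 96 training crops described earlier (96 → 48 → 24 → 12 → 6 → 3 under five stride-2 convolutions); the placement of the flatten step is an assumption.

```python
import torch.nn as nn

def conv_block(i: int, o: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(i, o, kernel_size=4, padding=1, stride=2),
                         nn.LeakyReLU())

discriminator = nn.Sequential(
    conv_block(3, 16),                     # L1
    conv_block(16, 32),                    # L2
    conv_block(32, 64),                    # L3
    conv_block(64, 128),                   # L4
    conv_block(128, 128),                  # L5
    nn.Flatten(),                          # 128 * 3 * 3 = 1152 for 96x96 inputs
    nn.Linear(1152, 512), nn.LeakyReLU(),  # L6
    nn.Linear(512, 256), nn.LeakyReLU(),   # L7
    nn.Linear(256, 1),                     # L8: quality score of the image
)
```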
Image processing procedure
Based on the foregoing description of the training process of the target network model used to obtain the first network, fig. 3 shows a schematic flowchart of the image processing method based on reinforcement learning provided by the embodiment of the present disclosure, including:
step 301: and performing feature extraction on the target image to obtain a feature map of the target image.
Step 302: and inputting the characteristic graph of the target image into a first network for reinforcement learning to obtain a target processing mode of each pixel point in the target image.
The first network is obtained by training the target network model based on a reinforcement learning method; the training process is the same as in the embodiments described above and is not repeated here. In the target network model, each image channel of the feature map extracted by at least one designated neural network layer is weighted; for example, the designated neural network layer is processed with a channel-domain attention mechanism, which was described with reference to fig. 2 in the foregoing embodiments and is not repeated here.
Step 303: and processing the target image according to the target processing mode.
Feature extraction is performed on the target image by the target network model of the present disclosure, trained in combination with the channel-domain attention mechanism, so the extracted features focus on the feature information of the important channels. Moreover, in the channel-domain attention mechanism provided by the present disclosure, the input sub-feature maps are max-pooled and average-pooled separately and then fed into two independent fully connected networks, yielding a maximum mask and an average mask: the maximum mask attends better to detail feature information, while the average mask attends to global feature information. The processed feature map obtained by fusing the maximum-mask feature map and the average-mask feature map therefore carries more reasonable feature information, laying a foundation for image processing.
In the related art, channel-domain attention is mainly implemented in two ways. One applies only global average pooling to the input feature maps to compress each channel's features, but this ignores some detail texture features of the channels. The other applies both global average pooling and global max pooling, but feeds the resulting maxima and averages into the same neural network for feature extraction and weight assignment, so the max-pooled and average-pooled features interfere with each other and neither can be fully exploited. In the present application, by contrast, the input sub-feature maps are globally max-pooled and average-pooled and then fed into two different fully connected networks respectively, and the resulting maximum-mask feature map and average-mask feature map are fused, so the characteristics of both are well preserved.
On this basis, fig. 4 shows a flowchart of an image processing method provided by the embodiment of the present disclosure, where the image data comprises a plurality of sub-image data, one sub-image datum per channel, including:
step 401: and respectively carrying out global maximum pooling and global average pooling on each piece of sub-image data to obtain the maximum pooling characteristic of each channel and the average pooling characteristic of each channel.
Step 402: inputting the image data with the obtained maximum pooling characteristics of each channel into a first full-connection network to obtain a maximum mask; and inputting the image data after the average pooling characteristic of each channel is obtained into a second full-connection network to obtain an average mask.
In implementation, the first fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer; obtaining the maximum mask includes: inputting the max-pooled image data of the channels into the first-class fully connected layer to obtain a first feature; inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature; inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature; and inputting the second feature into the activation layer to obtain the maximum mask.
In implementation, the second fully connected network comprises a first-class fully connected layer, a second-class fully connected layer, a third-class fully connected layer, and an activation layer; obtaining the average mask includes: inputting the average-pooled image data of the channels into the first-class fully connected layer to obtain a first feature; inputting the first feature into the second-class fully connected layer for feature aggregation to obtain an aggregated first feature; inputting the aggregated first feature into the third-class fully connected layer for up-sampling to obtain a second feature; and inputting the second feature into the activation layer to obtain the average mask.
Step 403: weighting the image data with the maximum mask and with the average mask, respectively, to obtain maximum-mask image data corresponding to the maximum mask and average-mask image data corresponding to the average mask.
Step 404: fusing the maximum-mask image data and the average-mask image data, and outputting the processed image data.
In the image processing method provided by the present disclosure, the max-pooled features and average-pooled features obtained from the input image data are fed into the first and second fully connected networks respectively, so that the resulting maximum mask attends to detail feature information while the average mask attends to global feature information. Because feature extraction and weight assignment are performed separately for the max-pooled and average-pooled features, the maximum-mask and average-mask feature maps do not unduly influence each other during fusion, yielding more comprehensive, reasonable, and accurate feature information and weight assignment.
To illustrate the higher-quality images produced by the target network model trained in the present disclosure, experimental results are described below.
Ablation experiment
To verify the necessity of the channel-domain attention mechanism adopted in the present disclosure, an ablation experiment was designed to demonstrate the effectiveness of the reinforcement learning-based training method for the target network model. The core idea of the ablation experiment, similar to a controlled-variable method, is to remove the channel-domain attention component and compare the results obtained with and without it.
The ablation experiment measures the effect of the channel-domain attention mechanism on the resulting images by removing it. Because the discriminator network is removed, this ablation experiment requires only 2,450 paired training images. Testing was performed on a set of 100 test images; the results are shown in fig. 5, where the first column is the original input image, the second column is the result after removing the channel-domain attention mechanism, and the third column is the result of the method of the present disclosure.
As can be seen from fig. 5, removing the channel-domain attention mechanism produces obvious boundary artifacts, caused by inaccurate feature extraction; the artifact locations are marked by dashed boxes, such as the tree edge in (a), the shadow edge in (b), and the cloud edge in (c). The images produced by the disclosed method show no obvious artifacts, demonstrating that the channel-domain attention mechanism helps the target network model trained by the reinforcement learning method extract accurate features, so that the target processing modes obtained from the first network process the target image more reasonably.
Objective evaluation results and analysis
Although there is extensive research on image enhancement, no comprehensive and objective evaluation system has yet been established; that is, no single standard can fully measure image quality under all conditions. Most current methods rely on subjective human visual evaluation, but such results are often influenced by personal aesthetics and preference and may be inconsistent. To ensure objectivity, the present disclosure combines objective and subjective evaluation to verify the effectiveness of the method more comprehensively.
Three objective evaluation indices are currently mainstream: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and the structural similarity index (SSIM). These indices apply to paired images, measure the difference between an output image and a target image, and are commonly used in fields such as denoising and image restoration.
MSE is widely used in deep learning to measure the difference between features; a smaller MSE indicates a result closer to the target. MSE is simple and effective and is a common metric. Let the result image be $s$ and the corresponding target image be $\hat{s}$, both of size $m \times n$; the MSE is given by Equation 4:

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(s(i,j)-\hat{s}(i,j)\bigr)^{2} \qquad (4)$$
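As a quick illustration, Equation 4 can be computed with NumPy as follows (the function name and float conversion are ours, not part of the disclosure):

```python
import numpy as np

def mse(result: np.ndarray, target: np.ndarray) -> float:
    """Equation 4: mean squared error between result image s and target s-hat."""
    s = result.astype(np.float64)
    s_hat = target.astype(np.float64)
    return float(np.mean((s - s_hat) ** 2))
```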
PSNR is another widely used objective evaluation index, defined as the ratio of the maximum possible signal of an image to the background noise; a larger value means the image quality is closer to the target image. Since MSE is contained within PSNR, PSNR is generally used as the objective index, as shown in Equation 5:

$$\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}} \qquad (5)$$

where $\mathrm{MAX}$ is the maximum possible pixel value (255 for 8-bit images).
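Equation 5 follows directly from the MSE; the sketch below assumes 8-bit images (MAX = 255) and returns infinity when the two images are identical:

```python
import numpy as np

def psnr(result: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Equation 5: peak signal-to-noise ratio in decibels."""
    m = np.mean((result.astype(np.float64) - target.astype(np.float64)) ** 2)
    if m == 0:
        return float("inf")  # identical images: no noise
    return float(10.0 * np.log10(max_val ** 2 / m))
```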
The SSIM index measures the similarity of the result image to the target in luminance, contrast, and structure; the higher the SSIM, the more similar the result is to the target image, as shown in Equation 6:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_{x}\mu_{y}+c_{1})(2\sigma_{xy}+c_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+c_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+c_{2})} \qquad (6)$$

where $x$ and $y$ denote the result image and the target image, $\mu_{x}$ and $\mu_{y}$ are the means of their pixel values, $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are the variances, $\sigma_{xy}$ is their covariance, and $c_{1}$, $c_{2}$ are constants that prevent instability when the denominator is close to zero. To speed up computation, SSIM is evaluated over fixed-size windows; a window is slid across the image to obtain multiple SSIM values, which are then averaged to obtain the SSIM of the whole image.
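A windowed SSIM along these lines can be sketched as follows. The 7×7 uniform window and the constants c1 = (0.01·255)² and c2 = (0.03·255)² follow common practice for 8-bit grayscale images; the disclosure does not fix these values, and Gaussian windows are often used instead:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim(x: np.ndarray, y: np.ndarray, win: int = 7,
         c1: float = (0.01 * 255.0) ** 2, c2: float = (0.03 * 255.0) ** 2) -> float:
    """Equation 6 evaluated over sliding windows, then averaged.
    Assumes 2-D (grayscale) 8-bit images."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x = uniform_filter(x, win)                      # windowed means
    mu_y = uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2     # windowed variances
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov_xy = uniform_filter(x * y, win) - mu_x * mu_y  # windowed covariance
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(ssim_map.mean())
```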
The present disclosure uses the PSNR and SSIM indices to measure the difference between images modified by reinforcement learning and images modified by human experts. Table 3 presents the PSNR and SSIM results of the ablation experiments. Both indices of the final version are better than those obtained when the global feedback information is removed, which demonstrates, in objective terms, the necessity of combining global feedback information with local feedback information.
TABLE 3 Objective indices PSNR and SSIM comparison
Based on the same inventive concept, an embodiment of the present disclosure further provides an image processing apparatus based on reinforcement learning; since the principle by which the apparatus solves the problem and its beneficial effects are similar to those of the foregoing method embodiment, details are not repeated here.
As shown in fig. 6, which is a schematic structural diagram of the apparatus, the apparatus includes: a feature extraction module 601, a target processing mode acquisition module 602, and a target image processing module 603;
a feature extraction module 601 configured to perform feature extraction on a target image to obtain a feature map of the target image;
a target processing mode obtaining module 602, configured to input the feature map of the target image into a first network for reinforcement learning to obtain a target processing mode of each pixel point in the target image; the first network is obtained by a training module training a target network model based on a reinforcement learning method; in the target network model, weighting processing is performed on each image channel of the feature map extracted by at least one designated neural network layer;
a target image processing module 603 configured to perform processing of the target image according to the target processing manner.
In some embodiments, the target network model comprises: a downsampling network for downsampling, the first network, a second network for outputting desired accumulated feedback information for image quality, and a discriminator network, the downsampling network including a plurality of downsampling layers;
the first network and the second network respectively comprise a plurality of up-sampling layers; the designated neural network layer includes the downsampling layer and/or the upsampling layer.
In some embodiments, the feature maps extracted by the down-sampling layer and the up-sampling layer comprise a plurality of sub-feature map components, and one sub-feature map component is one channel;
the target processing mode obtaining module 602, when performing weighting processing on each image channel of the feature map extracted by the at least one designated neural network layer, is configured to specifically perform:
performing global maximum pooling and global average pooling on each sub-feature map component respectively, to obtain the maximum pooling feature of each channel and the average pooling feature of each channel;
inputting the feature map carrying the maximum pooling feature of each channel into a first fully connected network to obtain a maximum mask; inputting the feature map carrying the average pooling feature of each channel into a second fully connected network to obtain an average mask;
weighting the feature map extracted by the designated neural network layer with the maximum mask and with the average mask, respectively, to obtain a maximum-mask feature map corresponding to the maximum mask and an average-mask feature map corresponding to the average mask;
and fusing the maximum-mask feature map and the average-mask feature map, and outputting the processed feature map.
In some embodiments, the first fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
the target processing mode obtaining module 602, when inputting the feature map carrying the maximum pooling feature of each channel into the first fully connected network to obtain the maximum mask, is configured to specifically perform:
inputting the feature map carrying the maximum pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the maximum mask.
In some embodiments, the second fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
the target processing mode obtaining module 602, when inputting the feature map carrying the average pooling feature of each channel into the second fully connected network to obtain the average mask, is configured to specifically perform:
inputting the feature map carrying the average pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
In some embodiments, the feature extraction module 601, when extracting a feature map by using the designated neural network layer, is configured to specifically perform:
sequentially performing convolution processing multiple times on the feature map input into the designated neural network layer to obtain convolution features;
performing batch normalization on the convolution features to obtain normalized features;
and processing the normalized features with an activation function to obtain activation features.
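A minimal sketch of this convolution / batch normalization / activation sequence follows; the 3×3 kernels and two convolutions per block are assumptions, as the disclosure fixes neither:

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, n_convs: int = 2) -> nn.Sequential:
    """Several convolutions in sequence, then batch normalization,
    then an activation function, mirroring the steps above."""
    layers = []
    ch = in_ch
    for _ in range(n_convs):
        layers.append(nn.Conv2d(ch, out_ch, kernel_size=3, padding=1))  # convolution features
        ch = out_ch
    layers.append(nn.BatchNorm2d(out_ch))  # batch normalization -> normalized features
    layers.append(nn.ReLU(inplace=True))   # activation function -> activation features
    return nn.Sequential(*layers)
```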
Based on the same inventive concept, the embodiments of the present disclosure further provide an image processing apparatus, and the principle and the beneficial effects of the apparatus are similar to those described in the above method embodiments, and are not repeated herein.
As shown in fig. 7, a schematic structural diagram of the apparatus is shown, in which the image data includes a plurality of sub-image data, and one sub-image data is a channel, the apparatus includes: a feature acquisition module 701, a mask acquisition module 702, a sub-image data acquisition module 703 and a fusion module 704;
a feature obtaining module 701, configured to perform global maximum pooling and global average pooling on each piece of sub-image data, to obtain maximum pooling features of each channel and average pooling features of each channel;
a mask obtaining module 702, configured to input the image data carrying the maximum pooling feature of each channel into a first fully connected network to obtain a maximum mask, and to input the image data carrying the average pooling feature of each channel into a second fully connected network to obtain an average mask;
a sub-image data obtaining module 703 configured to perform weighting processing on the image data by using the maximum mask and the average mask, respectively, to obtain maximum mask image data corresponding to the maximum mask and average mask image data corresponding to the average mask;
a fusion module 704 configured to perform fusion processing on the maximum mask image data and the average mask image data, and output processed image data.
In some embodiments, the first fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
the mask obtaining module 702, when inputting the image data carrying the maximum pooling feature of each channel into the first fully connected network to obtain the maximum mask, is configured to specifically perform:
inputting the image data carrying the maximum pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the maximum mask.
In some embodiments, the second fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
the mask obtaining module 702, when inputting the image data carrying the average pooling feature of each channel into the second fully connected network to obtain the average mask, is configured to specifically perform:
inputting the image data carrying the average pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
For the implementation and beneficial effects of the operations in the reinforcement learning-based image processing apparatus and the image processing apparatus, reference is made to the description of the foregoing methods; details are not repeated here.
Having described the reinforcement learning-based image processing method, the image processing method, and the corresponding apparatuses according to exemplary embodiments of the present disclosure, an electronic device according to another exemplary embodiment of the present disclosure is described next.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may take the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, an electronic device in accordance with the present disclosure may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the reinforcement learning based image processing and image processing methods according to various exemplary embodiments of the present disclosure described above in this specification. For example, the processor may perform the steps shown in fig. 3 or fig. 4.
The electronic device 130 according to this embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 130 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as random access memory (RAM) 1321 and/or cache memory 1322, and may further include read-only memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, aspects of the reinforcement learning-based image processing method and the image processing method provided by the present disclosure may also be implemented in the form of a program product. The program product includes program code which, when run on a computer device, causes the computer device to perform the steps of the methods according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the computer device may perform the steps shown in fig. 3 or fig. 4.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for reinforcement learning-based image processing and image processing of embodiments of the present disclosure may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic devices may be connected to the consumer electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit described above may be further divided and embodied by a plurality of units.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present disclosure without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to include such modifications and variations as well.

Claims (12)

1. An image processing method based on reinforcement learning, characterized in that the method comprises:
carrying out feature extraction on a target image to obtain a feature map of the target image;
inputting the feature map of the target image into a first network for reinforcement learning to obtain a target processing mode of each pixel point in the target image; processing the target image according to the target processing mode;
the first network is obtained by training a target network model based on a reinforcement learning method; the target network model includes: a downsampling network for downsampling, the first network, a second network for outputting desired accumulated feedback information for image quality, and a discriminator network, the downsampling network including a plurality of downsampling layers; the first network and the second network respectively comprise a plurality of up-sampling layers;
a designated neural network layer comprises the downsampling layer and/or the upsampling layer;
in the target network model, the feature map extracted by the designated neural network layer comprises a plurality of sub-feature map components, and one sub-feature map component is one channel;
the designated neural network layer is configured to perform weighting processing on each image channel, which specifically comprises:
respectively carrying out global maximum pooling and global average pooling on each sub-feature map component to obtain maximum pooling features of each channel and average pooling features of each channel;
inputting the maximum pooling feature of each channel into a first fully connected network to obtain a maximum mask; inputting the average pooling feature of each channel into a second fully connected network to obtain an average mask;
wherein the first fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer; the inputting the maximum pooling feature of each channel into the first fully connected network to obtain the maximum mask specifically comprises: inputting the maximum pooling feature of each channel into the first-type fully connected layer to obtain a first feature; inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature; inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature; and inputting the second feature into the activation layer to obtain the maximum mask;
weighting the feature map extracted by the designated neural network layer with the maximum mask and with the average mask, respectively, to obtain a maximum-mask feature map corresponding to the maximum mask and an average-mask feature map corresponding to the average mask;
and fusing the maximum mask feature map and the average mask feature map, and outputting the processed feature map to a next neural network layer for processing.
2. The method of claim 1, wherein the second fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
inputting the average pooling feature of each channel into the second fully connected network to obtain the average mask comprises:
inputting the average pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
3. The method of claim 1 or 2, wherein extracting a feature map using the designated neural network layer comprises:
sequentially performing convolution processing multiple times on the feature map input into the designated neural network layer to obtain convolution features;
performing batch normalization on the convolution features to obtain normalized features;
and processing the normalized features with an activation function to obtain activation features.
4. An image processing method, wherein image data includes a plurality of sub-image data, and one sub-image data is a channel, the method comprising:
respectively carrying out global maximum pooling and global average pooling on each piece of sub-image data to obtain maximum pooling characteristics of each channel and average pooling characteristics of each channel;
inputting the image data carrying the maximum pooling feature of each channel into a first fully connected network to obtain a maximum mask; inputting the image data carrying the average pooling feature of each channel into a second fully connected network to obtain an average mask; wherein the first fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer; inputting the image data carrying the maximum pooling feature of each channel into the first fully connected network to obtain the maximum mask specifically comprises: inputting the maximum pooling feature of each channel into the first-type fully connected layer to obtain a first feature; inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature; inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature; and inputting the second feature into the activation layer to obtain the maximum mask;
respectively carrying out weighting processing on the image data by utilizing the maximum mask and the average mask to obtain maximum mask image data corresponding to the maximum mask and average mask image data corresponding to the average mask;
and outputting the processed image data after the maximum mask image data and the average mask image data are subjected to fusion processing.
5. The method of claim 4, wherein the second fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
inputting the image data carrying the average pooling feature of each channel into the second fully connected network to obtain the average mask comprises:
inputting the image data carrying the average pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
6. An apparatus for image processing based on reinforcement learning, the apparatus comprising:
a feature extraction module configured to perform feature extraction on a target image to obtain a feature map of the target image;
a target processing mode obtaining module configured to input the feature map of the target image into a first network for reinforcement learning to obtain a target processing mode of each pixel point in the target image;
a target image processing module configured to perform processing of the target image according to the target processing manner;
the first network is obtained by training a target network model based on a reinforcement learning method; the target network model includes: a downsampling network for downsampling, the first network, a second network for outputting desired accumulated feedback information for image quality, and a discriminator network, the downsampling network including a plurality of downsampling layers; the first network and the second network respectively comprise a plurality of up-sampling layers;
a designated neural network layer comprises the downsampling layer and/or the upsampling layer;
in the target network model, the feature map extracted by the designated neural network layer comprises a plurality of sub-feature map components, and one sub-feature map component is one channel;
the designated neural network layer is configured to perform weighting processing on each image channel, which specifically comprises:
respectively carrying out global maximum pooling and global average pooling on each sub-feature map component to obtain maximum pooling features of each channel and average pooling features of each channel;
inputting the maximum pooling feature of each channel into a first fully connected network to obtain a maximum mask; inputting the average pooling feature of each channel into a second fully connected network to obtain an average mask;
wherein the first fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer; the inputting the maximum pooling feature of each channel into the first fully connected network to obtain the maximum mask specifically comprises: inputting the maximum pooling feature of each channel into the first-type fully connected layer to obtain a first feature; inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature; inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature; and inputting the second feature into the activation layer to obtain the maximum mask;
weighting the feature map extracted by the designated neural network layer with the maximum mask and with the average mask, respectively, to obtain a maximum-mask feature map corresponding to the maximum mask and an average-mask feature map corresponding to the average mask;
and fusing the maximum mask feature map and the average mask feature map, and outputting the processed feature map to a next neural network layer for processing.
7. The apparatus of claim 6, wherein the second fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
the target processing mode obtaining module, when inputting the average pooling feature of each channel into the second fully connected network to obtain the average mask, is configured to specifically perform:
inputting the average pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
8. The apparatus according to claim 6 or 7, wherein the feature extraction module, when extracting a feature map using the designated neural network layer, is configured to specifically perform:
sequentially performing convolution processing multiple times on the feature map input into the designated neural network layer to obtain convolution features;
performing batch normalization on the convolution features to obtain normalized features;
and processing the normalized features with an activation function to obtain activation features.
9. An image processing apparatus, wherein image data includes a plurality of sub-image data, and a sub-image data is a channel, the apparatus comprising:
a feature obtaining module configured to perform global maximum pooling and global average pooling on each piece of sub-image data to obtain the maximum pooling feature of each channel and the average pooling feature of each channel;
a mask obtaining module configured to input the image data carrying the maximum pooling feature of each channel into a first fully connected network to obtain a maximum mask, and to input the image data carrying the average pooling feature of each channel into a second fully connected network to obtain an average mask; wherein the first fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer; the mask obtaining module is specifically configured to perform: inputting the maximum pooling feature of each channel into the first-type fully connected layer to obtain a first feature; inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature; inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature; and inputting the second feature into the activation layer to obtain the maximum mask;
a sub-image data acquisition module configured to perform weighting processing on the image data by using the maximum mask and the average mask, respectively, to obtain maximum mask image data corresponding to the maximum mask and average mask image data corresponding to the average mask;
and the fusion module is configured to perform fusion processing on the maximum mask image data and the average mask image data and output the processed image data.
10. The apparatus of claim 9, wherein the second fully connected network comprises a first-type fully connected layer, a second-type fully connected layer, a third-type fully connected layer, and an activation layer;
the mask obtaining module, when inputting the image data carrying the average pooling feature of each channel into the second fully connected network to obtain the average mask, is configured to specifically perform:
inputting the image data carrying the average pooling feature of each channel into the first-type fully connected layer to obtain a first feature;
inputting the first feature into the second-type fully connected layer for feature aggregation to obtain an aggregated first feature;
inputting the aggregated first feature into the third-type fully connected layer for up-sampling to obtain a second feature;
and inputting the second feature into the activation layer to obtain the average mask.
11. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method according to any one of claims 1-5.
CN202011034575.7A 2020-09-27 2020-09-27 Image processing method based on reinforcement learning, image processing method and related device Active CN112200226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011034575.7A CN112200226B (en) 2020-09-27 2020-09-27 Image processing method based on reinforcement learning, image processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011034575.7A CN112200226B (en) 2020-09-27 2020-09-27 Image processing method based on reinforcement learning, image processing method and related device

Publications (2)

Publication Number Publication Date
CN112200226A CN112200226A (en) 2021-01-08
CN112200226B true CN112200226B (en) 2021-11-05

Family

ID=74006955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011034575.7A Active CN112200226B (en) 2020-09-27 2020-09-27 Image processing method based on reinforcement learning, image processing method and related device

Country Status (1)

Country Link
CN (1) CN112200226B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128386B (en) * 2021-04-13 2024-02-09 深圳市锐明技术股份有限公司 Obstacle recognition method, obstacle recognition device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118491A (en) * 2018-07-30 2019-01-01 深圳先进技术研究院 A kind of image partition method based on deep learning, system and electronic equipment
CN110084794A (en) * 2019-04-22 2019-08-02 华南理工大学 A kind of cutaneum carcinoma image identification method based on attention convolutional neural networks
CN111476806A (en) * 2020-06-23 2020-07-31 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392865B (en) * 2017-07-01 2020-08-07 广州深域信息科技有限公司 Restoration method of face image
US11663441B2 (en) * 2018-09-27 2023-05-30 Deepmind Technologies Limited Action selection neural network training using imitation learning in latent space
CN109886891B (en) * 2019-02-15 2022-01-11 北京市商汤科技开发有限公司 Image restoration method and device, electronic equipment and storage medium
CN110796248A (en) * 2019-08-27 2020-02-14 腾讯科技(深圳)有限公司 Data enhancement method, device, equipment and storage medium
CN110929578B (en) * 2019-10-25 2023-08-08 南京航空航天大学 Anti-shielding pedestrian detection method based on attention mechanism
CN111079584A (en) * 2019-12-03 2020-04-28 东华大学 Rapid vehicle detection method based on improved YOLOv3
CN110991362A (en) * 2019-12-06 2020-04-10 西安电子科技大学 Pedestrian detection model based on attention mechanism
CN111612722B (en) * 2020-05-26 2023-04-18 星际(重庆)智能装备技术研究院有限公司 Low-illumination image processing method based on simplified Unet full-convolution neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118491A (en) * 2018-07-30 2019-01-01 深圳先进技术研究院 A kind of image partition method based on deep learning, system and electronic equipment
CN110084794A (en) * 2019-04-22 2019-08-02 华南理工大学 A kind of cutaneum carcinoma image identification method based on attention convolutional neural networks
CN111476806A (en) * 2020-06-23 2020-07-31 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dual attention network for scene segmentation; Jun Fu et al.; ArXiv; 2019-04-30; pp. 3-5 *

Also Published As

Publication number Publication date
CN112200226A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
Lv et al. Attention guided low-light image enhancement with a large scale low-light simulation dataset
WO2018018470A1 (en) Method, apparatus and device for eliminating image noise and convolutional neural network
Al Sobbahi et al. Low-light homomorphic filtering network for integrating image enhancement and classification
CN109871845B (en) Certificate image extraction method and terminal equipment
CN112200736B (en) Image processing method based on reinforcement learning and model training method and device
CN110009573B (en) Model training method, image processing method, device, electronic equipment and storage medium
CN111476213A (en) Method and device for filling covering area of shelter based on road image
CN113822830B (en) Multi-exposure image fusion method based on depth perception enhancement
CN114862698B (en) Channel-guided real overexposure image correction method and device
US11138693B2 (en) Attention-driven image manipulation
CN111079507A (en) Behavior recognition method and device, computer device and readable storage medium
Mejjati et al. Look here! a parametric learning based approach to redirect visual attention
CN112200226B (en) Image processing method based on reinforcement learning, image processing method and related device
CN116757986A (en) Infrared and visible light image fusion method and device
Dwivedi et al. Single image dehazing using extended local dark channel prior
CN112200737B (en) Image processing method and device based on reinforcement learning and storage medium
Liu et al. Progressive complex illumination image appearance transfer based on CNN
Al Sobbahi et al. Low-light image enhancement using image-to-frequency filter learning
Liu et al. Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs
CN110570376B (en) Image rain removing method, device, equipment and computer readable storage medium
CN117196980A (en) Low-illumination image enhancement method based on illumination and scene texture attention map
CN116664694A (en) Training method of image brightness acquisition model, image acquisition method and mobile terminal
Huang et al. An end-to-end dehazing network with transitional convolution layer
Cotogni et al. Select & enhance: Masked-based image enhancement through tree-search theory and deep reinforcement learning
Pan et al. DICNet: achieve low-light image enhancement with image decomposition, illumination enhancement, and color restoration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant