CN110490813B - Feature map enhancement method, device, equipment and medium for convolutional neural network - Google Patents
- Publication number: CN110490813B
- Application number: CN201910605387.6A
- Authority: CN (China)
- Prior art keywords: sub-feature map, feature, channel, enhancement
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06T5/00 — Image enhancement or restoration
- G06T2207/10004 — Still image; photographic image
- G06T2207/20081 — Training; learning
Abstract
The application discloses a feature map enhancement method, apparatus, device, and medium for a convolutional neural network. A convolution operation is performed on an input original image to obtain a corresponding multi-layer feature map, and the feature map of a given layer is grouped along the channel dimension into a plurality of sub-feature maps. For each sub-feature map, an embedded spatial group-wise enhancement (SGE) module performs global average pooling and global max pooling in parallel to obtain two corresponding channel-dimension vectors, from which an attention enhancement factor for each channel of the sub-feature map is derived. An enhanced sub-feature map is obtained from the attention enhancement factor and the corresponding sub-feature map, and the enhanced feature map of the layer is assembled from all the enhanced sub-feature maps. The semantic information describing the relative importance of sub-feature-map channels is thereby better expressed, improving the performance of the convolutional neural network on tasks such as image classification, segmentation, and detection.
Description
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method, an apparatus, a device, and a medium for enhancing a feature map of a convolutional neural network.
Background
With the rise of deep learning, CNNs (Convolutional Neural Networks) have seen ever-wider development and application in computer vision, and researchers have proposed many convolution variants, such as transposed convolution, dilated convolution, grouped convolution, depthwise separable convolution, pointwise convolution, and deformable convolution. Grouped convolution offers clear advantages in reducing computation and parameter count and in preventing overfitting, and it mirrors the grouping ideas behind hand-designed features in early computer vision, such as HOG (Histogram of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), and LBP (Local Binary Patterns). Many classical networks (AlexNet, ResNeXt, MobileNet, ShuffleNet, CapsNet, and others) adopt this grouping idea: the feature maps are grouped along the channel dimension, and each group of sub-feature maps is then convolved or normalized, so that the semantic feature information of specific regions is better expressed, yielding excellent performance gains in the computer vision field.
At present, most CNN architectures improve the feature expression capability of a model by introducing an attention mechanism, which has become very popular in computer vision and related fields. Various network structures introduce channel-dimension or spatial-dimension attention to enhance useful channel information and suppress useless channel information; they may also fuse multi-scale features or global context information to further strengthen the enhancement of specific feature-map regions, making the network more interpretable. It follows that applying an attention mechanism to the grouped sub-feature maps can further enhance the learning and expression of region-specific semantic feature information while suppressing noise and interference.
However, the semantic feature information extracted by the SGE (Spatial Group-wise Enhancement) module embedded in a conventional CNN structure is not expressed sufficiently, which degrades the performance of the CNN on tasks such as image classification, segmentation, and detection.
Disclosure of Invention
The application aims to provide a feature map enhancement method, apparatus, device, and medium that improve the performance of a convolutional neural network on tasks such as image classification, segmentation, and detection.
In a first aspect, an embodiment of the present application provides a feature map enhancement method for a convolutional neural network, including:
performing a convolution operation on an input original image to obtain a corresponding multi-layer feature map;
grouping the feature map of a certain layer according to the channel dimension to obtain a plurality of sub-feature maps;
for each sub-feature map, performing global average pooling and global max pooling in parallel by using an embedded spatial group-wise enhancement (SGE) module to obtain two corresponding channel-dimension vectors;
obtaining an attention enhancement factor of each channel in the corresponding sub-feature map according to the two corresponding channel-dimension vectors;
obtaining a corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map;
and obtaining an enhanced feature map corresponding to the feature map of the certain layer according to all the enhanced sub-feature maps.
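The claimed steps can be sketched end to end in NumPy. This is a minimal illustration, not the claimed implementation: the 1 × 1 convolutions are modeled as random weight matrices standing in for learned parameters, and the reduction ratio r, scale α, and shift β are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def enhance_group(x, r=2, alpha=1.0, beta=0.0, eps=1e-5):
    """Enhance one sub-feature map x of shape (c, H, W) following the claimed steps."""
    c = x.shape[0]
    a = x.mean(axis=(1, 2))                        # global average pooling -> (c,)
    b = x.max(axis=(1, 2))                         # global max pooling     -> (c,)
    W1 = 0.1 * rng.standard_normal((c // r, c))    # 1x1 conv = matrix on channel dim
    W2 = 0.1 * rng.standard_normal((c // r, c))
    g = np.maximum(W1 @ a, 0) + np.maximum(W2 @ b, 0)  # ReLU, then add
    W3 = 0.1 * rng.standard_normal((c, c // r))    # 1x1 conv raising back to c channels
    u = softmax(W3 @ g)                            # normalized attention factor
    y = u[:, None, None] * x                       # first sub-feature map
    mu = y.mean(axis=(1, 2), keepdims=True)        # regularize per channel
    var = y.var(axis=(1, 2), keepdims=True)
    z = (y - mu) / np.sqrt(var + eps)              # second sub-feature map
    return x * sigmoid(alpha * z + beta)           # third (enhanced) sub-feature map

X = rng.standard_normal((12, 5, 5))                # one layer's feature map, C = 12
subs = np.split(X, 3, axis=0)                      # G = 3 sub-feature maps
V = np.concatenate([enhance_group(s) for s in subs], axis=0)
print(V.shape)                                     # (12, 5, 5)
```

The enhanced sub-feature maps are concatenated back along the channel dimension, so the output has the same shape as the input layer's feature map.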
In a possible implementation of the above method, obtaining the attention enhancement factor of each channel in the corresponding sub-feature map according to the two corresponding channel-dimension vectors includes:
reducing the dimension of the two corresponding channel-dimension vectors by using a 1 × 1 convolution;
and activating the two dimension-reduced channel-dimension vectors with a ReLU activation function and adding them, so as to obtain the attention enhancement factor of each channel in the corresponding sub-feature map.
In a possible implementation of the above method, obtaining a corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map includes:
raising the attention enhancement factor to the number of channels of the corresponding sub-feature map by using a 1 × 1 convolution;
normalizing the attention enhancement factor by using a SoftMax function;
multiplying the normalized attention enhancement factor by the corresponding sub-feature map to obtain an enhanced first sub-feature map;
regularizing the first sub-feature map to obtain a second sub-feature map;
activating the second sub-feature map by using a Sigmoid activation function to obtain an enhanced third sub-feature map;
and taking the third sub-feature map as the enhanced sub-feature map.
In a possible implementation of the above method, activating the second sub-feature map by using a Sigmoid activation function to obtain an enhanced third sub-feature map includes:
obtaining an importance coefficient for each channel of the second sub-feature map by using a Sigmoid activation function;
and scaling the second sub-feature map by the importance coefficient to recalibrate the importance of the spatial-domain features on each channel of the second sub-feature map, so as to obtain an enhanced third sub-feature map.
In a second aspect, an embodiment of the present application provides a feature map enhancement apparatus for a convolutional neural network, including:
a convolution module configured to perform a convolution operation on an input original image to obtain a corresponding multi-layer feature map;
a grouping module configured to group the feature map of a certain layer according to the channel dimension to obtain a plurality of sub-feature maps;
an enhancement module configured to, for each sub-feature map, perform global average pooling and global max pooling in parallel by using an embedded spatial group-wise enhancement (SGE) module to obtain two corresponding channel-dimension vectors; obtain an attention enhancement factor of each channel in the corresponding sub-feature map according to the two channel-dimension vectors; and obtain a corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map;
and an output module configured to obtain an enhanced feature map corresponding to the feature map of the certain layer according to all the enhanced sub-feature maps.
In a possible implementation of the above apparatus, the enhancement module is specifically configured to:
reduce the dimension of the two corresponding channel-dimension vectors by using a 1 × 1 convolution;
and activate the two dimension-reduced channel-dimension vectors with a ReLU activation function and add them, so as to obtain the attention enhancement factor of each channel in the corresponding sub-feature map.
In a possible implementation of the above apparatus, the enhancement module is specifically configured to:
raise the attention enhancement factor to the number of channels of the corresponding sub-feature map by using a 1 × 1 convolution;
normalize the attention enhancement factor by using a SoftMax function;
multiply the normalized attention enhancement factor by the corresponding sub-feature map to obtain an enhanced first sub-feature map;
regularize the first sub-feature map to obtain a second sub-feature map;
activate the second sub-feature map by using a Sigmoid activation function to obtain an enhanced third sub-feature map;
and take the third sub-feature map as the enhanced sub-feature map.
In a possible implementation of the above apparatus, the enhancement module is specifically configured to:
obtain an importance coefficient for each channel of the second sub-feature map by using a Sigmoid activation function;
and scale the second sub-feature map by the importance coefficient to recalibrate the importance of the spatial-domain features on each channel of the second sub-feature map, so as to obtain an enhanced third sub-feature map.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;
the memory for storing a computer program;
wherein the processor executes the computer program in the memory to implement the method described in the first aspect and the various embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program is used for implementing the method described in the first aspect and the implementation manners of the first aspect when executed by a processor.
Compared with the prior art, the feature map enhancement method, apparatus, device, and medium for a convolutional neural network provided herein perform a convolution operation on an input original image to obtain a corresponding multi-layer feature map, and group the feature map of a certain layer along the channel dimension into a plurality of sub-feature maps. For each sub-feature map, an embedded spatial group-wise enhancement (SGE) module performs global average pooling and global max pooling in parallel to obtain two corresponding channel-dimension vectors, from which an attention enhancement factor for each channel of the sub-feature map is derived; a corresponding enhanced sub-feature map is then obtained from the attention enhancement factor and the sub-feature map, and the enhanced feature map of the layer is assembled from all the enhanced sub-feature maps. Because the channel-dimension attention enhancement factors are extracted through both global average pooling and global max pooling, the semantic information describing the relative importance of sub-feature-map channels is better expressed. At the same time, the spatial group-wise enhancement module structure is redesigned so that the attention enhancement factors of the sub-feature-map channel dimension are computed more effectively, further improving the performance of the convolutional neural network on tasks such as image classification, segmentation, and detection.
Drawings
Fig. 1 is a schematic flowchart of a feature map enhancement method for a convolutional neural network according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a flow of an algorithm for enhancing a sub-feature map according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a feature map enhancing apparatus of a convolutional neural network according to a second embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
The following detailed description of embodiments of the present application is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present application is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
Fig. 1 is a schematic flowchart of a feature map enhancement method for a convolutional neural network according to an embodiment of the present disclosure. In practical applications, the execution subject of this embodiment may be a feature map enhancement apparatus of a convolutional neural network, where the apparatus may be implemented as a virtual apparatus, such as software code, or as a physical apparatus storing the relevant executable code, such as a USB drive, or as a physical apparatus integrating the relevant executable code, such as a chip or a computer.
As shown in fig. 1, the method includes the following steps S101 to S106:
s101, performing convolution operation on the input original image to obtain a corresponding multilayer characteristic diagram.
In this embodiment, a pre-constructed convolutional neural network performs a multilayer convolution operation on an input original image, so as to obtain a corresponding multilayer feature map. It is understood that one of the layers of the feature map includes a certain number of channels.
And S102, grouping the feature map of a certain layer according to the channel dimension to obtain a plurality of sub-feature maps.
In this embodiment, during convolutional neural network learning, grouped convolution can gradually capture specific semantic responses, so that the response at positions of interest grows larger while other positions remain unactivated or without response; grouped convolution also reduces computation and parameter count. The feature map is therefore grouped first, to better enhance the learning of region-specific semantic feature information. Specifically, the channels of the feature map are divided into groups, yielding a number of sub-feature maps equal to the number of groups.
Assume the output feature map after multi-layer convolution is X ∈ R^(C×H×W), where C is the number of channels and H and W are the height and width of the feature map. The feature map is first divided into G groups along the channel dimension; the vectors at the spatial positions of a resulting sub-feature map are denoted X = {x_1, ..., x_{H×W}}, where each element x_i ∈ R^(C/G).
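As an illustrative sketch (not part of the claimed embodiments), the grouping step can be expressed in NumPy; the sizes C = 12, G = 3, H = W = 5 are hypothetical:

```python
import numpy as np

# Hypothetical sizes: C = 12 channels, H = W = 5, G = 3 groups.
X = np.arange(12 * 5 * 5, dtype=float).reshape(12, 5, 5)
G = 3
subs = np.split(X, G, axis=0)          # G sub-feature maps along the channel dim
print(len(subs), subs[0].shape)        # 3 (4, 5, 5)

# The vector x_i at one spatial position of a sub-feature map has C/G elements:
x_i = subs[0][:, 2, 3]
print(x_i.shape)                       # (4,)
```

Each sub-feature map keeps the full spatial extent H × W and holds C/G of the original channels.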
S103, for each sub-feature map, performing global average pooling and global max pooling in parallel by using the embedded spatial group-wise enhancement (SGE) module to obtain two corresponding channel-dimension vectors.
And S104, obtaining the attention enhancement factor of each channel in the corresponding sub-feature map according to the two corresponding channel-dimension vectors.
In this embodiment, step S104 may be implemented as: reducing the dimension of the two corresponding channel-dimension vectors by using a 1 × 1 convolution; then activating the two dimension-reduced channel-dimension vectors with a ReLU activation function and adding them to obtain the attention enhancement factor of each channel in the corresponding sub-feature map.
In practical applications, some CNN structures improve a model's feature expression capability by introducing an attention mechanism, which not only tells the network model which important features to attend to but also enhances the expression of specific regions. However, cascading attention enhancement modules along the channel and spatial dimensions also increases the computation and parameter count of the network model.
Fig. 2 is a schematic flowchart of an algorithm for enhancing a sub-feature map according to an embodiment of the present disclosure. As shown in fig. 2, the left X column of the diagram represents that the feature maps are grouped into 3 sub-feature maps, and each sub-feature map is enhanced through the algorithm flow below the diagram to obtain the corresponding enhanced feature map in the right V column of the diagram.
The above algorithm flow is described in detail below. Considering computation and model size, the method uses only a channel-dimension attention mechanism: global average pooling generates a response for every spatial position of the sub-feature map, and global max pooling is combined in so that, during backpropagation, gradient feedback flows only to the position with the maximum feature response. The spatial group-wise enhancement module structure is redesigned at the same time: the sub-feature maps are processed in parallel using global statistical feature information, and channel-dimension attention enhancement factors are extracted with global average pooling and global max pooling respectively, to express the semantic information of the relative importance among sub-feature-map channels.
The global statistical feature information is extracted as follows:

a = (1/(H×W)) Σ_{i=1}^{H×W} x_i    (1)
b = max_i(x_i), i = 1, ..., H×W    (2)

where a and b denote the channel-dimension vectors produced by the parallel global average pooling and global max pooling respectively, and max(·) takes the maximum response over all spatial positions of the channel-dimension vector, giving the maximum activation of each channel. This compresses each grouped sub-feature map from C/G × H × W to C/G × 1 × 1; each value in a channel-dimension vector then uses global statistical feature information to represent the importance of the corresponding channel of the grouped sub-feature map.
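The two pooling branches of equations (1) and (2) reduce to axis-wise reductions in NumPy; the sub-feature map size (C/G = 4, H = W = 5) is a hypothetical example:

```python
import numpy as np

x = np.random.default_rng(1).standard_normal((4, 5, 5))  # one sub-feature map, C/G = 4

a = x.mean(axis=(1, 2))   # global average pooling: (C/G, H, W) -> (C/G,)
b = x.max(axis=(1, 2))    # global max pooling:     (C/G, H, W) -> (C/G,)
print(a.shape, b.shape)   # (4,) (4,)
assert np.all(b >= a)     # per channel, the max response dominates the average
```

Each sub-feature map thus yields exactly two channel-dimension vectors of length C/G.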
It can be understood that in this embodiment the global max pooling information is effectively incorporated into the attention-enhancement-factor calculation, strengthening the semantic feature expression capability of the spatial group-wise enhancement module.
To model the interdependency among sub-feature-map channels, the channel-dimension vectors are dimension-reduced with a 1 × 1 convolution followed by a ReLU activation function, which increases the nonlinear interaction of information between channels while reducing computation:

e = ReLU(W1 a)    (3)
f = ReLU(W2 b)    (4)

where W1 and W2 are the weight matrices of the 1 × 1 dimension-reduction convolutions, W1, W2 ∈ R^((C/(G·r)) × (C/G)) for a reduction ratio r, so the channel dimension satisfies the reduction from C/G to C/(G·r).

Adding the two channel-dimension vectors then yields the attention enhancement factor corresponding to each channel of the sub-feature map:

g = e + f    (5)
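Equations (3) to (5) can be sketched as follows; on a 1 × 1 spatial map, a 1 × 1 convolution is just a matrix multiply on the channel dimension. The weights and the reduction ratio r are hypothetical stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
c, r = 4, 2                       # channels per group and reduction ratio (hypothetical)
a = rng.standard_normal(c)        # pooled average vector from eq. (1)
b = rng.standard_normal(c)        # pooled max vector from eq. (2)

W1 = rng.standard_normal((c // r, c))   # 1x1 dimension-reduction convolutions
W2 = rng.standard_normal((c // r, c))

e = np.maximum(W1 @ a, 0.0)       # e = ReLU(W1 a), eq. (3)
f = np.maximum(W2 @ b, 0.0)       # f = ReLU(W2 b), eq. (4)
g = e + f                         # attention enhancement factor, eq. (5)
print(g.shape)                    # (2,)
```

Because both branches pass through ReLU before the addition, the reduced-dimension factor g is elementwise non-negative.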
and S105, obtaining a corresponding enhancer characteristic map according to the attention enhancement factor and the corresponding sub characteristic map.
In this embodiment, the step S105 may be implemented as: using 1 × 1 convolution to raise the attention enhancement factor to the number of channels corresponding to the sub-feature map; normalizing the attention enhancement factor using a SoftMax function; multiplying the normalized attention enhancement factor by the corresponding sub-feature map to obtain an enhanced first sub-feature map; regularizing the first sub-feature graph to obtain a second sub-feature graph; activating the second sub-feature graph by using a Sigmoid activation function to obtain an enhanced third sub-feature graph; and taking the third sub-feature map as an enhancer feature map.
To add more nonlinearity and better fit the complex relationships between channels, a 1 × 1 convolution is first used to raise the dimension of the attention enhancement factor back to the number of channels of the sub-feature map, so that it matches the sub-feature map during the subsequent weighting. The raised and normalized attention enhancement factor is:

u = SoftMax(W3 g)    (6)

where W3 is the weight matrix of the 1 × 1 dimension-raising convolution.
For each spatial position of the sub-feature map, the vector x_i is weighted by the normalized attention enhancement factor u described above, yielding the sub-feature map enhanced by the channel attention mechanism:

y_i = u · x_i    (7)
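Equations (6) and (7) amount to a dimension-raising matrix multiply, a SoftMax, and a per-channel broadcast multiply. A sketch under the same hypothetical sizes as above, with a random W3 standing in for the learned weights:

```python
import numpy as np

rng = np.random.default_rng(3)
c, r = 4, 2
g = rng.standard_normal(c // r)           # reduced-dim attention factor
x = rng.standard_normal((c, 5, 5))        # sub-feature map

W3 = rng.standard_normal((c, c // r))     # 1x1 conv raising the dim back to c
logits = W3 @ g
u = np.exp(logits - logits.max())
u /= u.sum()                              # SoftMax-normalized factor, sums to 1

y = u[:, None, None] * x                  # eq. (7): y_i = u . x_i at every position
print(round(float(u.sum()), 6), y.shape)
```

The SoftMax turns the channel importances into a probability distribution, so the weighting preserves the relative ordering of channel responses.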
Further, activating the second sub-feature map with a Sigmoid activation function to obtain an enhanced third sub-feature map may be realized as follows: obtaining an importance coefficient for each channel of the second sub-feature map with a Sigmoid activation function; and scaling the second sub-feature map by the importance coefficient to recalibrate the importance of the spatial-domain features on each channel, so as to obtain an enhanced third sub-feature map.
In this embodiment, to eliminate the interference of amplitude differences between samples, the sub-feature map enhanced by the channel attention mechanism is regularized. For each spatial position of the sub-feature map, the regularization is:

μ_c = (1/(H×W)) Σ_{i=1}^{H×W} y_{c,i}    (8)
σ_c² = (1/(H×W)) Σ_{i=1}^{H×W} (y_{c,i} − μ_c)²    (9)
z_{c,i} = (y_{c,i} − μ_c) / sqrt(σ_c² + ε)    (10)

where z_i denotes the sub-feature map regularized along the channel dimension, μ_c the per-channel mean of the sub-feature map, σ_c² its per-channel variance, and ε a small constant for numerical stability.
Then a Sigmoid activation function is used to obtain an importance coefficient along the channel dimension of the regularized sub-feature map, and the sub-feature map is scaled by this coefficient to recalibrate the spatial-domain feature importance of each channel. This process is expressed as:

v_i = x_i · Sigmoid(α z_i + β)    (11)

where α and β are the parameters of the scaling and translation operations on the regularized sub-feature map, each taking a single shared value within each group.
It can be understood that in this embodiment the spatial group-wise enhancement module structure is redesigned: a 1 × 1 convolution reduces the dimension before fusion and raises it back afterwards, which not only reduces computation but also strengthens the information interaction along the sub-feature-map channel dimension, and the SoftMax activation function converts the channel-dimension spatial feature importance into probabilities that represent the attention enhancement factor.
And S106, obtaining an enhanced feature map corresponding to the feature map of the certain layer according to all the enhanced sub-feature maps.
According to the feature map enhancement method for a convolutional neural network described above, channel-dimension attention enhancement factors are extracted through global average pooling and global max pooling, so the semantic information of the relative importance among sub-feature-map channels is better expressed. At the same time, the spatial group-wise enhancement module structure is redesigned so that the attention enhancement factors of the sub-feature-map channel dimension are computed more effectively, further improving the performance of the convolutional neural network on tasks such as image classification, segmentation, and detection.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 3 is a schematic structural diagram of a feature map enhancing apparatus of a convolutional neural network according to a second embodiment of the present application, and as shown in fig. 3, the apparatus may include:
a convolution module 310, configured to perform a convolution operation on an input original image to obtain a corresponding multi-layer feature map;
a grouping module 320, configured to group the feature map of a certain layer according to the channel dimension to obtain a plurality of sub-feature maps;
an enhancement module 330, configured to, for each sub-feature map, perform global average pooling and global max pooling in parallel by using the embedded spatial group-wise enhancement (SGE) module to obtain two corresponding channel-dimension vectors; obtain an attention enhancement factor of each channel in the corresponding sub-feature map according to the two channel-dimension vectors; and obtain a corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map;
and an output module 340, configured to obtain an enhanced feature map corresponding to the feature map of the certain layer according to all the enhanced sub-feature maps.
According to the feature map enhancement apparatus for a convolutional neural network described above, channel-dimension attention enhancement factors are extracted through global average pooling and global max pooling, so the semantic information of the relative importance among sub-feature-map channels is better expressed. At the same time, the spatial group-wise enhancement module structure is redesigned so that the attention enhancement factors of the sub-feature-map channel dimension are computed more effectively, further improving the performance of the convolutional neural network on tasks such as image classification, segmentation, and detection.
In some embodiments, the enhancement module 330 is specifically configured to:
reduce the dimension of the two corresponding channel-dimension vectors using a 1×1 convolution;
and activate the two dimension-reduced channel-dimension vectors with a ReLU activation function and add them to obtain the attention enhancement factor for each channel of the corresponding sub-feature map.
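With the 1×1 convolution reduced to a plain matrix acting on channel vectors, the two steps above can be illustrated as follows; the names and shapes are illustrative, and whether the two pooling branches share weights is not stated in the source, so shared weights are an assumption of this sketch:

```python
import numpy as np

def attention_factor(avg_vec, max_vec, W_reduce):
    """Reduce both pooled channel vectors with shared 1x1-convolution weights
    (a plain matrix here), ReLU-activate each, then add them."""
    reduced_avg = np.maximum(W_reduce @ avg_vec, 0.0)  # ReLU after dimension reduction
    reduced_max = np.maximum(W_reduce @ max_vec, 0.0)
    return reduced_avg + reduced_max                   # attention enhancement factor
```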
In some embodiments, the enhancement module 330 is specifically configured to:
raise the attention enhancement factor back to the number of channels of the corresponding sub-feature map using a 1×1 convolution;
normalize the attention enhancement factor using a SoftMax function;
multiply the normalized attention enhancement factor by the corresponding sub-feature map to obtain an enhanced first sub-feature map;
regularize the first sub-feature map to obtain a second sub-feature map;
activate the second sub-feature map with a Sigmoid activation function to obtain an enhanced third sub-feature map;
and take the third sub-feature map as the enhanced sub-feature map.
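These six steps can be sketched minimally in NumPy. The weight name `W_up` is hypothetical, and zero-mean/unit-variance standardization is assumed for the unspecified regularization step:

```python
import numpy as np

def enhanced_sub_map(sub, factor, W_up):
    """sub: (C, H, W) sub-feature map; factor: reduced attention factor; W_up: (C, k)."""
    f = W_up @ factor                                   # raise factor back to C channels
    f = np.exp(f - f.max())
    f /= f.sum()                                        # SoftMax normalization
    first = sub * f[:, None, None]                      # enhanced first sub-feature map
    second = (first - first.mean()) / (first.std() + 1e-5)  # regularized second map
    coeff = 1.0 / (1.0 + np.exp(-second))               # Sigmoid activation
    third = second * coeff                              # enhanced third sub-feature map
    return third
```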
In some embodiments, the enhancement module 330 is specifically configured to:
obtain an importance coefficient for each channel of the second sub-feature map using a Sigmoid activation function;
and scale the second sub-feature map by the importance coefficient, recalibrating the importance of the spatial-domain features on each channel of the second sub-feature map to obtain the enhanced third sub-feature map.
Fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present application. As shown in Fig. 4, the electronic device includes: a memory 401 and a processor 402;
a memory 401 for storing a computer program;
wherein the processor 402 executes the computer program in the memory 401 to implement the methods provided by the method embodiments described above.
In an embodiment, the feature map enhancement apparatus of a convolutional neural network provided in the present application is embodied as an electronic device. The processor may be a central processing unit (CPU) or another form of processing unit having data-processing and/or instruction-execution capabilities, and may control other components of the electronic device to perform desired functions.
The memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor to implement the methods of the embodiments of the present application described above and/or other desired functions. Various contents, such as an input signal, a signal component, and a noise component, may also be stored in the computer-readable storage medium.
An embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program is used for implementing the methods provided by the method embodiments described above when being executed by a processor.
In practice, the computer program in this embodiment may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language, to carry out the operations of the embodiments of the present application. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
In practice, the computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing descriptions of specific exemplary embodiments of the present application have been presented for purposes of illustration and description. It is not intended to limit the application to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the present application and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the present application and various alternatives and modifications thereof. It is intended that the scope of the application be defined by the claims and their equivalents.
Claims (6)
1. A feature map enhancement method for a convolutional neural network, comprising:
performing a convolution operation on an input original image to obtain corresponding multi-layer feature maps;
grouping a feature map of a certain layer according to the channel dimension to obtain a plurality of sub-feature maps;
for each sub-feature map, performing global average pooling and global max pooling in parallel using an embedded spatial group-wise enhancement (SGE) module to obtain two corresponding channel-dimension vectors;
obtaining an attention enhancement factor for each channel of the corresponding sub-feature map according to the two corresponding channel-dimension vectors;
obtaining a corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map;
obtaining an enhanced feature map corresponding to the feature map of the certain layer according to all the enhanced sub-feature maps;
wherein obtaining the attention enhancement factor for each channel of the corresponding sub-feature map according to the two corresponding channel-dimension vectors comprises:
reducing the dimension of the two corresponding channel-dimension vectors using a 1×1 convolution;
activating the two dimension-reduced channel-dimension vectors with a ReLU activation function and adding them to obtain the attention enhancement factor for each channel of the corresponding sub-feature map;
wherein obtaining the corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map comprises:
raising the attention enhancement factor back to the number of channels of the corresponding sub-feature map using a 1×1 convolution;
normalizing the attention enhancement factor using a SoftMax function;
multiplying the normalized attention enhancement factor by the corresponding sub-feature map to obtain an enhanced first sub-feature map;
regularizing the first sub-feature map to obtain a second sub-feature map;
activating the second sub-feature map with a Sigmoid activation function to obtain an enhanced third sub-feature map;
and taking the third sub-feature map as the enhanced sub-feature map.
2. The method according to claim 1, wherein activating the second sub-feature map with a Sigmoid activation function to obtain an enhanced third sub-feature map comprises:
obtaining an importance coefficient for each channel of the second sub-feature map using a Sigmoid activation function;
and scaling the second sub-feature map by the importance coefficient, recalibrating the importance of the spatial-domain features on each channel of the second sub-feature map to obtain the enhanced third sub-feature map.
3. A feature map enhancement apparatus for a convolutional neural network, comprising:
a convolution module, configured to perform a convolution operation on an input original image to obtain corresponding multi-layer feature maps;
a grouping module, configured to group a feature map of a certain layer according to the channel dimension to obtain a plurality of sub-feature maps;
an enhancement module, configured to, for each sub-feature map, perform global average pooling and global max pooling in parallel using an embedded spatial group-wise enhancement (SGE) module to obtain two corresponding channel-dimension vectors; obtain an attention enhancement factor for each channel of the corresponding sub-feature map according to the two corresponding channel-dimension vectors; and obtain a corresponding enhanced sub-feature map according to the attention enhancement factor and the corresponding sub-feature map;
an output module, configured to obtain an enhanced feature map corresponding to the feature map of the certain layer according to all the enhanced sub-feature maps;
wherein the enhancement module is specifically configured to:
reduce the dimension of the two corresponding channel-dimension vectors using a 1×1 convolution;
activate the two dimension-reduced channel-dimension vectors with a ReLU activation function and add them to obtain the attention enhancement factor for each channel of the corresponding sub-feature map;
wherein the enhancement module is further specifically configured to:
raise the attention enhancement factor back to the number of channels of the corresponding sub-feature map using a 1×1 convolution;
normalize the attention enhancement factor using a SoftMax function;
multiply the normalized attention enhancement factor by the corresponding sub-feature map to obtain an enhanced first sub-feature map;
regularize the first sub-feature map to obtain a second sub-feature map;
activate the second sub-feature map with a Sigmoid activation function to obtain an enhanced third sub-feature map;
and take the third sub-feature map as the enhanced sub-feature map.
4. The apparatus according to claim 3, wherein the enhancement module is specifically configured to:
obtain an importance coefficient for each channel of the second sub-feature map using a Sigmoid activation function;
and scale the second sub-feature map by the importance coefficient, recalibrating the importance of the spatial-domain features on each channel of the second sub-feature map to obtain the enhanced third sub-feature map.
5. An electronic device, comprising: a memory and a processor;
the memory for storing a computer program;
wherein the processor executes the computer program in the memory to implement the method of any one of claims 1-2.
6. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910605387.6A CN110490813B (en) | 2019-07-05 | 2019-07-05 | Feature map enhancement method, device, equipment and medium for convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110490813A CN110490813A (en) | 2019-11-22 |
CN110490813B true CN110490813B (en) | 2021-12-17 |
Family
ID=68546677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910605387.6A Active CN110490813B (en) | 2019-07-05 | 2019-07-05 | Feature map enhancement method, device, equipment and medium for convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110490813B (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027670B (en) * | 2019-11-04 | 2022-07-22 | 重庆特斯联智慧科技股份有限公司 | Feature map processing method and device, electronic equipment and storage medium |
CN111161195B (en) * | 2020-01-02 | 2023-10-13 | 重庆特斯联智慧科技股份有限公司 | Feature map processing method and device, storage medium and terminal |
CN113222827A (en) * | 2020-01-21 | 2021-08-06 | 北京三星通信技术研究有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN111274999B (en) * | 2020-02-17 | 2024-04-19 | 北京迈格威科技有限公司 | Data processing method, image processing device and electronic equipment |
CN113361529B (en) * | 2020-03-03 | 2024-05-10 | 北京四维图新科技股份有限公司 | Image semantic segmentation method and device, electronic equipment and storage medium |
CN111325751B (en) * | 2020-03-18 | 2022-05-27 | 重庆理工大学 | CT image segmentation system based on attention convolution neural network |
CN111444957B (en) * | 2020-03-25 | 2023-11-07 | 腾讯科技(深圳)有限公司 | Image data processing method, device, computer equipment and storage medium |
CN111539325A (en) * | 2020-04-23 | 2020-08-14 | 四川旅游学院 | Forest fire detection method based on deep learning |
CN111967478B (en) * | 2020-07-08 | 2023-09-05 | 特斯联科技集团有限公司 | Feature map reconstruction method, system, storage medium and terminal based on weight overturn |
CN112001248B (en) * | 2020-07-20 | 2024-03-01 | 北京百度网讯科技有限公司 | Active interaction method, device, electronic equipment and readable storage medium |
CN112149694B (en) * | 2020-08-28 | 2024-04-05 | 特斯联科技集团有限公司 | Image processing method, system, storage medium and terminal based on convolutional neural network pooling module |
CN112183645B (en) * | 2020-09-30 | 2022-09-09 | 深圳龙岗智能视听研究院 | Image aesthetic quality evaluation method based on context-aware attention mechanism |
CN112465828B (en) * | 2020-12-15 | 2024-05-31 | 益升益恒(北京)医学技术股份公司 | Image semantic segmentation method and device, electronic equipment and storage medium |
CN112668656B (en) * | 2020-12-30 | 2023-10-13 | 深圳市优必选科技股份有限公司 | Image classification method, device, computer equipment and storage medium |
CN112862667A (en) * | 2021-01-29 | 2021-05-28 | 成都商汤科技有限公司 | Pooling method, chip, equipment and storage medium |
CN112767406B (en) * | 2021-02-02 | 2023-12-12 | 苏州大学 | Deep convolution neural network training method for corneal ulcer segmentation and segmentation method |
CN113011465B (en) * | 2021-02-25 | 2021-09-03 | 浙江净禾智慧科技有限公司 | Household garbage throwing intelligent supervision method based on grouping multi-stage fusion |
CN113052173B (en) * | 2021-03-25 | 2024-07-19 | 岳阳市金霖昇行科技有限公司 | Sample data characteristic enhancement method and device |
CN113486898B (en) * | 2021-07-08 | 2024-05-31 | 西安电子科技大学 | Radar signal RD image interference identification method and system based on improvement ShuffleNet |
CN113920099B (en) * | 2021-10-15 | 2022-08-30 | 深圳大学 | Polyp segmentation method based on non-local information extraction and related components |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9953437B1 (en) * | 2017-10-18 | 2018-04-24 | StradVision, Inc. | Method and device for constructing a table including information on a pooling type and testing method and testing device using the same |
CN109299268A (en) * | 2018-10-24 | 2019-02-01 | 河南理工大学 | A kind of text emotion analysis method based on dual channel model |
CN109376804B (en) * | 2018-12-19 | 2020-10-30 | 中国地质大学(武汉) | Hyperspectral remote sensing image classification method based on attention mechanism and convolutional neural network |
- 2019-07-05: application CN201910605387.6A filed; granted as CN110490813B (status: Active)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||