CN110781923A - Feature extraction method and device


Info

Publication number: CN110781923A
Application number: CN201910927813.8A
Authority: CN (China)
Prior art keywords: feature, processing, subsets, groups, network
Legal status: Granted (Active)
Other languages: Chinese (zh)
Other versions: CN110781923B
Inventors: 贾琳 (Jia Lin), 赵磊 (Zhao Lei)
Current Assignee: Chongqing Terminus Technology Co Ltd
Original Assignee: Chongqing Terminus Technology Co Ltd
Filing date: 2019-09-27
Publication of CN110781923A: 2020-02-11
Publication of CN110781923B (grant): 2023-02-07

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Abstract

The invention discloses a feature extraction method comprising the following steps: an original feature map is input into a trained feature extraction model; a grouping network in the model groups the original feature map by channel to obtain G groups of feature subsets and outputs them to a multi-scale enhancement network in the model; the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups to obtain G groups of processed feature subsets and outputs them to a post-processing network in the model; and the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map. The multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation. Pooling lowers the resolution of the features, which reduces the computation and parameter counts; upsampling after the convolution restores the resolution, and accumulation with the pre-pooling features restores feature detail, so the computation and parameter counts are reduced while the effectiveness of the features is preserved.

Description

Feature extraction method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to a feature extraction method and device.
Background
In the field of computer vision, feature extraction is a necessary step in realizing network models of all kinds.
In the prior art, feature information is usually extracted with the deep residual network Res2Net, which strengthens the extraction of multi-scale features and protects the convolutional neural network from vanishing gradients. In Res2Net, however, after the convolved input features are grouped, each group of features must still be processed by its own convolution group, so the computation and parameter counts are large.
Disclosure of Invention
The present invention provides a feature extraction method and device for overcoming the above-mentioned deficiencies in the prior art, and the object is achieved by the following technical solutions.
A first aspect of the present invention provides a feature extraction method, including:
inputting an original feature map into a trained feature extraction model, wherein a grouping network in the feature extraction model groups the original feature map by channel to obtain G groups of feature subsets and outputs them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups of feature subsets to obtain G groups of processed feature subsets and outputs them to a post-processing network in the feature extraction model, and the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain an output feature map; and
acquiring the output feature map produced by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation.
A second aspect of the present invention provides a feature extraction apparatus, comprising:
a feature extraction module, configured to input an original feature map into a trained feature extraction model, wherein a grouping network in the feature extraction model groups the original feature map by channel to obtain G groups of feature subsets and outputs them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups of feature subsets to obtain G groups of processed feature subsets and outputs them to a post-processing network in the feature extraction model, and the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain an output feature map; and
an acquisition module, configured to acquire the output feature map produced by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation.
In the embodiments of the application, after the original feature map is input into the feature extraction model, the grouping network divides it into G groups of feature subsets, the multi-scale enhancement network performs multi-scale enhancement processing on each group, the post-processing network concatenates the G groups of processed feature subsets, and the concatenated feature map is added to the original feature map to obtain the output feature map. The multi-scale enhancement processing applied to each group of feature subsets comprises pooling, convolution, upsampling and accumulation.
As this description shows, the multi-scale enhancement network replaces the 3 × 3 convolution groups used in the existing Res2Net network. Before convolving each group of feature subsets, it applies pooling to lower their resolution, which reduces the computation and parameter counts; after the convolution it applies upsampling to restore the subsets to their pre-pooling resolution, and then accumulates them with the pre-pooling feature subsets to recover the detail lost to pooling. The computation and parameter counts are thus reduced while the effectiveness of the extracted feature information is preserved.
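As a rough worked illustration of the saving (our own arithmetic under assumed sizes, not figures from the patent): a 3 × 3 convolution with C_in input channels and C_out output channels over an H × W feature map costs about 9 · C_in · C_out · H · W multiply-accumulate operations. If a 2 × 2 pooling halves each spatial dimension first, the same convolution runs on an (H/2) × (W/2) map and costs 9 · C_in · C_out · (H/2) · (W/2), i.e. one quarter of the original, while the upsampling and pixel-wise accumulation that restore resolution and detail are comparatively cheap.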
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
Fig. 1 is a schematic diagram of a Res2Net network structure according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic diagram of a feature extraction model according to an exemplary embodiment of the present invention;
Fig. 3A is a flowchart of an embodiment of a feature extraction method according to an exemplary embodiment of the present invention;
Fig. 3B is a schematic diagram of the grouping network structure in the embodiment shown in Fig. 3A;
Fig. 3C is a schematic diagram of the post-processing network structure in the embodiment shown in Fig. 3A;
Fig. 4 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present application;
Fig. 5 is a block diagram of an embodiment of a feature extraction apparatus according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as recited in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly second information may be referred to as first information, without departing from the scope of the present invention. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when" or "in response to a determination".
With the development of deep learning, convolutional neural networks (CNNs) are applied ever more widely in the field of computer vision. In particular, the proposal of the deep residual network ResNet freed CNN design from the vanishing-gradient problem, so that deep CNNs can be trained and effective convolutional feature information can be extracted to the maximum extent. Consequently, many backbone networks in computer vision use ResNet to extract image features for subsequent classification, detection, segmentation and other tasks.
To further improve the effectiveness of the extracted feature information, Res2Net was proposed on the basis of ResNet. In the exemplary Res2Net structure shown in Fig. 1, the input feature map is passed through a 1 × 1 convolution kernel and grouped, yielding four groups of feature subsets X1, X2, X3 and X4. The first group X1 is convolved by a 3 × 3 convolution group to obtain the feature subset Y1; from the second group onward, each group of feature subsets is concatenated with the previous group's output and then fed into a 3 × 3 convolution group. Although Res2Net improves the effectiveness of the extracted convolutional features by exploiting multi-scale information, every group of feature subsets must be processed by a 3 × 3 convolution group, so the computational load and parameter count are large (a sketch of this pattern follows).
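For concreteness, the grouped-convolution pattern just described can be sketched in PyTorch as follows. This is a minimal illustration of the Res2Net-style processing as this document describes it (concatenation with the previous group's output before each 3 × 3 convolution); the class and variable names are ours, not from the patent or the Res2Net paper.

```python
import torch
import torch.nn as nn

class Res2NetStyleGrouping(nn.Module):
    """Grouped 3x3 convolutions as described above:
    Y1 = conv3x3(X1); Yi = conv3x3(concat(Xi, Y(i-1))) for i > 1."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        w = channels // groups  # channel width of one group
        # The first group sees w channels; later groups see 2*w channels
        # because Xi is concatenated with the previous output Y(i-1).
        self.convs = nn.ModuleList(
            [nn.Conv2d(w, w, kernel_size=3, padding=1)]
            + [nn.Conv2d(2 * w, w, kernel_size=3, padding=1)
               for _ in range(groups - 1)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xs = torch.chunk(x, self.groups, dim=1)  # split by channel
        ys = [self.convs[0](xs[0])]
        for i in range(1, self.groups):
            ys.append(self.convs[i](torch.cat([xs[i], ys[-1]], dim=1)))
        return torch.cat(ys, dim=1)
```

Every group passes through its own 3 × 3 convolution at full resolution, which is exactly the cost the multi-scale enhancement network below aims to cut.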
To solve the above technical problem, the present invention provides a feature extraction model. As shown in Fig. 2, the feature extraction model comprises a grouping network, a multi-scale enhancement network and a post-processing network: after the grouping network divides the original feature map into G groups of feature subsets, the multi-scale enhancement network performs multi-scale enhancement processing on each group, the post-processing network concatenates the G groups of processed feature subsets, and the concatenated feature map is added to the original feature map to obtain the output feature map.
The multi-scale enhancement processing applied to each group of feature subsets comprises pooling, convolution, upsampling and accumulation.
As this design shows, the multi-scale enhancement network replaces the 3 × 3 convolution groups used in the existing Res2Net network. Before convolving each group of feature subsets, it applies pooling to lower their resolution, which reduces the computation and parameter counts; after the convolution it applies upsampling to restore the subsets to their pre-pooling resolution, and then accumulates them with the pre-pooling feature subsets to recover the detail lost to pooling. The computation and parameter counts are thus reduced while the effectiveness of the extracted feature information is preserved.
The feature extraction method implemented by the feature extraction model described above is explained in detail below with specific embodiments.
Fig. 3A is a flowchart of an embodiment of a feature extraction method according to an exemplary embodiment of the present invention. The method performs feature extraction using the feature extraction model shown in Fig. 2 and can be applied to an electronic device (e.g., a PC or a mobile phone terminal). As shown in Fig. 3A, the method includes the following steps:
step 301: inputting the original characteristic diagram into a trained characteristic extraction model, grouping the original characteristic diagram according to channels by the characteristic extraction model through a packet network to obtain G groups of characteristic subsets, outputting the G groups of characteristic subsets to a multi-scale enhancement network in the characteristic extraction model, respectively carrying out multi-scale enhancement processing on the G groups of characteristic subsets by the multi-scale enhancement network to obtain G groups of processed characteristic subsets, outputting the G groups of processed characteristic subsets to a post-processing network in the characteristic extraction model, splicing the G groups of processed characteristic subsets according to the channels by the post-processing network, and adding the spliced characteristic diagram and the original characteristic diagram to obtain an output characteristic diagram.
In an embodiment, the grouping network, whose structure is shown in Fig. 3B, processes the original feature map as follows: a first convolution layer in the grouping network performs dimension reduction on the original feature map and outputs the dimension-reduced feature map to a grouping layer in the grouping network, and the grouping layer groups the dimension-reduced feature map by channel to obtain the G groups of feature subsets.
Because the original feature map is composed of multiple channel feature maps, after grouping every group of feature subsets has the same size, but the number of channels in each group is 1/G of the number of channels of the dimension-reduced feature map.
Illustratively, the first convolution layer performing the dimension reduction may use a 1 × 1 convolution kernel to reduce the number of channels of the input feature map.
In the present invention, the grouping policy of the grouping layer may be set according to practical experience; for example, the feature map of each channel of the dimension-reduced feature map may be taken as one group of feature subsets. A sketch of this grouping step is given below.
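Under those choices, the grouping network might look like the following PyTorch sketch: a 1 × 1 convolution for dimension reduction followed by a channel-wise split. The module name, the use of torch.chunk and the parameterization are our assumptions, not prescribed by the patent.

```python
import torch
import torch.nn as nn

class GroupingNetwork(nn.Module):
    """First convolution layer (1x1, dimension reduction) followed by a
    grouping layer that splits the reduced map into G groups by channel."""

    def __init__(self, in_channels: int, reduced_channels: int, num_groups: int):
        super().__init__()
        assert reduced_channels % num_groups == 0
        self.num_groups = num_groups
        self.reduce = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        reduced = self.reduce(x)  # fewer channels, same spatial size
        # Each group keeps the full spatial size but only 1/G of the channels.
        return list(torch.chunk(reduced, self.num_groups, dim=1))
```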
In an embodiment, the multi-scale enhancement network proceeds as follows: the first group of feature subsets undergoes multi-scale enhancement processing to obtain the first group of processed feature subsets; then, from the second group onward, each group of feature subsets is concatenated by channel with the previous group of processed feature subsets, and the concatenated feature subsets undergo multi-scale enhancement processing to obtain that group of processed feature subsets.
The multi-scale enhancement processing applied to each group of feature subsets comprises pooling, convolution, upsampling and accumulation.
Illustratively, the pooling may be implemented with a max-pooling layer, the convolution with a 3 × 3 convolution group, and the upsampling with an upsampling layer. The accumulation adds pixels along the channel dimension: for each channel, the concatenated channel feature subset is added pixel by pixel to the corresponding upsampled channel feature subset, aggregating information and further enhancing the effectiveness of the extracted feature information.
Denoting the multi-scale enhancement processing by K(·), the first group of processed feature subsets is Y1 = K(X1), and the i-th group is Yi = K(Xi + Y(i-1)) for 1 < i ≤ G, where "+" in the formula denotes channel-wise concatenation.
Based on the above description, in the multi-scale enhancement network each group of feature subsets corresponds to one multi-scale enhancement processing module, and, from the second group onward, each group additionally corresponds to a concatenation layer. A sketch of this network follows.
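One way to realize the multi-scale enhancement module K(·) and the chained processing across groups is sketched below in PyTorch. The 1 × 1 projection used to match channel counts before the accumulation is our assumption (the patent only states that the pre-pooling and upsampled feature subsets are added pixel by pixel); all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEnhance(nn.Module):
    """One enhancement module K: pool -> 3x3 conv -> upsample -> accumulate.
    Pooling halves the resolution before the convolution, cutting computation;
    upsampling restores the pre-pooling resolution, and the pixel-wise
    addition with the pre-pooling input recovers detail lost to pooling."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        # Assumed 1x1 projection so the pre-pooling features match the
        # convolution output channels for the accumulation step.
        self.proj = (nn.Conv2d(in_channels, out_channels, kernel_size=1)
                     if in_channels != out_channels else nn.Identity())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(self.pool(x))                              # pool, then convolve
        y = F.interpolate(y, size=x.shape[-2:], mode="nearest")  # restore resolution
        return y + self.proj(x)                                  # accumulate

class MultiScaleEnhanceNetwork(nn.Module):
    """Chained processing: Y1 = K(X1); Yi = K(concat(Xi, Y(i-1))) for i > 1."""

    def __init__(self, group_channels: int, num_groups: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [MultiScaleEnhance(group_channels, group_channels)]
            + [MultiScaleEnhance(2 * group_channels, group_channels)
               for _ in range(num_groups - 1)]
        )

    def forward(self, xs: list[torch.Tensor]) -> list[torch.Tensor]:
        ys = [self.blocks[0](xs[0])]
        for i in range(1, len(xs)):
            ys.append(self.blocks[i](torch.cat([xs[i], ys[-1]], dim=1)))
        return ys
```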
In an embodiment, the post-processing network, whose structure is shown in Fig. 3C, proceeds as follows: a concatenation layer in the post-processing network concatenates the G groups of processed feature subsets by channel to obtain a concatenated feature map and outputs it to a second convolution layer in the post-processing network; the second convolution layer performs dimension-raising processing on the concatenated feature map and outputs the dimension-raised result to an SE (Squeeze-and-Excitation) layer in the post-processing network; the SE layer performs enhancement processing on the dimension-raised feature map to obtain an enhanced feature map and outputs it to an accumulation layer of the post-processing network; and the accumulation layer adds the original feature map and the enhanced feature map to obtain the output feature map.
The dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
For example, the second convolution layer may likewise use a 1 × 1 convolution kernel, restoring the number of channels of the concatenated feature map to that of the input feature map, so that the dimension-raised concatenated feature map has the same number of channels as the original feature map. The accumulation layer also adds pixels along the channel dimension: for each channel, the channel feature map is added to the corresponding pixels of the channel enhanced feature map. A sketch of the post-processing network, including a standard SE block, follows.
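The post-processing stage can then be sketched as follows. The SE layer here is the standard Squeeze-and-Excitation design (global average pooling followed by a two-layer gating MLP), which the patent names but does not detail, so its internals are an assumption; the module names are ours.

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Standard Squeeze-and-Excitation block: squeeze with global average
    pooling, excite with a bottleneck MLP, then rescale the channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))  # squeeze to (B, C), then gate
        return x * weights.view(b, c, 1, 1)    # per-channel rescaling

class PostProcessingNetwork(nn.Module):
    """Concatenate the G processed groups, raise the channel count back to
    that of the original feature map with a 1x1 convolution, enhance with
    the SE layer, then add the original feature map pixel by pixel."""

    def __init__(self, reduced_channels: int, out_channels: int):
        super().__init__()
        self.raise_dim = nn.Conv2d(reduced_channels, out_channels, kernel_size=1)
        self.se = SELayer(out_channels)

    def forward(self, ys: list[torch.Tensor], original: torch.Tensor) -> torch.Tensor:
        cat = torch.cat(ys, dim=1)               # concatenation layer
        enhanced = self.se(self.raise_dim(cat))  # dimension raising + SE
        return original + enhanced               # accumulation layer
```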
Step 302: acquire the output feature map produced by the feature extraction model.
For example, the obtained output feature map can be applied to classification, detection, segmentation and other tasks.
In the embodiments of the application, after the original feature map is input into the feature extraction model, the grouping network divides it into G groups of feature subsets, the multi-scale enhancement network performs multi-scale enhancement processing on each group, the post-processing network concatenates the G groups of processed feature subsets, and the concatenated feature map is added to the original feature map to obtain the output feature map. The multi-scale enhancement processing applied to each group of feature subsets comprises pooling, convolution, upsampling and accumulation.
As described above, the multi-scale enhancement network replaces the 3 × 3 convolution groups used in the existing Res2Net network. Before convolving each group of feature subsets, it applies pooling to lower their resolution, which reduces the computation and parameter counts; after the convolution it applies upsampling to restore the subsets to their pre-pooling resolution, and then accumulates them with the pre-pooling feature subsets to recover the detail lost to pooling. The computation and parameter counts are thus reduced while the effectiveness of the extracted feature information is preserved. An end-to-end usage sketch follows.
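Putting the pieces together, the whole model can be assembled and exercised as below, assuming the GroupingNetwork, MultiScaleEnhanceNetwork and PostProcessingNetwork sketches from the preceding sections are in scope (the channel sizes are arbitrary illustrative choices, not values from the patent):

```python
import torch
import torch.nn as nn

class FeatureExtractionModel(nn.Module):
    """Grouping network -> multi-scale enhancement network -> post-processing."""

    def __init__(self, channels: int = 64, reduced: int = 32, groups: int = 4):
        super().__init__()
        self.grouping = GroupingNetwork(channels, reduced, groups)
        self.enhance = MultiScaleEnhanceNetwork(reduced // groups, groups)
        self.post = PostProcessingNetwork(reduced, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.post(self.enhance(self.grouping(x)), x)

model = FeatureExtractionModel()
out = model(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56]): same shape as the input
```

Because the output feature map has the same shape as the input, the block can be stacked residually, as in ResNet-style backbones.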
Fig. 4 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present application. The electronic device includes a communication interface 401, a processor 402, a machine-readable storage medium 403 and a bus 404, which communicate with one another via the bus 404. The processor 402 can execute the feature extraction method described above by reading and executing, from the machine-readable storage medium 403, the machine-executable instructions corresponding to the control logic of the method; the details of the method are described in the embodiments above and are not repeated here.
The machine-readable storage medium 403 referred to herein may be any electronic, magnetic, optical or other physical storage device capable of containing or storing information such as executable instructions and data. For example, it may be volatile memory, non-volatile memory or a similar storage medium; specifically, it may be RAM (Random Access Memory), flash memory, a storage drive (e.g., a hard disk drive), any type of storage disk (e.g., an optical disc or DVD), a similar storage medium, or a combination thereof.
Fig. 5 is a block diagram of an embodiment of a feature extraction apparatus according to an exemplary embodiment of the present invention. The apparatus performs feature extraction using the feature extraction model shown in Fig. 2 and can be applied to an electronic device. As shown in Fig. 5, the apparatus includes:
a feature extraction module 510, configured to input an original feature map into a trained feature extraction model, wherein a grouping network in the feature extraction model groups the original feature map by channel to obtain G groups of feature subsets and outputs them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups of feature subsets to obtain G groups of processed feature subsets and outputs them to a post-processing network in the feature extraction model, and the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain an output feature map; and
an obtaining module 520, configured to acquire the output feature map produced by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation.
In an optional implementation, the feature extraction module 510 is specifically configured so that, when the grouping network groups the original feature map by channel into G groups of feature subsets, a first convolution layer in the grouping network performs dimension reduction on the original feature map to obtain a dimension-reduced feature map and outputs it to a grouping layer in the grouping network; the grouping layer groups the dimension-reduced feature map by channel to obtain the G groups of feature subsets; every group of feature subsets has the same size, but the number of channels in each group is 1/G of the number of channels of the dimension-reduced feature map.
In an optional implementation, the feature extraction module 510 is specifically configured so that, when the multi-scale enhancement network performs multi-scale enhancement processing on the G groups of feature subsets to obtain G groups of processed feature subsets, the first group of feature subsets undergoes multi-scale enhancement processing to obtain the first group of processed feature subsets; and, from the second group onward, each group of feature subsets is concatenated by channel with the previous group of processed feature subsets, and the concatenated feature subsets undergo multi-scale enhancement processing to obtain that group of processed feature subsets.
In an optional implementation, the feature extraction module 510 is specifically configured so that, when the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain the output feature map, a concatenation layer in the post-processing network concatenates the G groups of processed feature subsets by channel to obtain a concatenated feature map and outputs it to a second convolution layer in the post-processing network; the second convolution layer performs dimension-raising processing on the concatenated feature map and outputs the dimension-raised concatenated feature map to a Squeeze-and-Excitation (SE) layer in the post-processing network; the SE layer performs enhancement processing on the dimension-raised concatenated feature map to obtain an enhanced feature map and outputs it to an accumulation layer of the post-processing network; and the accumulation layer adds the original feature map and the enhanced feature map to obtain the output feature map; the dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method of feature extraction, the method comprising:
inputting an original feature map into a trained feature extraction model, wherein a grouping network in the feature extraction model groups the original feature map by channel to obtain G groups of feature subsets and outputs them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups of feature subsets to obtain G groups of processed feature subsets and outputs them to a post-processing network in the feature extraction model, and the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain an output feature map; and
acquiring the output feature map produced by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation.
2. The method of claim 1, wherein the grouping network grouping the original feature map by channel to obtain G groups of feature subsets comprises:
performing dimension reduction on the original feature map through a first convolution layer in the grouping network to obtain a dimension-reduced feature map, and outputting it to a grouping layer in the grouping network; and
the grouping layer grouping the dimension-reduced feature map by channel to obtain the G groups of feature subsets;
wherein every group of feature subsets has the same size, but the number of channels in each group is 1/G of the number of channels of the dimension-reduced feature map.
3. The method according to claim 1, wherein the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets to obtain G groups of processed feature subsets comprises:
performing multi-scale enhancement processing on the first group of feature subsets to obtain the first group of processed feature subsets; and
from the second group of feature subsets onward, concatenating each group by channel with the previous group of processed feature subsets, and performing multi-scale enhancement processing on the concatenated feature subsets to obtain that group of processed feature subsets.
4. The method of claim 1, wherein the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain the output feature map comprises:
concatenating the G groups of processed feature subsets by channel through a concatenation layer in the post-processing network to obtain a concatenated feature map, and outputting it to a second convolution layer in the post-processing network;
the second convolution layer performing dimension-raising processing on the concatenated feature map and outputting the dimension-raised concatenated feature map to a Squeeze-and-Excitation (SE) layer in the post-processing network;
the SE layer performing enhancement processing on the dimension-raised concatenated feature map to obtain an enhanced feature map and outputting it to an accumulation layer of the post-processing network; and
the accumulation layer adding the original feature map and the enhanced feature map to obtain the output feature map;
wherein the dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
5. A feature extraction apparatus, characterized in that the apparatus comprises:
a feature extraction module, configured to input an original feature map into a trained feature extraction model, wherein a grouping network in the feature extraction model groups the original feature map by channel to obtain G groups of feature subsets and outputs them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups of feature subsets to obtain G groups of processed feature subsets and outputs them to a post-processing network in the feature extraction model, and the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain an output feature map; and
an acquisition module, configured to acquire the output feature map produced by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation.
6. The apparatus according to claim 5, wherein the feature extraction module is specifically configured so that, when the grouping network groups the original feature map by channel into G groups of feature subsets, a first convolution layer in the grouping network performs dimension reduction on the original feature map to obtain a dimension-reduced feature map and outputs it to a grouping layer in the grouping network; the grouping layer groups the dimension-reduced feature map by channel to obtain the G groups of feature subsets; every group of feature subsets has the same size, but the number of channels in each group is 1/G of the number of channels of the dimension-reduced feature map.
7. The apparatus according to claim 5, wherein the feature extraction module is specifically configured so that, when the multi-scale enhancement network performs multi-scale enhancement processing on the G groups of feature subsets to obtain G groups of processed feature subsets, the first group of feature subsets undergoes multi-scale enhancement processing to obtain the first group of processed feature subsets; and, from the second group onward, each group of feature subsets is concatenated by channel with the previous group of processed feature subsets, and the concatenated feature subsets undergo multi-scale enhancement processing to obtain that group of processed feature subsets.
8. The apparatus according to claim 5, wherein the feature extraction module is specifically configured so that, when the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain the output feature map, a concatenation layer in the post-processing network concatenates the G groups of processed feature subsets by channel to obtain a concatenated feature map and outputs it to a second convolution layer in the post-processing network; the second convolution layer performs dimension-raising processing on the concatenated feature map and outputs the dimension-raised concatenated feature map to a Squeeze-and-Excitation (SE) layer in the post-processing network; the SE layer performs enhancement processing on the dimension-raised concatenated feature map to obtain an enhanced feature map and outputs it to an accumulation layer of the post-processing network; and the accumulation layer adds the original feature map and the enhanced feature map to obtain the output feature map; the dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
Application CN201910927813.8A, priority date 2019-09-27, filing date 2019-09-27: Feature extraction method and device. Granted as CN110781923B (Active).

Priority Applications (1)

CN201910927813.8A (granted as CN110781923B), priority date 2019-09-27, filing date 2019-09-27: Feature extraction method and device

Publications (2)

CN110781923A, published 2020-02-11
CN110781923B, published 2023-02-07

Family

ID: 69384601

Family Applications (1)

CN201910927813.8A (Active), priority date 2019-09-27, filed 2019-09-27: Feature extraction method and device; granted as CN110781923B

Country Status (1)

CN: CN110781923B


Patent Citations (5)

* Cited by examiner, † Cited by third party

CN108615027A * (priority 2018-05-11, published 2018-10-02), 常州大学: A video crowd counting method based on a long short-term memory weighted neural network
CN109344883A * (priority 2018-09-13, published 2019-02-15), 西京学院: A fruit tree disease and pest recognition method based on dilated convolution under complex backgrounds
CN109934241A * (priority 2019-03-28, published 2019-06-25), 南开大学: An image multi-scale information extraction method that can be integrated into a neural network architecture, and applications thereof
CN110059772A * (priority 2019-05-14, published 2019-07-26), 温州大学: A remote sensing image semantic segmentation method based on a transferred VGG network
CN110232693A * (priority 2019-06-12, published 2019-09-13), 桂林电子科技大学: An image segmentation method combining heat-map channels with an improved U-Net

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

SAMATHA P SALIM ET AL.: "Proposed method to Malayalam Handwritten Character Recognition using Residual Network enhanced by multi-scaled features", 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT) *
SHANG-HUA GAO ET AL.: "Res2Net: A New Multi-Scale Backbone Architecture", IEEE Transactions on Pattern Analysis and Machine Intelligence *
赵丹新 et al.: "A new ResNet-based method for aircraft target detection in remote sensing images" (基于ResNet的遥感图像飞机目标检测新方法), 《电子设计工程》 (Electronic Design Engineering) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612751A (en) * 2020-05-13 2020-09-01 河北工业大学 Lithium battery defect detection method based on Tiny-yolov3 network embedded with grouping attention module
CN111553321A (en) * 2020-05-18 2020-08-18 城云科技(中国)有限公司 Mobile vendor target detection model, detection method and management method thereof
CN112489001A (en) * 2020-11-23 2021-03-12 石家庄铁路职业技术学院 Tunnel water seepage detection method based on improved deep learning
CN112489001B (en) * 2020-11-23 2023-07-25 石家庄铁路职业技术学院 Tunnel water seepage detection method based on improved deep learning
CN112633077A (en) * 2020-12-02 2021-04-09 特斯联科技集团有限公司 Face detection method, system, storage medium and terminal based on intra-layer multi-scale feature enhancement
CN112580453A (en) * 2020-12-08 2021-03-30 成都数之联科技有限公司 Land use classification method and system based on remote sensing image and deep learning
CN112507888A (en) * 2020-12-11 2021-03-16 北京建筑大学 Building identification method and device
CN112686297A (en) * 2020-12-29 2021-04-20 中国人民解放军海军航空大学 Radar target motion state classification method and system
CN113643261A (en) * 2021-08-13 2021-11-12 江南大学 Lung disease diagnosis method based on frequency attention network
CN114092813A (en) * 2021-11-25 2022-02-25 中国科学院空天信息创新研究院 Industrial park image extraction method, model, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110781923B (en) 2023-02-07


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant