CN110781923B - Feature extraction method and device

Info

Publication number
CN110781923B
Authority
CN
China
Prior art keywords
feature
subsets
processing
groups
network
Prior art date
2019-09-27
Legal status
Active
Application number
CN201910927813.8A
Other languages
Chinese (zh)
Other versions
CN110781923A (en)
Inventor
贾琳
赵磊
Current Assignee
Chongqing Terminus Technology Co Ltd
Original Assignee
Chongqing Terminus Technology Co Ltd
Priority date
2019-09-27
Filing date
2019-09-27
Publication date
2023-02-07
Application filed by Chongqing Terminus Technology Co Ltd filed Critical Chongqing Terminus Technology Co Ltd
Priority to CN201910927813.8A
Publication of CN110781923A
Application granted
Publication of CN110781923B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a feature extraction method comprising the following steps: an original feature map is input into a trained feature extraction model; the model's grouping network groups the original feature map by channel into G groups of feature subsets and outputs them to a multi-scale enhancement network in the model; the multi-scale enhancement network performs multi-scale enhancement processing on each of the G groups to obtain G groups of processed feature subsets and outputs them to a post-processing network in the model; the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map. The multi-scale enhancement processing comprises pooling, convolution, upsampling and accumulation. Pooling reduces the resolution of the features, and with it the computation and parameter counts; upsampling after the convolution restores the resolution, and accumulation with the pre-pooling features restores feature detail, so computation and parameters are reduced while feature effectiveness is preserved.

Description

Feature extraction method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to a feature extraction method and device.
Background
In the field of computer vision, feature information extraction is a necessary step for realizing various types of network models.
In the prior art, feature information is usually extracted with the deep residual network Res2Net, which enhances multi-scale feature extraction and keeps the convolutional neural network from being affected by vanishing gradients. However, in Res2Net, after the convolved input features are grouped, each group of features must still be processed by its own convolution group, so the computation and parameter counts are large.
Disclosure of Invention
The present invention provides a feature extraction method and apparatus for overcoming the above-mentioned deficiencies in the prior art, and the object is achieved by the following technical solutions.
A first aspect of the present invention provides a feature extraction method, including:
inputting an original feature map into a trained feature extraction model, the feature extraction model grouping the original feature map by channel through a grouping network to obtain G groups of feature subsets and outputting them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets respectively to obtain G groups of processed feature subsets and outputting them to a post-processing network in the feature extraction model, and the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain an output feature map;
acquiring the output feature map output by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling processing, convolution processing, upsampling processing and accumulation processing.
A second aspect of the present invention provides a feature extraction apparatus, comprising:
a feature extraction module, configured to input an original feature map into a trained feature extraction model, the feature extraction model grouping the original feature map by channel through a grouping network to obtain G groups of feature subsets and outputting them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets respectively to obtain G groups of processed feature subsets and outputting them to a post-processing network in the feature extraction model, and the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain an output feature map;
an acquisition module, configured to acquire the output feature map output by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling processing, convolution processing, upsampling processing and accumulation processing.
In the embodiment of the application, after the original feature map is input into the feature extraction model, it is divided into G groups of feature subsets by the grouping network; the multi-scale enhancement network performs multi-scale enhancement processing on each group of feature subsets; the post-processing network then concatenates the G groups of processed feature subsets and adds the concatenated feature map to the original feature map to obtain the output feature map. The multi-scale enhancement processing applied to each group of feature subsets comprises pooling, convolution, upsampling and accumulation.
As can be seen from the above, the multi-scale enhancement network replaces the 3 × 3 convolution groups used in the existing Res2Net network. Because the multi-scale enhancement network pools each group of feature subsets to reduce their resolution before convolving them, the computation and parameter counts drop; upsampling after the convolution restores the subsets to their pre-pooling resolution, and accumulating them with the pre-pooling subsets restores the feature detail lost to pooling. Computation and parameters are therefore reduced while the effectiveness of the extracted feature information is preserved.
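As a rough illustrative calculation (not a figure from the patent): a 3 × 3 convolution with C input and C output channels over an H × W map costs about 9·C²·H·W multiply-accumulates, whereas the same convolution applied after 2 × 2 pooling costs 9·C²·(H/2)·(W/2), a four-fold reduction in computation; the upsample-and-accumulate steps then compensate for the spatial detail the pooling discards.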
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
Fig. 1 is a schematic diagram of a Res2Net network structure according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic diagram of a feature extraction model according to an exemplary embodiment of the present invention;
Fig. 3A is a flowchart of an embodiment of a feature extraction method according to an exemplary embodiment of the present invention;
Fig. 3B is a schematic diagram of the grouping network structure according to the embodiment shown in Fig. 3A;
Fig. 3C is a schematic diagram of the post-processing network according to the embodiment shown in Fig. 3A;
Fig. 4 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present application;
Fig. 5 is a block diagram of an embodiment of a feature extraction apparatus according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms; the terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information as first information, without departing from the scope of the present invention. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
With the development of deep learning, convolutional neural networks (CNNs) are used more and more widely in the field of computer vision. In particular, the deep residual network ResNet keeps CNN design from being hampered by vanishing gradients, allows very deep CNNs to be trained, and extracts effective convolutional feature information to the greatest extent. Many backbone networks in computer vision therefore use ResNet to extract image features for subsequent classification, detection, segmentation and other tasks.
To further improve the effectiveness of the extracted feature information, Res2Net was proposed on the basis of the ResNet network. As shown in Fig. 1, in an exemplary Res2Net network structure the input feature map passes through a 1 × 1 convolution kernel and is grouped into four feature subsets X1, X2, X3 and X4. The first subset X1 is processed by a 3 × 3 convolution group to obtain the feature subset Y1; from the second subset onward, each subset is first combined with the previous group's output and then fed into its own 3 × 3 convolution group. Although Res2Net improves the effectiveness of the convolutional feature information by exploiting multi-scale information in this way, every group of feature subsets requires its own 3 × 3 convolution group, so the computational burden and parameter count are large. A contrasting sketch follows.
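As a point of reference, the following is a minimal sketch of the grouped processing just described (illustrative only, not code from the patent; all names are assumptions). It follows this paragraph's wording, in which each group after the first is concatenated with the previous output before its own 3 × 3 convolution group; the published Res2Net instead passes the first subset through unchanged and adds rather than concatenates.

```python
import torch
import torch.nn as nn

class Res2NetStyleBlock(nn.Module):
    """Baseline sketch: every group of feature subsets gets its own 3x3
    convolution group, which is the per-group cost the patent aims to cut."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        width = channels // groups
        self.groups = groups
        self.first_conv = nn.Conv2d(width, width, kernel_size=3, padding=1)
        # From the second group onward the input is the current subset
        # concatenated with the previous output, hence 2 * width input channels.
        self.convs = nn.ModuleList(
            nn.Conv2d(2 * width, width, kernel_size=3, padding=1)
            for _ in range(groups - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xs = torch.chunk(x, self.groups, dim=1)    # X1..XG, split by channel
        ys = [self.first_conv(xs[0])]               # Y1
        for i in range(1, self.groups):
            z = torch.cat([xs[i], ys[-1]], dim=1)   # combine with previous output
            ys.append(self.convs[i - 1](z))         # Yi from its own 3x3 conv group
        return torch.cat(ys, dim=1)
```

The point to notice is the full-resolution 3 × 3 convolution run for every group, which is exactly the cost the patent's multi-scale enhancement network is designed to reduce.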
To solve the above technical problem, the invention provides a feature extraction model. As shown in Fig. 2, the model comprises a grouping network, a multi-scale enhancement network and a post-processing network. After the grouping network divides the original feature map into G groups of feature subsets, the multi-scale enhancement network performs multi-scale enhancement processing on each group; the post-processing network then concatenates the G groups of processed feature subsets and adds the concatenated feature map to the original feature map to obtain the output feature map.
The multi-scale enhancement processing applied to each group of feature subsets comprises pooling, convolution, upsampling and accumulation.
As can be seen from the above, the multi-scale enhancement network replaces the 3 × 3 convolution groups used in the existing Res2Net network. Because the multi-scale enhancement network pools each group of feature subsets to reduce their resolution before convolving them, the computation and parameter counts drop; upsampling after the convolution restores the subsets to their pre-pooling resolution, and accumulating them with the pre-pooling subsets restores the feature detail lost to pooling. Computation and parameters are therefore reduced while the effectiveness of the extracted feature information is preserved.
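The following is a minimal sketch of one reading of this enhancement operation (illustrative, not from the patent; the class name, the pooling factor of 2, and the nearest-neighbour upsampling mode are assumptions the patent does not fix):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEnhance(nn.Module):
    """Pool -> 3x3 conv -> upsample -> accumulate with the pre-pooling input."""
    def __init__(self, channels: int, pool_factor: int = 2):
        super().__init__()
        self.pool = nn.MaxPool2d(pool_factor)  # pooling: reduce the resolution
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(self.pool(x))            # convolution at reduced resolution
        y = F.interpolate(y, size=x.shape[-2:], mode="nearest")  # upsampling
        return x + y                           # accumulation restores lost detail
```

With a pooling factor of 2 the 3 × 3 convolution runs over a quarter of the original area, which is where the computation savings described above come from; the final addition reinjects the full-resolution detail.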
The feature extraction method implemented by the feature extraction model described above is explained in detail below with specific embodiments.
Fig. 3A is a flowchart of an embodiment of a feature extraction method according to an exemplary embodiment of the present invention. The method performs feature extraction with the feature extraction model shown in Fig. 2 and can be applied to electronic devices (such as a PC, a mobile phone terminal, and the like). As shown in Fig. 3A, the feature extraction method includes the following steps:
step 301: inputting the original characteristic diagram into a trained characteristic extraction model, grouping the original characteristic diagram according to channels by the characteristic extraction model through a packet network to obtain G groups of characteristic subsets, outputting the G groups of characteristic subsets to a multi-scale enhancement network in the characteristic extraction model, respectively carrying out multi-scale enhancement processing on the G groups of characteristic subsets by the multi-scale enhancement network to obtain G groups of processed characteristic subsets, outputting the G groups of processed characteristic subsets to a post-processing network in the characteristic extraction model, splicing the G groups of processed characteristic subsets according to the channels by the post-processing network, and adding the spliced characteristic diagram and the original characteristic diagram to obtain an output characteristic diagram.
In an embodiment, for the processing of the grouping network, whose structure is shown in Fig. 3B, the original feature map is first passed through a first convolution layer in the grouping network for dimension reduction, and the dimension-reduced feature map is output to a grouping layer in the grouping network; the grouping layer then groups the dimension-reduced feature map by channel to obtain the G groups of feature subsets.
Because the original feature map is composed of a plurality of per-channel feature maps, after grouping every group of feature subsets has the same spatial size, but the number of channels in each group is 1/G of the number of channels of the dimension-reduced feature map.
Illustratively, the first convolution layer that performs the dimension reduction may use a 1 × 1 convolution kernel to reduce the number of channels of the input feature map.
In the present invention, the grouping policy of the grouping layer may be set according to practical experience; for example, each channel of the dimension-reduced feature map may be treated as one feature subset.
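A minimal sketch of this grouping step (illustrative, not from the patent; the channel counts and names are assumptions):

```python
import torch
import torch.nn as nn

class GroupingNetwork(nn.Module):
    """1x1 convolution for dimension reduction, then a channel-wise split."""
    def __init__(self, in_channels: int, reduced_channels: int, groups: int):
        super().__init__()
        assert reduced_channels % groups == 0
        self.reduce = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)
        self.groups = groups

    def forward(self, x: torch.Tensor):
        z = self.reduce(x)                      # dimension-reduced feature map
        # Each subset keeps the full spatial size but only 1/G of the channels.
        return torch.chunk(z, self.groups, dim=1)
```

For instance, GroupingNetwork(256, 128, groups=4) would turn a 256-channel input map into four subsets of 32 channels each, all of the same spatial size.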
In an embodiment, for the processing of the multi-scale enhancement network, the first group of processed feature subsets is obtained by performing multi-scale enhancement processing on the first group of feature subsets; then, starting from the second group, each group of feature subsets is concatenated by channel with the previous group's processed feature subsets, and multi-scale enhancement processing is performed on the concatenated feature subsets to obtain that group's processed feature subsets.
The multi-scale enhancement processing applied to each group of feature subsets includes pooling, convolution, upsampling and accumulation.
Illustratively, the pooling may be implemented with a max-pooling layer, the convolution with a 3 × 3 convolution group, and the upsampling with an upsampling layer. The accumulation adds pixels along the channel dimension: for each channel, the concatenated feature subset is added to the corresponding pixels of the upsampled feature subset, aggregating information and further enhancing the effectiveness of the extracted feature information.
Denoting the multi-scale enhancement processing by K(·): the first group of processed feature subsets is Y1 = K(X1), and the i-th group is Yi = K(Xi + Y(i-1)) for 1 < i ≤ G, where the "+" in the formula denotes channel-wise concatenation.
Accordingly, in the multi-scale enhancement network each group of feature subsets has its own multi-scale enhancement processing module, and, from the second group onward, each group additionally has its own concatenation layer.
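In code, the sequential pass of the multi-scale enhancement network could look like the following hedged sketch (illustrative; with channel-preserving enhancement modules such as the MultiScaleEnhance sketch above, the i-th concatenated input grows to i times the per-group width, so each module must be sized for its own input, a piece of bookkeeping the patent leaves open):

```python
import torch

def multi_scale_network(subsets, enhance_modules):
    """Y1 = K(X1); Yi = K(concat(Xi, Y_{i-1})) for 2 <= i <= G."""
    ys = [enhance_modules[0](subsets[0])]
    for i in range(1, len(subsets)):
        z = torch.cat([subsets[i], ys[-1]], dim=1)  # concatenation layer
        ys.append(enhance_modules[i](z))            # multi-scale enhancement K
    return ys
```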
In an embodiment, for the processing of the post-processing network, whose structure is shown in Fig. 3C, a concatenation layer in the post-processing network concatenates the G groups of processed feature subsets by channel to obtain a concatenated feature map and outputs it to a second convolution layer in the post-processing network; the second convolution layer performs dimension-raising processing on the concatenated feature map to obtain the dimension-raised concatenated feature map and outputs it to an SE (Squeeze-and-Excitation) layer in the post-processing network; the SE layer enhances the dimension-raised concatenated feature map to obtain an enhanced feature map and outputs it to an accumulation layer of the post-processing network; and the accumulation layer adds the original feature map and the enhanced feature map to obtain the output feature map.
The dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
For example, the second convolution layer may also use a 1 × 1 convolution kernel, restoring the number of channels of the concatenated feature map to that of the input so that the dimension-raised concatenated feature map matches the original feature map in channel count. The accumulation layer likewise adds pixels along the channel dimension: for each channel, the channel's feature map is added to the corresponding pixels of the channel's enhanced feature map.
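A minimal sketch of this post-processing path (illustrative, not from the patent; the SE layer follows the standard Squeeze-and-Excitation design, and the reduction ratio of 16 is an assumption):

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Squeeze-and-Excitation: a learned per-channel gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = x.mean(dim=(2, 3))                      # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # excitation weights
        return x * w                                # enhance channel responses

class PostProcessing(nn.Module):
    """Concatenate -> 1x1 conv (raise dimension) -> SE -> add original map."""
    def __init__(self, concat_channels: int, out_channels: int):
        super().__init__()
        self.expand = nn.Conv2d(concat_channels, out_channels, kernel_size=1)
        self.se = SELayer(out_channels)

    def forward(self, processed_subsets, original: torch.Tensor) -> torch.Tensor:
        z = torch.cat(processed_subsets, dim=1)  # concatenate by channel
        z = self.se(self.expand(z))              # restore channels, then enhance
        return original + z                      # accumulation layer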
Step 302: acquire the output feature map output by the feature extraction model.
For example, the obtained output feature map can be applied to classification, detection, segmentation and other tasks.
In the embodiment of the application, after the original feature map is input into the feature extraction model, it is divided into G groups of feature subsets by the grouping network; the multi-scale enhancement network performs multi-scale enhancement processing on each group of feature subsets; the post-processing network then concatenates the G groups of processed feature subsets and adds the concatenated feature map to the original feature map to obtain the output feature map. The multi-scale enhancement processing applied to each group of feature subsets includes pooling, convolution, upsampling and accumulation.
As can be seen from the above, the multi-scale enhancement network replaces the 3 × 3 convolution groups used in the existing Res2Net network. Because the multi-scale enhancement network pools each group of feature subsets to reduce their resolution before convolving them, the computation and parameter counts drop; upsampling after the convolution restores the subsets to their pre-pooling resolution, and accumulating them with the pre-pooling subsets restores the feature detail lost to pooling. Computation and parameters are therefore reduced while the effectiveness of the extracted feature information is preserved.
Fig. 4 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present application. The electronic device includes: a communication interface 401, a processor 402, a machine-readable storage medium 403, and a bus 404, where the communication interface 401, the processor 402 and the machine-readable storage medium 403 communicate with each other via the bus 404. The processor 402 can execute the feature extraction method described above by reading and executing, from the machine-readable storage medium 403, machine-executable instructions corresponding to the control logic of the method; for the specifics of the method, refer to the embodiments above, which are not repeated here.
The machine-readable storage medium 403 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be volatile memory, non-volatile memory, or a similar storage medium. In particular, the machine-readable storage medium 403 may be a RAM (Random Access Memory), a flash memory, a storage drive (such as a hard disk drive), any type of storage disk (such as an optical disk or a DVD), a similar storage medium, or a combination thereof.
Fig. 5 is a block diagram of an embodiment of a feature extraction apparatus according to an exemplary embodiment of the present invention. The apparatus performs feature extraction with the feature extraction model shown in Fig. 2 and can be applied to an electronic device. As shown in Fig. 5, the apparatus includes:
a feature extraction module 510, configured to input an original feature map into a trained feature extraction model, the feature extraction model grouping the original feature map by channel through a grouping network to obtain G groups of feature subsets and outputting them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets to obtain G groups of processed feature subsets and outputting them to a post-processing network in the feature extraction model, and the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain an output feature map;
an obtaining module 520, configured to obtain an output feature map output by the feature extraction model;
wherein the multi-scale enhancement processing comprises pooling processing, convolution processing, upsampling processing, and accumulation processing.
In an optional implementation, when the grouping network groups the original feature map by channel into the G groups of feature subsets, the feature extraction module 510 is specifically configured to: perform dimension-reduction processing on the original feature map through a first convolution layer in the grouping network to obtain a dimension-reduced feature map and output it to a grouping layer in the grouping network; the grouping layer groups the dimension-reduced feature map by channel to obtain the G groups of feature subsets. All groups of feature subsets have the same size, but each group has 1/G of the channels of the dimension-reduced feature map.
In an optional implementation, when the multi-scale enhancement network performs multi-scale enhancement processing on the G groups of feature subsets respectively to obtain the G groups of processed feature subsets, the feature extraction module 510 is specifically configured to: perform multi-scale enhancement processing on the first group of feature subsets to obtain the first group of processed feature subsets; and, starting from the second group of feature subsets, concatenate each group by channel with the previous group's processed feature subsets and perform multi-scale enhancement processing on the concatenated feature subsets to obtain that group's processed feature subsets.
In an optional implementation, when the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain the output feature map, the feature extraction module 510 is specifically configured to: concatenate the G groups of processed feature subsets by channel through a concatenation layer in the post-processing network to obtain a concatenated feature map and output it to a second convolution layer in the post-processing network; the second convolution layer performs dimension-raising processing on the concatenated feature map to obtain the dimension-raised concatenated feature map and outputs it to a Squeeze-and-Excitation (SE) layer in the post-processing network; the SE layer enhances the dimension-raised concatenated feature map to obtain an enhanced feature map and outputs it to an accumulation layer of the post-processing network; and the accumulation layer adds the original feature map and the enhanced feature map to obtain the output feature map. The dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
For details of how each unit in the above apparatus implements its functions, see the corresponding steps of the method above; they are not repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method for extracting features of an image, the method comprising:
inputting an original feature map of an input image into a trained feature extraction model, the feature extraction model grouping the original feature map by channel through a grouping network to obtain G groups of feature subsets and outputting them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets respectively to obtain G groups of processed feature subsets and outputting them to a post-processing network in the feature extraction model, and the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain an output feature map;
acquiring the output feature map output by the feature extraction model, wherein the output feature map is used for any one of a classification task, a detection task and a segmentation task;
the multi-scale enhancement network respectively performs multi-scale enhancement processing on the G groups of feature subsets to obtain the G groups of processed feature subsets, and the method comprises the following steps: performing multi-scale enhancement processing on the first group of feature subsets to obtain a first group of processed feature subsets; starting from the second group of feature subsets, splicing the processed feature subsets of the previous group with the feature subsets of each group according to a channel, and performing multi-scale enhancement processing on the spliced feature subsets of the group to obtain the processed feature subsets of the group;
the processing sequence of the multi-scale enhancement processing is as follows in sequence: pooling, convolution, up-sampling and accumulation.
2. The method of claim 1, wherein the grouping network grouping the original feature map by channel into the G groups of feature subsets comprises:
performing dimension-reduction processing on the original feature map through a first convolution layer in the grouping network to obtain a dimension-reduced feature map, and outputting the dimension-reduced feature map to a grouping layer in the grouping network;
the grouping layer grouping the dimension-reduced feature map by channel to obtain the G groups of feature subsets;
wherein all groups of feature subsets have the same size, but each group has 1/G of the channels of the dimension-reduced feature map.
3. The method of claim 1, wherein the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain the output feature map comprises:
concatenating the G groups of processed feature subsets by channel through a concatenation layer in the post-processing network to obtain a concatenated feature map, and outputting the concatenated feature map to a second convolution layer in the post-processing network;
the second convolution layer performing dimension-raising processing on the concatenated feature map to obtain the dimension-raised concatenated feature map, and outputting it to a Squeeze-and-Excitation (SE) layer in the post-processing network;
the SE layer enhancing the dimension-raised concatenated feature map to obtain an enhanced feature map, and outputting it to an accumulation layer of the post-processing network;
the accumulation layer adding the original feature map and the enhanced feature map to obtain the output feature map;
wherein the dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.
4. An apparatus for extracting features of an image, the apparatus comprising:
a feature extraction module, configured to input an original feature map of an input image into a trained feature extraction model, the feature extraction model grouping the original feature map by channel through a grouping network to obtain G groups of feature subsets and outputting them to a multi-scale enhancement network in the feature extraction model, the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets respectively to obtain G groups of processed feature subsets and outputting them to a post-processing network in the feature extraction model, and the post-processing network concatenating the G groups of processed feature subsets by channel and adding the concatenated feature map to the original feature map to obtain an output feature map; wherein the multi-scale enhancement network performing multi-scale enhancement processing on the G groups of feature subsets respectively to obtain the G groups of processed feature subsets comprises: performing multi-scale enhancement processing on the first group of feature subsets to obtain the first group of processed feature subsets, and, starting from the second group of feature subsets, concatenating each group by channel with the previous group's processed feature subsets and performing multi-scale enhancement processing on the concatenated feature subsets to obtain that group's processed feature subsets;
an acquisition module, configured to acquire the output feature map output by the feature extraction model, wherein the output feature map is used for any one of a classification task, a detection task and a segmentation task;
wherein the multi-scale enhancement processing is performed in the following order: pooling, convolution, upsampling and accumulation.
5. The apparatus according to claim 4, wherein the feature extraction module is specifically configured to, when the grouping network groups the original feature map by channel into the G groups of feature subsets: perform dimension-reduction processing on the original feature map through a first convolution layer in the grouping network to obtain a dimension-reduced feature map and output it to a grouping layer in the grouping network; the grouping layer groups the dimension-reduced feature map by channel to obtain the G groups of feature subsets; wherein all groups of feature subsets have the same size, but each group has 1/G of the channels of the dimension-reduced feature map.
6. The apparatus according to claim 4, wherein the feature extraction module is specifically configured to, when the multi-scale enhancement network performs multi-scale enhancement processing on the G groups of feature subsets respectively to obtain the G groups of processed feature subsets: perform multi-scale enhancement processing on the first group of feature subsets to obtain the first group of processed feature subsets; and, starting from the second group of feature subsets, concatenate each group by channel with the previous group's processed feature subsets and perform multi-scale enhancement processing on the concatenated feature subsets to obtain that group's processed feature subsets.
7. The apparatus according to claim 4, wherein the feature extraction module is specifically configured to, when the post-processing network concatenates the G groups of processed feature subsets by channel and adds the concatenated feature map to the original feature map to obtain the output feature map: concatenate the G groups of processed feature subsets by channel through a concatenation layer in the post-processing network to obtain a concatenated feature map and output it to a second convolution layer in the post-processing network; the second convolution layer performs dimension-raising processing on the concatenated feature map to obtain the dimension-raised concatenated feature map and outputs it to a Squeeze-and-Excitation (SE) layer in the post-processing network; the SE layer enhances the dimension-raised concatenated feature map to obtain an enhanced feature map and outputs it to an accumulation layer of the post-processing network; the accumulation layer adds the original feature map and the enhanced feature map to obtain the output feature map; wherein the dimension-raised concatenated feature map has the same number of channels and the same size as the original feature map.

Priority Applications (1)

Application Number: CN201910927813.8A
Priority Date: 2019-09-27
Filing Date: 2019-09-27
Title: Feature extraction method and device

Publications (2)

Publication Number Publication Date
CN110781923A CN110781923A (en) 2020-02-11
CN110781923B (en) 2023-02-07

Family

ID=69384601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910927813.8A Active CN110781923B (en) 2019-09-27 2019-09-27 Feature extraction method and device

Country Status (1)

Country Link
CN (1) CN110781923B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612751B (en) * 2020-05-13 2022-11-15 河北工业大学 Lithium battery defect detection method based on Tiny-yolov3 network embedded with grouping attention module
CN111553321A (en) * 2020-05-18 2020-08-18 城云科技(中国)有限公司 Mobile vendor target detection model, detection method and management method thereof
CN112489001B (en) * 2020-11-23 2023-07-25 石家庄铁路职业技术学院 Tunnel water seepage detection method based on improved deep learning
CN112633077A (en) * 2020-12-02 2021-04-09 特斯联科技集团有限公司 Face detection method, system, storage medium and terminal based on intra-layer multi-scale feature enhancement
CN112580453A (en) * 2020-12-08 2021-03-30 成都数之联科技有限公司 Land use classification method and system based on remote sensing image and deep learning
CN112507888A (en) * 2020-12-11 2021-03-16 北京建筑大学 Building identification method and device
CN112686297B (en) * 2020-12-29 2023-04-14 中国人民解放军海军航空大学 Radar target motion state classification method and system
CN113643261B (en) * 2021-08-13 2023-04-18 江南大学 Lung disease diagnosis method based on frequency attention network
CN114092813B (en) * 2021-11-25 2022-08-05 中国科学院空天信息创新研究院 Industrial park image extraction method and system, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615027A (en) * 2018-05-11 2018-10-02 常州大学 A method of video crowd is counted based on shot and long term memory-Weighted Neural Network
CN109344883A (en) * 2018-09-13 2019-02-15 西京学院 Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution
CN109934241A (en) * 2019-03-28 2019-06-25 南开大学 It can be integrated into Image Multiscale information extracting method and the application in neural network framework
CN110059772A (en) * 2019-05-14 2019-07-26 温州大学 Remote sensing images semantic segmentation method based on migration VGG network
CN110232693A (en) * 2019-06-12 2019-09-13 桂林电子科技大学 A kind of combination thermodynamic chart channel and the image partition method for improving U-Net

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Samatha P Salim et al.; "Proposed method to Malayalam Handwritten Character Recognition using Residual Network enhanced by multi-scaled features"; 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT); 2019-06-20; entire document *
Shang-Hua Gao et al.; "Res2Net: A New Multi-Scale Backbone Architecture"; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2019-08-30; vol. 43, no. 2; entire document *
赵丹新 (Zhao Danxin) et al.; "基于ResNet的遥感图像飞机目标检测新方法" [A new ResNet-based method for aircraft target detection in remote sensing images]; 《电子设计工程》 (Electronic Design Engineering); 2018-11-20; vol. 26, no. 22; entire document *

Similar Documents

Publication Publication Date Title
CN110781923B (en) Feature extraction method and device
CN108010031B (en) Portrait segmentation method and mobile terminal
CN108664981B (en) Salient image extraction method and device
US9239948B2 (en) Feature descriptor for robust facial expression recognition
JP7045483B2 (en) Coding pattern processing methods and devices, electronic devices, and computer programs
US11354797B2 (en) Method, device, and system for testing an image
CN109919110B (en) Video attention area detection method, device and equipment
CN111476719A (en) Image processing method, image processing device, computer equipment and storage medium
JP7026165B2 (en) Text recognition method and text recognition device, electronic equipment, storage medium
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114238904B (en) Identity recognition method, and training method and device of dual-channel hyper-resolution model
CN112001923B (en) Retina image segmentation method and device
CN111553290A (en) Text recognition method, device, equipment and storage medium
CN110619334A (en) Portrait segmentation method based on deep learning, architecture and related device
CN113256643A (en) Portrait segmentation model training method, storage medium and terminal equipment
CN111967478A (en) Feature map reconstruction method and system based on weight inversion, storage medium and terminal
CN108810319B (en) Image processing apparatus, image processing method, and program
Zheng et al. Joint residual pyramid for joint image super-resolution
CN115187456A (en) Text recognition method, device, equipment and medium based on image enhancement processing
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN114511702A (en) Remote sensing image segmentation method and system based on multi-scale weighted attention
US20200372280A1 (en) Apparatus and method for image processing for machine learning
CN115393868B (en) Text detection method, device, electronic equipment and storage medium
CN113963282A (en) Video replacement detection and training method and device of video replacement detection model
CN111831207A (en) Data processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant