CN111582458A - Method and apparatus, device, and medium for processing feature map

Method and apparatus, device, and medium for processing feature map

Info

Publication number
CN111582458A
Authority
CN
China
Prior art keywords
feature map
output result
layer
sub
convolution layer
Prior art date
Legal status
Pending
Application number
CN202010403392.1A
Other languages
Chinese (zh)
Inventor
Yu Dongdong (喻冬东)
Wang Changhu (王长虎)
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010403392.1A
Publication of CN111582458A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Abstract

Embodiments of the present disclosure disclose methods and apparatus for processing a feature map. One embodiment of the method comprises: inputting a target feature map into an additional pooling network to obtain a first output result, where the target feature map is the output of a first convolution layer of a bottleneck layer; inputting the first output result into a second convolution layer of the bottleneck layer to obtain a second output result; generating a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function; and generating a second feature map based on the second output result, the third output result, and a third convolution layer of the bottleneck layer. The method and apparatus improve the accuracy of extracting feature information from the target image without increasing the computational cost of the network model.

Description

Method and apparatus, device, and medium for processing feature map
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for processing a feature map.
Background
At present, residual networks (ResNet) are widely adopted, and the bottleneck layer of a residual network is used extensively. Modifying the bottleneck layer of a residual network can reduce the number of parameters and the amount of computation. However, such modifications often lower the accuracy of feature map extraction. A method is therefore needed that improves the accuracy with which the network extracts feature maps without increasing the amount of computation or the number of parameters.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a method, apparatus, device and computer readable medium for processing a feature map to solve the technical problems mentioned in the background section above.
In a first aspect, an embodiment of the present disclosure provides a method for processing a feature map, the method including: inputting a target feature map into an additional pooling network to obtain a first output result, where the target feature map is the output of a first convolution layer of a bottleneck layer; inputting the first output result into a second convolution layer of the bottleneck layer to obtain a second output result; generating a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function; and generating a second feature map based on the second output result, the third output result, and a third convolution layer of the bottleneck layer.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing a feature map, the apparatus including: a first input unit configured to input a target feature map into an additional pooling network to obtain a first output result, where the target feature map is the output of a first convolution layer of a bottleneck layer; a second input unit configured to input the first output result into a second convolution layer of the bottleneck layer to obtain a second output result; a first generating unit configured to generate a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function; and a second generating unit configured to generate a second feature map based on the second output result, the third output result, and a third convolution layer of the bottleneck layer.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any one of the first aspect.
The above embodiments of the present disclosure have the following advantageous effects. The target feature map output by the first convolution layer of the bottleneck layer is input into the additional pooling network, yielding a first output result at several pooling scales. The first output result is then input into the second convolution layer of the bottleneck layer to further extract the feature information of the target feature map. Next, a third output result is generated from the target feature map, the additional convolution layer, and the normalized exponential function. Finally, a second feature map is generated from the second output result, the third output result, and the third convolution layer of the bottleneck layer. By adding an attention mechanism, this method improves the accuracy with which the network model extracts feature information from the feature map, and thereby improves the model's capacity to learn and predict.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIG. 1 is a schematic illustration of one application scenario of a method for processing a feature map according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of one embodiment of a method for processing a feature map according to the present disclosure;
FIG. 3 is a flow diagram of a further embodiment of a method for processing a feature map according to the present disclosure;
FIG. 4 is a process flow diagram of a bottleneck layer involved in a method for processing a feature map in accordance with some embodiments of the present disclosure;
FIG. 5 is a schematic block diagram of some embodiments of an apparatus for processing feature maps according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 is a schematic diagram 100 of one application scenario of a method for processing a feature map according to some embodiments of the present disclosure.
As shown in fig. 1, as an example, the electronic device 101 inputs the target feature map into the additional pooling network to obtain a first output result. The target feature map is the output of the first convolution layer 102 of the bottleneck layer, whose convolution kernel is 1 × 1. Here, the additional pooling network includes a pooling layer 1031 of scale (1, 1), a pooling layer 1032 of scale (2, 2), a pooling layer 1033 of scale (4, 4), and a pooling layer 1034 of scale (8, 8). The first output result is the set of outputs of the four pooling layers. The first output result is then input to the second convolution layer 104 of the bottleneck layer, whose convolution kernel is 3 × 3, to obtain a second output result. In addition, the target feature map is input to the pre-trained additional convolution layer 105, and the output of the additional convolution layer 105 is in turn input to the normalized exponential function 106 to obtain a third output result. Finally, a second feature map 108 is generated from the second output result, the third output result, and the third convolution layer 107 of the bottleneck layer, whose convolution kernel is 1 × 1.
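The scenario above can be captured in a few dozen lines. The following is a minimal PyTorch sketch of one plausible reading of the flow, not the patent's reference implementation: the module name, the channel widths, the use of adaptive average pooling for the (1, 1) through (8, 8) scales, and the reduction of the additional convolution layer's output to one softmax weight per scale are all assumptions where the text leaves details open.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, scales=(1, 2, 4, 8)):
        super().__init__()
        self.scales = scales
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1)               # first 1x1 conv (102)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1)   # 3x3 conv (104)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, kernel_size=1)              # third 1x1 conv (107)
        # Additional convolution layer (105): assumed here to emit one
        # channel per pooling scale, from which the softmax weights come.
        self.attn = nn.Conv2d(mid_ch, len(scales), kernel_size=1)

    def forward(self, x):
        target = self.conv1(x)                        # target feature map
        h, w = target.shape[2:]
        # Additional pooling network: one pooled copy per scale (first output result).
        pooled = [F.adaptive_avg_pool2d(target, s) for s in self.scales]
        # Second convolution layer of the bottleneck (second output result).
        feats = [self.conv2(p) for p in pooled]
        # Additional conv + normalized exponential function (third output result):
        # globally average the attention logits, then softmax over the scales.
        logits = self.attn(target).mean(dim=(2, 3))   # shape (N, num_scales)
        weights = F.softmax(logits, dim=1)
        # Weight, upsample, and sum the sub-feature maps, then apply the third conv.
        fused = 0
        for i, f in enumerate(feats):
            f = f * weights[:, i].view(-1, 1, 1, 1)
            fused = fused + F.interpolate(f, size=(h, w), mode="bilinear",
                                          align_corners=False)
        return self.conv3(fused)                      # second feature map (108)
```
Under these assumptions, `AttentionBottleneck(256, 64, 256)` applied to a `(1, 256, 56, 56)` tensor returns a `(1, 256, 56, 56)` second feature map, matching the resolution of the target feature map as required later in step 306.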
It is to be understood that the method for processing the feature map may be performed by the electronic device 101 described above. The electronic device 101 may be hardware or software. When the electronic device 101 is hardware, it may be any of various electronic devices with information-processing capabilities, including but not limited to smartphones, tablets, e-book readers, laptop computers, desktop computers, servers, and the like. When the electronic device 101 is software, it can be installed in the electronic devices listed above, and it may be implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or a single software module. No specific limitation is imposed here.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing a feature map in accordance with the present disclosure is shown. The method for processing the feature map comprises the following steps:
step 201, inputting the target feature map into the additional pooling network to obtain a first output result.
In some embodiments, an executing agent of the method for processing a feature map (e.g., the electronic device shown in fig. 1) inputs the target feature map into an additional pooling network, resulting in a first output result.
With further reference to fig. 4, the target feature map can be the output of the first convolution layer 402 of the bottleneck layer.
Therein, fig. 4 is a process flow diagram 400 of a bottleneck layer involved in a method for processing a feature map according to some embodiments of the present disclosure.
As shown in fig. 4, the initial feature map 401 is input into the pre-trained first convolution layer 402 of the bottleneck layer, whose convolution kernel is 1 × 1, to obtain the output feature map of the first convolution layer 402. Convolving with a 1 × 1 kernel reduces the feature dimension and the number of parameters, and hence the amount of computation; after dimension reduction, data training and feature extraction can be carried out more effectively and intuitively. The output feature map of the first convolution layer 402 is then input into the pre-trained second convolution layer 403 of the bottleneck layer, whose convolution kernel is 3 × 3, to obtain the output feature map of the second convolution layer 403. Next, the output feature map of the second convolution layer 403 is input into the pre-trained third convolution layer 404 of the bottleneck layer, whose convolution kernel is 1 × 1, to obtain the output feature map of the third convolution layer 404; here the 1 × 1 convolution raises the feature dimension, producing a feature map with the same feature dimension as the initial feature map 401. Finally, the initial feature map 401 and the output feature map of the third convolution layer 404 are added pixel-wise to obtain the output feature map 405.
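For contrast with the modified flow of fig. 1, the plain bottleneck block of fig. 4 can be sketched as follows; this is a minimal illustration under assumed channel widths, not the patent's code.
```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels, reduced):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, reduced, kernel_size=1)            # 402: 1x1 reduce
        self.conv2 = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1)  # 403: 3x3 transform
        self.conv3 = nn.Conv2d(reduced, channels, kernel_size=1)            # 404: 1x1 restore

    def forward(self, x):                  # x is the initial feature map 401
        out = self.conv3(self.conv2(self.conv1(x)))
        return x + out                     # pixel-wise residual addition (405)
```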
The first convolution layer of the bottleneck layer may be the first convolution layer 402 of fig. 4, whose convolution kernel may be 1 × 1. Convolving with a 1 × 1 kernel reduces the feature dimension and the number of parameters, and thus the amount of computation, and training and feature extraction can be carried out more effectively and intuitively after dimension reduction. As an example, the target feature map may be processed through additional pooling networks of various configurations to obtain the first output result.
Optionally, the additional pooling network may be composed of pooling layers of multiple scales. A pooling layer compresses the input feature map, reducing its size and simplifying the network's computation, while also compressing features and extracting key information. The first output result may be a first sub-feature map set, where each first sub-feature map corresponds to the output of a different pooling layer and represents different key feature information of the target feature map.
Step 202, inputting the first output result to the second convolution layer of the bottleneck layer to obtain a second output result.
In some embodiments, the execution body may input the first output result to the second convolution layer of the bottleneck layer to obtain a second output result. The convolution kernel of the second convolution layer of the bottleneck layer may be 3 × 3; convolving with a 3 × 3 kernel further extracts the feature information in each first sub-feature map of the first output result.
Optionally, as described in step 201, when the first output result is the first sub-feature map set, each first sub-feature map in the set may be input to the second convolution layer of the bottleneck layer to obtain a second sub-feature map. A second sub-feature map set is thereby obtained as the second output result.
Step 203, generating a third output result based on the target feature map, the additional convolution layer and the normalized exponential function.
In some embodiments, a third output result is generated from the target feature map, the additional convolution layer, and the normalized exponential function. The additional convolution layer may be a convolutional network added to the bottleneck layer, used to further extract the feature information of the target feature map. The normalized exponential function (softmax function) "compresses" a K-dimensional vector z into another K-dimensional real vector σ(z) in which every element lies in (0, 1) and all elements sum to 1.
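For reference, the normalized exponential function has the standard form (with K the dimension of the input vector):
```latex
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{k=1}^{K} e^{z_k}}, \qquad i = 1, \dots, K,
```
so that each σ(z)_i lies in (0, 1) and the σ(z)_i sum to 1, which is what allows the outputs to be read as weights in this step.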
Optionally, the third output result may be a set of weight values, one for each second sub-feature map in the second output result. Here, each weight value represents the importance of its second sub-feature map relative to the second feature map.
Step 204, generating a second feature map based on the second output result, the third output result, and the third convolution layer of the bottleneck layer.
In some embodiments, the execution body may generate a second feature map from the second output result, the third output result, and the third convolution layer of the bottleneck layer, where the third convolution layer is used to raise the feature dimension. As an example, the second feature map may be obtained from the second output result, the third output result, and the third convolution layer by any of various methods.
As can be seen from the above example, adding an attention-mechanism network to the original bottleneck-layer structure, without increasing the amount of computation or the number of parameters, both improves the improved network's ability to extract the feature information in the target feature map and increases the accuracy of that extraction.
FIG. 3 is a flow diagram of yet another embodiment of a method for processing a feature map in accordance with an embodiment of the present disclosure. The method for processing the feature map comprises the following steps:
the additional pooling network may include a predetermined number of different scales of pooling layers.
Step 301, inputting the target feature map into each pooling layer to generate a first sub-feature map, and obtaining a first sub-feature map set as a first output result.
In some embodiments, the execution body inputs the target feature map into each pooling layer to generate a first sub-feature map, obtaining a first sub-feature map set as the first output result. As an example, the convolution kernel of the first convolution layer of the bottleneck layer may be 1 × 1, and its output is the target feature map. The 1 × 1-convolved target feature map is then pooled at the scales (1, 1), (2, 2), (4, 4), and (8, 8) to generate first sub-feature maps of the corresponding resolutions, and the first sub-feature map set is obtained as the first output result.
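A small sketch of this pooling step, assuming the four scales are realized with adaptive average pooling (the patent does not name the pooling operator, so this choice is illustrative):
```python
import torch
import torch.nn.functional as F

# Hypothetical target feature map output by the 1x1 first convolution layer.
target = torch.randn(1, 64, 32, 32)

# One first sub-feature map per pooling scale; together they form the
# first sub-feature map set (the first output result).
first_output = [F.adaptive_avg_pool2d(target, s) for s in (1, 2, 4, 8)]
print([tuple(f.shape) for f in first_output])
# [(1, 64, 1, 1), (1, 64, 2, 2), (1, 64, 4, 4), (1, 64, 8, 8)]
```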
Step 302, inputting each first sub-feature map in the first sub-feature map set to the second convolution layer of the bottleneck layer to generate a second sub-feature map, and obtaining a second sub-feature map set as a second output result.
In some embodiments, the execution body inputs each first sub-feature map in the first sub-feature map set to the second convolution layer of the bottleneck layer to generate a second sub-feature map, obtaining a second sub-feature map set as the second output result. As an example, the four first sub-feature maps of different resolutions, obtained with the pooling scales (1, 1), (2, 2), (4, 4), and (8, 8), may each be input to a convolution layer with a 3 × 3 kernel to obtain the second sub-feature map set.
Step 303, inputting the target feature map into the additional convolutional layer to obtain an output result of the additional convolutional layer.
In some embodiments, the execution body inputs the target feature map into the additional convolution layer to obtain the output result of the additional convolution layer.
Step 304, inputting the output result of the additional convolution layer to the normalized exponential function to obtain the third output result.
In some embodiments, the execution body inputs the output result of the additional convolution layer to the normalized exponential function to obtain the third output result. As an example, the output result of the additional convolution layer is input to the normalized exponential function, yielding, for example, four weight values between 0 and 1. The number of weight values output by the normalized exponential function equals the number of pooling scales, and each value represents the importance of the feature information in the corresponding second sub-feature map.
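One plausible reading of steps 303 and 304, sketched below: the additional convolution layer maps the target feature map to one channel per pooling scale, and global averaging followed by the softmax yields the four weight values. The global averaging of the logits is an assumption; the text does not pin down how the convolution output is reduced to one value per scale.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

target = torch.randn(1, 64, 32, 32)                # hypothetical target feature map
additional_conv = nn.Conv2d(64, 4, kernel_size=1)  # one output channel per scale (assumed)

logits = additional_conv(target).mean(dim=(2, 3))  # shape (1, 4)
weights = F.softmax(logits, dim=1)                 # four values in (0, 1) summing to 1
```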
Step 305, generating a third feature map based on the second output result and the third output result.
In some embodiments, the third feature map is a feature map containing multi-receptive-field information. The multi-receptive-field information can be obtained as follows: key information is automatically selected from the second sub-feature maps and then combined. Because the target feature map is pooled at different scales, the feature information learned after each pooling differs, and the sizes of the corresponding receptive fields change accordingly, which gives rise to the multiple receptive fields.
In some optional implementations of some embodiments, generating a third feature map based on the second output result and the third output result may include the following steps, sketched in code after this list:
Step one: multiply each second sub-feature map by its corresponding weight value to generate a third sub-feature map, obtaining a third sub-feature map set.
Step two: upsample each third sub-feature map to generate a fourth sub-feature map, obtaining a fourth sub-feature map set. Upsampling ordinarily serves to enlarge an image so that it can be rendered on a higher-resolution display device; here it is used to raise the resolution of the third sub-feature maps.
Step three: add the fourth sub-feature maps in the fourth sub-feature map set to generate the third feature map.
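A sketch of the three steps, assuming bilinear interpolation for the upsampling (the patent does not fix the interpolation mode) and the shapes used in the earlier sketches:
```python
import torch
import torch.nn.functional as F

h, w = 32, 32                                        # resolution of the target feature map
second_maps = [torch.randn(1, 64, s, s) for s in (1, 2, 4, 8)]  # hypothetical second sub-feature maps
weights = torch.tensor([0.1, 0.2, 0.3, 0.4])         # hypothetical weights from the softmax step

# Step one: weight each second sub-feature map.
third_maps = [f * weights[i] for i, f in enumerate(second_maps)]
# Step two: upsample each third sub-feature map to the target resolution.
fourth_maps = [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
               for f in third_maps]
# Step three: sum the fourth sub-feature maps to obtain the third feature map.
third_feature_map = torch.stack(fourth_maps).sum(dim=0)
print(third_feature_map.shape)  # torch.Size([1, 64, 32, 32])
```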
Step 306, inputting the third feature map into the third convolution layer of the bottleneck layer to obtain a second feature map.
In some embodiments, the execution body inputs the third feature map into a third convolution layer of the bottleneck layer to obtain a second feature map.
As can be seen from fig. 3, compared with the embodiments corresponding to fig. 2, the flow 300 of the method for processing a feature map further details the steps involving the additional pooling network and the attention mechanism described above. The method not only avoids increasing the amount of computation and the number of parameters, but also improves, to a greater extent, the accuracy of the feature maps extracted by the improved network.
With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of an apparatus for processing a feature map. These apparatus embodiments correspond to the method embodiments illustrated in fig. 2, and the apparatus may be applied in various electronic devices.
As shown in fig. 5, the apparatus 500 for processing a feature map of some embodiments includes: a first input unit 501, a second input unit 502, a first generating unit 503, and a second generating unit 504. The first input unit 501 is configured to input a target feature map into the additional pooling network to obtain a first output result, where the target feature map is the output of the first convolution layer of the bottleneck layer. The second input unit 502 is configured to input the first output result to the second convolution layer of the bottleneck layer to obtain a second output result. The first generating unit 503 is configured to generate a third output result based on the target feature map, the additional convolution layer, and the normalized exponential function. The second generating unit 504 is configured to generate a second feature map based on the second output result, the third output result, and the third convolution layer of the bottleneck layer.
In some optional implementations of some embodiments, the additional pooling network includes a predetermined number of pooling layers of different scales, and the first input unit 501 may be further configured to: input the target feature map into each pooling layer to generate a first sub-feature map, obtaining a first sub-feature map set as the first output result.
In some optional implementations of some embodiments, the second input unit 502 may be further configured to: input each first sub-feature map in the first sub-feature map set to the second convolution layer of the bottleneck layer to generate a second sub-feature map, obtaining a second sub-feature map set as the second output result.
In some optional implementations of some embodiments, the first generating unit 503 may be further configured to: input the target feature map into the additional convolution layer to obtain the output result of the additional convolution layer; and input the output result of the additional convolution layer to the normalized exponential function to obtain the third output result.
In some optional implementations of some embodiments, the second generating unit 504 may be further configured to: multiply each second sub-feature map by its corresponding weight value to generate a third sub-feature map, obtaining a third sub-feature map set; upsample each third sub-feature map to generate a fourth sub-feature map, obtaining a fourth sub-feature map set; and add the fourth sub-feature maps to generate the third feature map.
It will be understood that the units described in the apparatus 500 correspond to the various steps of the method described with reference to fig. 2. Thus, the operations, features, and advantages described above for the method also apply to the apparatus 500 and the units it contains, and are not repeated here.
Referring now to FIG. 6, a block diagram of an electronic device (e.g., the electronic device of FIG. 1) 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the apparatus; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: inputting a target characteristic diagram into an additional pooling network to obtain a first output result, wherein the target characteristic diagram is output of a first convolution layer of a bottleneck layer; inputting the first output result into a second convolution layer of the bottleneck layer to obtain a second output result; generating a third output result based on the target characteristic diagram, the additional convolution layer and the normalized exponential function; and generating a second feature map based on the second output result, the third output result and the third convolution layer of the bottleneck layer.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including a first input unit, a second input unit, a first generation unit, and a second generation unit. The names of these units do not in some cases limit the units themselves; for example, the first input unit may also be described as "a unit that inputs a target feature map into the additional pooling network to obtain a first output result, where the target feature map is the output of the first convolution layer of the bottleneck layer".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In accordance with one or more embodiments of the present disclosure, there is provided a method for processing a feature map, including: inputting a target feature map into an additional pooling network to obtain a first output result, wherein the target feature map is the output of a first convolution layer of a bottleneck layer; inputting the first output result into a second convolution layer of the bottleneck layer to obtain a second output result; generating a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function; and generating a second feature map based on the second output result, the third output result, and a third convolution layer of the bottleneck layer.
According to one or more embodiments of the present disclosure, the additional pooling network includes a predetermined number of pooling layers of different scales, and inputting the target feature map into the additional pooling network to obtain the first output result includes: inputting the target feature map into each pooling layer to generate a first sub-feature map, obtaining a first sub-feature map set as the first output result.
According to one or more embodiments of the present disclosure, the inputting the first output result into the second convolution layer of the bottleneck layer to obtain the second output result includes: inputting each first sub-feature map in the first sub-feature map set into the second convolution layer of the bottleneck layer to generate a second sub-feature map, obtaining a second sub-feature map set as the second output result.
According to one or more embodiments of the present disclosure, the generating the third output result based on the target feature map, the additional convolution layer, and the normalized exponential function includes: inputting the target feature map into the additional convolution layer to obtain the output result of the additional convolution layer; and inputting the output result of the additional convolution layer into the normalized exponential function to obtain the third output result.
According to one or more embodiments of the present disclosure, the generating the second feature map based on the second output result, the third output result, and the third convolution layer of the bottleneck layer includes: generating a third feature map based on the second output result and the third output result; and inputting the third feature map into the third convolution layer of the bottleneck layer to obtain the second feature map, where the resolution of the second feature map is the same as that of the target feature map.
According to one or more embodiments of the present disclosure, the generating the third feature map based on the second output result and the third output result includes: multiplying each second sub-feature map by its corresponding weight value to generate a third sub-feature map, obtaining a third sub-feature map set; upsampling each third sub-feature map to generate a fourth sub-feature map, obtaining a fourth sub-feature map set; and adding the fourth sub-feature maps to generate the third feature map.
According to one or more embodiments of the present disclosure, there is provided an apparatus for processing a feature map, including: a first input unit configured to input a target feature map into an additional pooling network to obtain a first output result, wherein the target feature map is the output of a first convolution layer of a bottleneck layer; a second input unit configured to input the first output result into a second convolution layer of the bottleneck layer to obtain a second output result; a first generating unit configured to generate a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function; and a second generating unit configured to generate a second feature map based on the second output result, the third output result, and a third convolution layer of the bottleneck layer.
According to one or more embodiments of the present disclosure, the additional pooling network includes a predetermined number of pooling layers of different scales, and the first input unit may be further configured to: input the target feature map into each pooling layer to generate a first sub-feature map, obtaining a first sub-feature map set as the first output result.
According to one or more embodiments of the present disclosure, the second input unit may be further configured to: input each first sub-feature map in the first sub-feature map set into the second convolution layer of the bottleneck layer to generate a second sub-feature map, obtaining a second sub-feature map set as the second output result.
According to one or more embodiments of the present disclosure, the first generating unit may be further configured to: input the target feature map into the additional convolution layer to obtain the output result of the additional convolution layer; and input the output result of the additional convolution layer into the normalized exponential function to obtain the third output result.
According to one or more embodiments of the present disclosure, the second generating unit may be further configured to: multiply each second sub-feature map by its corresponding weight value to generate a third sub-feature map, obtaining a third sub-feature map set; upsample each third sub-feature map to generate a fourth sub-feature map, obtaining a fourth sub-feature map set; and add the fourth sub-feature maps to generate the third feature map.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method as described in any of the embodiments above.
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the features above, and also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept defined above; for example, technical solutions formed by substituting the features above with technical features of similar function disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (9)

1. A method for processing a feature map, comprising:
inputting a target feature map into an additional pooling network to obtain a first output result, wherein the target feature map is an output of a first convolution layer of a bottleneck layer;
inputting the first output result to a second convolution layer of the bottleneck layer to obtain a second output result;
generating a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function;
and generating a second feature map based on the second output result, the third output result and a third convolution layer of the bottleneck layer.
2. The method of claim 1, wherein the additional pooling network comprises a predetermined number of pooling layers of different scales; and
inputting the target feature map into the additional pooling network to obtain a first output result, including:
and inputting the target feature map into each pooling layer to generate a first sub-feature map, and obtaining a first sub-feature map set as a first output result.
3. The method of claim 2, wherein said inputting the first output result to the second convolution layer of the bottleneck layer to obtain a second output result comprises:
and inputting each first sub-feature map in the first sub-feature map set into the second convolution layer of the bottleneck layer to generate a second sub-feature map, and obtaining a second sub-feature map set as a second output result.
4. The method of claim 1, wherein the generating a third output result based on the target feature map, additional convolution layers, and a normalized exponential function comprises:
inputting the target characteristic diagram into an additional convolution layer to obtain an output result of the additional convolution layer;
and inputting the output result of the additional convolution layer to the normalized exponential function to obtain the third output result.
5. The method of claim 3, wherein the generating a second feature map based on the second output result, the third output result, and the third convolution layer of the bottleneck layer comprises:
generating a third feature map based on the second output result and the third output result;
and inputting the third feature map into a third convolution layer of the bottleneck layer to obtain a second feature map, wherein the resolution of the second feature map is the same as that of the target feature map.
6. The method of claim 5, wherein the generating a third feature map based on the second output result and the third output result comprises:
multiplying each second sub-feature map by its corresponding weight value to generate a third sub-feature map, obtaining a third sub-feature map set;
upsampling each third sub-feature map to generate a fourth sub-feature map, obtaining a fourth sub-feature map set;
and adding the fourth sub-feature maps in the fourth sub-feature map set to generate the third feature map.
7. An apparatus for processing a feature map, comprising:
a first input unit configured to input a target feature map into an additional pooling network to obtain a first output result, wherein the target feature map is an output of a first convolution layer of a bottleneck layer;
a second input unit configured to input the first output result to a second convolution layer of the bottleneck layer to obtain a second output result;
a first generation unit configured to generate a third output result based on the target feature map, an additional convolution layer, and a normalized exponential function;
a second generation unit configured to generate a second feature map based on the second output result, the third output result, and a third convolution layer of the bottleneck layer.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202010403392.1A 2020-05-13 2020-05-13 Method and apparatus, device, and medium for processing feature map Pending CN111582458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010403392.1A CN111582458A (en) 2020-05-13 2020-05-13 Method and apparatus, device, and medium for processing feature map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010403392.1A CN111582458A (en) 2020-05-13 2020-05-13 Method and apparatus, device, and medium for processing feature map

Publications (1)

Publication Number Publication Date
CN111582458A 2020-08-25

Family

ID=72112734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010403392.1A Pending CN111582458A (en) 2020-05-13 2020-05-13 Method and apparatus, device, and medium for processing feature map

Country Status (1)

Country Link
CN (1) CN111582458A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200825