CN110334716B - Feature map processing method, image processing method and device - Google Patents


Publication number
CN110334716B
CN110334716B (application CN201910600515.8A)
Authority
CN
China
Prior art keywords
parameters
parameter
feature map
weight
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910600515.8A
Other languages
Chinese (zh)
Other versions
CN110334716A (en)
Inventor
马宁宁 (Ma Ningning)
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201910600515.8A priority Critical patent/CN110334716B/en
Publication of CN110334716A publication Critical patent/CN110334716A/en
Application granted granted Critical
Publication of CN110334716B publication Critical patent/CN110334716B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a feature map processing method, an image processing method, and corresponding apparatuses. The feature map processing method comprises the following steps: a feature map obtaining step of obtaining a secondary feature map; a parameter generation step of obtaining individual parameters by pooling based on the secondary feature map; a parameter adjusting step of performing pointwise convolution on the individual parameters to obtain individual guidance parameters; a weight adjusting step of adjusting the weight according to the individual guidance parameters to obtain an individual weight; and a feature extraction step of obtaining a high-level feature map based on the secondary feature map and the individual weight. Because a different individual weight is generated for each input feature map, and features are extracted from each input with the weight generated from its own characteristics, the accuracy of the result is improved.

Description

Feature map processing method, image processing method and device
Technical Field
The present invention generally relates to the field of image recognition, and in particular, to a feature map processing method, an image processing method, and an image processing apparatus.
Background
With the development of computer technology, more and more scenarios require image processing tasks such as target detection and target recognition to be performed by computer. The Convolutional Neural Network (CNN) model is the core of modern deep vision recognition systems. A convolutional layer typically has the form Y = conv(X, W), where X is the input feature, Y is the output feature, and W is the weight. The values of the weights are updated through algorithms such as back propagation and gradient descent; training a neural network is essentially the process of updating its weights.
Once training is complete, all inputs share the weights of the trained network, i.e., all data share the same set of weights. However, although all data have commonalities that justify sharing weights, each input also has its own characteristics, and using only one shared weight can make the output features inaccurate.
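As a purely illustrative sketch of the shared-weight form Y = conv(X, W) described above, the following is a naive NumPy direct convolution with "same" padding; all function and variable names are ours, not from the patent:

```python
import numpy as np

def conv2d(x, w, pad):
    # x: (n, c, h, w) input features; w: (o, c, k, k) weights shared by all n inputs.
    # 'same' padding is assumed, i.e. pad == (k - 1) // 2, so output keeps h and w.
    n, c, h, wd = x.shape
    o, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    y = np.zeros((n, o, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, :, i:i + k, j:j + k]            # (n, c, k, k)
            y[:, :, i, j] = np.einsum('nckl,ockl->no', patch, w)
    return y

np.random.seed(0)
x = np.random.randn(2, 4, 8, 8)   # two different inputs...
w = np.random.randn(6, 4, 3, 3)   # ...share one and the same weight W
y = conv2d(x, w, pad=1)
print(y.shape)                    # (2, 6, 8, 8)
```

Every sample in the batch is transformed by the same `w`; the patent's point is precisely that this sharing ignores per-input characteristics.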
Disclosure of Invention
In order to solve the above problems in the prior art, a feature map processing method according to a first aspect of the present invention includes: a feature map obtaining step of obtaining a secondary feature map; a parameter generation step of obtaining individual parameters by pooling based on the secondary feature map; a parameter adjusting step of performing pointwise convolution on the individual parameters to obtain individual guidance parameters; a weight adjusting step of adjusting the weight according to the individual guidance parameters to obtain an individual weight; and a feature extraction step of obtaining a high-level feature map based on the secondary feature map and the individual weight.
In one example, the individual parameters include: convolution kernel parameters and/or channel parameters.
In one example, the convolution kernel parameters are obtained by performing the pooling on the secondary feature map.
In one example, the channel parameters are obtained by pooling the convolution kernel parameters.
In one example, the channel parameters are obtained by pooling the secondary feature map.
In one example, the pooling is average pooling.
In one example, the parameter adjusting step includes performing multilayer pointwise convolution on the individual parameters to obtain the individual guidance parameters.
In one example, the parameter adjusting step further includes: after the pointwise convolution of the individual parameters, applying a nonlinear transformation through an activation function to obtain the individual guidance parameters.
A second aspect of the present invention provides an image processing method comprising: an image acquisition step of acquiring an image; a feature extraction step of extracting image features of the image according to the feature map processing method of the first aspect; and an image recognition step of recognizing the image according to the image features.
A third aspect of the present invention provides a feature map processing apparatus including: a feature map obtaining module for obtaining a secondary feature map; a parameter generation module for obtaining individual parameters by pooling based on the secondary feature map; a parameter adjusting module for performing pointwise convolution on the individual parameters to obtain individual guidance parameters; a weight adjusting module for adjusting the weight according to the individual guidance parameters to obtain an individual weight; and a feature extraction module for obtaining a high-level feature map based on the secondary feature map and the individual weight.
A fourth aspect of the present invention provides an image processing apparatus comprising: an image acquisition module for acquiring an image; a feature extraction module for extracting image features of the image according to the feature map processing method of the first aspect; and an image recognition module for performing image recognition according to the image features.
A fifth aspect of the present invention provides an electronic device comprising: a memory for storing instructions; and a processor for calling the instructions stored in the memory to execute the feature map processing method of the first aspect or the image processing method of the second aspect.
A sixth aspect of the present invention provides a computer-readable storage medium storing instructions which, when executed by a processor, perform the feature map processing method of the first aspect or the image processing method of the second aspect.
According to the feature map processing method, the image processing method, and the apparatuses above, a different individual weight is generated for each input feature map at a convolutional layer of the convolutional neural network, and features are extracted from each input with the individual weight generated from its own characteristics, so that the accuracy of the result is improved.
Drawings
The above and other objects, features and advantages of embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 shows a schematic flow diagram of a feature map processing method according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of a feature map processing apparatus according to an embodiment of the invention;
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way.
It should be noted that although the expressions "first", "second", etc. are used herein to describe different modules, steps, and data of the embodiments of the present invention, they merely distinguish between different modules, steps, or data and do not indicate a particular order or degree of importance. Indeed, the terms "first", "second", and the like are fully interchangeable.
With the development of computer technology and the wide application of computer vision, it has become increasingly common to detect, track, and identify targets in images through computer image processing, and to dynamically track and position targets in real time. This has broad application value in intelligent traffic systems, intelligent monitoring systems, military target detection, the positioning of surgical instruments in medical navigation operations, and so on; target recognition also plays a very important role in security fields such as public safety and counter-terrorism. The convolutional neural network is the core of the above technologies, and the efficiency of its feature extraction largely determines the efficiency of the image processing work.
In order to improve the accuracy of the output of a convolutional neural network, fig. 1 shows a feature map processing method 100 provided by an embodiment of the present invention, which includes: a feature map obtaining step 110, a parameter generation step 120, a parameter adjusting step 130, a weight adjusting step 140, and a feature extraction step 150. The feature map processing method 100 may be applied to a single convolutional layer of a convolutional neural network, or to several or all of its convolutional layers. The above steps are explained in detail below.
In the feature map obtaining step 110, a secondary feature map is obtained.
The parameters of the secondary feature map may include the batch size, the number of channels, the height, and the width of the feature map.
The acquired secondary feature map may be an original image, or a feature map output by other convolutional layers. The input feature is denoted here as X, which may be a four-dimensional tensor with dimensions (n, c, h, w), where n is the batch size, c is the number of channels, h is the feature map height, and w is the feature map width.
In one example, when a trained neural network model is applied for feature extraction, only one picture is fed into the model at a time, so n = 1; in another example, when training the neural network model, n is the number of input feature maps and is a positive integer greater than or equal to 1.
In the parameter generation step 120, individual parameters are obtained by pooling based on the secondary feature map.
By pooling the feature map, individual parameters that carry the characteristics of that feature map are obtained. In one example, the pooling is average pooling.
In one example, personality parameters may include: convolution kernel parameters and/or channel parameters.
In one example, the convolution kernel parameters may be obtained by pooling the secondary feature map directly, for example by setting the sliding window size to h×w and the edge padding size to (k-1)/2, where k is the convolution kernel size (typically 3); the pooling then generates convolution kernel parameters whose tensor is (n, c, k, k).
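As an illustrative NumPy sketch of this pooling (assuming stride 1, which the stated output dimensions imply; all names are ours), averaging a (n, c, h, w) map with an h×w window and (k-1)/2 padding does yield a (n, c, k, k) tensor:

```python
import numpy as np

def avg_pool2d(x, window, stride=1, pad=0):
    # Zero-pad x (n, c, h, w), then average over each sliding window.
    n, c, h, w = x.shape
    kh, kw = window
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    oh = (h + 2 * pad - kh) // stride + 1
    ow = (w + 2 * pad - kw) // stride + 1
    out = np.zeros((n, c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            win = xp[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[:, :, i, j] = win.mean(axis=(2, 3))
    return out

np.random.seed(0)
x = np.random.randn(2, 4, 8, 8)   # secondary feature map, (n, c, h, w)
k = 3                             # convolution kernel size
kernel_params = avg_pool2d(x, window=(8, 8), pad=(k - 1) // 2)
print(kernel_params.shape)        # (2, 4, 3, 3), i.e. (n, c, k, k)
```

With an 8×8 window, padding 1, and stride 1, the output side is (8 + 2 − 8) + 1 = 3 = k, matching the tensor stated in the text.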
In another example, the channel parameters may be obtained by pooling the convolution kernel parameters. The obtained convolution kernel parameters of tensor (n, c, k, k) are further pooled to obtain the channel parameters; that is, on the basis of the convolution kernel parameters (n, c, k, k), a further pooling layer with a sliding window size of k×k and an edge padding size of 0 yields the channel parameters, whose tensor is (n, c, 1, 1).
In yet another example, the channel parameters may also be obtained by pooling the secondary feature map directly. For example, setting the sliding window size to h×w and the edge padding size to 0 directly yields the channel parameters, whose tensor is (n, c, 1, 1).
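A minimal NumPy sketch of the two routes to the (n, c, 1, 1) channel parameters (names are illustrative; the stand-in kernel parameters below are random rather than actually pooled):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(2, 4, 8, 8)                       # secondary feature map, (n, c, h, w)

# Route 1: pool the feature map directly; an h*w window with padding 0
# is simply a global average over the spatial dimensions.
channel_params = x.mean(axis=(2, 3), keepdims=True)   # (n, c, 1, 1)

# Route 2: pool (n, c, k, k) kernel parameters with a k*k window, padding 0.
kernel_params = np.random.randn(2, 4, 3, 3)           # stand-in for pooled kernel parameters
channel_params_2 = kernel_params.mean(axis=(2, 3), keepdims=True)

print(channel_params.shape, channel_params_2.shape)   # (2, 4, 1, 1) (2, 4, 1, 1)
```

Both routes collapse the spatial dimensions to 1×1, leaving one scalar per channel per sample.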
In the parameter adjusting step 130, pointwise convolution is performed on the individual parameters to obtain individual guidance parameters.
In one example, the individual parameters, i.e., the convolution kernel parameters and/or the channel parameters, may be passed through multiple layers of 1×1 pointwise convolution to obtain the individual guidance parameters. Through multilayer pointwise convolution, the obtained individual guidance parameters can capture more characteristics.
In one example, after the pointwise convolution, a nonlinear transformation is applied through an activation function to obtain the individual guidance parameters. The activation function may be the sigmoid nonlinear function.
By the above method, the convolution kernel parameters and/or the channel parameters are adjusted to obtain convolution kernel guidance parameters and/or channel guidance parameters, which respectively guide different dimensions of the weight. The tensor of the convolution kernel guidance parameter may be (n, c, k, k), and the tensor of the channel guidance parameter may be (n, o, 1, 1), where o is the number of output channels of the weight.
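A hedged NumPy sketch of this parameter-adjusting step: two 1×1 layers with a ReLU between them and a sigmoid at the end. The bottleneck width c/2 is our illustrative choice, not specified by the patent:

```python
import numpy as np

def pointwise(p, w):
    # 1x1 convolution: mixes channels independently at every spatial position.
    return np.einsum('oc,nchw->nohw', w, p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n, c, k = 2, 4, 3
params = np.random.randn(n, c, k, k)   # individual (convolution kernel) parameters
w1 = np.random.randn(c // 2, c)        # first 1x1 layer: c -> c/2 (illustrative bottleneck)
w2 = np.random.randn(c, c // 2)        # second 1x1 layer: c/2 -> c
hidden = np.maximum(pointwise(params, w1), 0)   # ReLU between the two layers
guide = sigmoid(pointwise(hidden, w2))          # guidance values in (0, 1)
print(guide.shape)                     # (2, 4, 3, 3)
```

The sigmoid keeps each guidance value in (0, 1), so the subsequent weight adjustment rescales rather than flips the shared weight.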
In the weight adjusting step 140, the weight is adjusted according to the individual guidance parameters to obtain the individual weight.
The originally shared weight is adjusted by the obtained individual guidance parameters, thereby producing an individual weight for each different input feature map.
In one example, the individual guidance parameters include a convolution kernel guidance parameter and/or a channel guidance parameter, which adjust the corresponding dimensions of the weight respectively: the convolution kernel guidance parameter adjusts the convolution kernel dimensions of the weight, and the channel guidance parameter adjusts the channel dimension of the weight. The adjustment may be performed by reshaping followed by multiplication. For example, the tensor of the original weight W is (o, c, k, k) and is reshaped to (1, o, c, k, k); the tensor (n, c, k, k) of the convolution kernel guidance parameter is reshaped to (n, 1, c, k, k); the tensor (n, o, 1, 1) of the channel guidance parameter is reshaped to (n, o, 1, 1, 1); and the adjusted individual weight is obtained by multiplying the five-dimensional tensors, its tensor being (n, o, c, k, k).
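The reshape-and-multiply adjustment can be reproduced directly with NumPy broadcasting; the random guidance tensors below stand in for the ones the method would actually compute:

```python
import numpy as np

np.random.seed(0)
n, o, c, k = 2, 6, 4, 3
W = np.random.randn(o, c, k, k)                 # shared weight, (o, c, k, k)
kernel_guide = np.random.rand(n, c, k, k)       # stand-in convolution kernel guidance
channel_guide = np.random.rand(n, o, 1, 1)      # stand-in channel guidance

W5 = W.reshape(1, o, c, k, k)
kg5 = kernel_guide.reshape(n, 1, c, k, k)
cg5 = channel_guide.reshape(n, o, 1, 1, 1)
individual_W = W5 * kg5 * cg5                   # broadcast multiply of 5-D tensors
print(individual_W.shape)                       # (2, 6, 4, 3, 3), i.e. (n, o, c, k, k)
```

The size-1 axes broadcast against each other, so each sample n receives its own copy of W scaled per kernel position and per output channel.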
In the feature extraction step 150, a high-level feature map is obtained based on the secondary feature map and the individual weight.
The high-level feature map is extracted from the input secondary feature map using the individual weight generated from that same feature map. For example, the tensor (n, c, h, w) of the input feature X is reshaped to (1, n×c, h, w), a grouped convolution (group-wise conv) with the individual weight (n, o, c, k, k) is then performed with n groups, giving a result of tensor (1, n×o, h, w), which is reshaped to obtain the output feature Y of tensor (n, o, h, w).
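A sketch of this grouped-convolution trick in NumPy (names are ours; since NumPy has no native grouped convolution, the grouped pass is emulated with a per-group loop over a naive conv2d):

```python
import numpy as np

def conv2d(x, w, pad):
    # Naive 'same'-padded convolution; x: (n, c, h, w), w: (o, c, k, k).
    n, c, h, wd = x.shape
    o, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    y = np.zeros((n, o, h, wd))
    for i in range(h):
        for j in range(wd):
            y[:, :, i, j] = np.einsum('nckl,ockl->no',
                                      xp[:, :, i:i + k, j:j + k], w)
    return y

np.random.seed(0)
n, o, c, k, h, w = 2, 6, 4, 3, 8, 8
x = np.random.randn(n, c, h, w)                 # secondary feature map
indiv_W = np.random.randn(n, o, c, k, k)        # one individual weight per input sample

# Fold the batch into the channel axis, then convolve group g (channels
# g*c .. (g+1)*c) with its own weight indiv_W[g].
x_folded = x.reshape(1, n * c, h, w)
y = np.concatenate([conv2d(x_folded[:, g * c:(g + 1) * c], indiv_W[g], pad=(k - 1) // 2)
                    for g in range(n)], axis=1)  # (1, n*o, h, w)
y = y.reshape(n, o, h, w)                        # output feature Y
print(y.shape)                                   # (2, 6, 8, 8)
```

The result is identical to convolving each sample separately with its own weight; folding the batch into the channel axis merely lets one grouped pass do all n convolutions at once.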
According to any of the embodiments provided by the disclosure, guidance parameters are generated from the characteristics of each individual feature map, and the weight is then adjusted by these guidance parameters, so that the weight incorporates the characteristics of each feature map; performing feature extraction on this basis improves the accuracy of feature extraction.
An embodiment of the present invention further provides an image processing method, including: an image acquisition step of acquiring an image; a feature extraction step of extracting image features of the image according to the feature map processing method 100 of any of the embodiments; and an image recognition step of recognizing the image according to the image features. In the feature extraction step, the feature map processing method 100 may be applied to a single convolutional layer, or to several or all convolutional layers. At each layer where it is applied, the parameters of the convolutional layer are adjusted individually according to the secondary feature map input to that layer, so that the high-level feature map output by that layer is more accurate.
Fig. 2 shows a feature map processing apparatus 200 according to an embodiment of the present invention. As shown in fig. 2, the feature map processing apparatus 200 includes: a feature map obtaining module 210 for obtaining a secondary feature map; a parameter generation module 220 for obtaining individual parameters by pooling based on the feature map; a parameter adjusting module 230 for performing pointwise convolution on the individual parameters to obtain individual guidance parameters; a weight adjusting module 240 for adjusting the weight according to the individual guidance parameters to obtain an individual weight; and a feature extraction module 250 for obtaining a high-level feature map based on the feature map and the individual weight.
In one example, the individual parameters include: convolution kernel parameters and/or channel parameters.
In one example, the channel parameters are obtained by pooling the feature map or by pooling the convolution kernel parameters.
In one example, the pooling is average pooling.
In one example, the parameter adjustment module 230 is configured to perform multilayer pointwise convolution on the individual parameters to obtain the individual guidance parameters.
In one example, the parameter adjustment module 230 is further configured to: after the pointwise convolution of the individual parameters, apply a nonlinear transformation through an activation function to obtain the individual guidance parameters.
An embodiment of the present invention further provides an image processing apparatus, including: an image acquisition module for acquiring an image; a feature extraction module for extracting image features of the image according to the feature map processing method of any of the embodiments; and an image recognition module for performing image recognition according to the image features.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.
As shown in fig. 3, an embodiment of the invention provides an electronic device 300. The electronic device 300 includes a memory 301, a processor 302, and an Input/Output (I/O) interface 303. The memory 301 is used for storing instructions, and the processor 302 is used for calling the instructions stored in the memory 301 to execute the feature map processing method of the embodiments of the present invention. The processor 302 is connected to the memory 301 and the I/O interface 303, for example via a bus system and/or another connection mechanism (not shown). The memory 301 may be used to store programs and data, including programs of the feature map processing method involved in the embodiments of the present invention; the processor 302 implements various functional applications and data processing of the electronic device 300 by executing the programs stored in the memory 301.
In an embodiment of the present invention, the processor 302 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), and the processor 302 may be one or a combination of several Central Processing Units (CPUs) or other forms of Processing units with data Processing capability and/or instruction execution capability.
Memory 301 in embodiments of the present invention may comprise one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile Memory may include, for example, a Random Access Memory (RAM), a cache Memory (cache), and/or the like. The nonvolatile Memory may include, for example, a Read-Only Memory (ROM), a Flash Memory (Flash Memory), a Hard Disk Drive (HDD), a Solid-State Drive (SSD), or the like.
In the embodiment of the present invention, the I/O interface 303 may be used to receive input (e.g., numeric or character information, or key signals related to user settings and function control of the electronic device 300) and may also output various information (e.g., images or sounds) to the outside. The I/O interface 303 may comprise one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a mouse, a joystick, a trackball, a microphone, a speaker, a touch panel, and the like.
It is to be understood that while operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
The methods and apparatus of embodiments of the present invention can be implemented using standard programming techniques, with rule-based logic or other logic used to accomplish the various method steps. It should also be noted that the words "means" and "module", as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving inputs.
Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code, which is executable by a computer processor for performing any or all of the described steps, operations, or procedures.
The foregoing description of the implementation of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A feature map processing method, wherein the feature map processing method is used for image processing, the method comprising:
a feature map obtaining step of obtaining a secondary feature map;
a parameter generation step of obtaining individual parameters by pooling based on the secondary feature map, wherein the individual parameters comprise: convolution kernel parameters and/or channel parameters;
a parameter adjusting step of performing pointwise convolution on the individual parameters and applying a nonlinear transformation through an activation function to obtain individual guidance parameters, wherein the individual guidance parameters comprise: a convolution kernel guidance parameter and/or a channel guidance parameter;
a weight adjusting step of adjusting the weight according to the individual guidance parameters to obtain an individual weight, wherein the convolution kernel guidance parameter adjusts the convolution kernel dimensions of the weight, the channel guidance parameter adjusts the channel dimension of the weight, and the adjustment is performed by reshaping followed by multiplication; and
a feature extraction step of obtaining a high-level feature map based on the secondary feature map and the individual weight;
wherein the parameter adjusting step comprises: obtaining the individual guidance parameters by performing multilayer pointwise convolution on the individual parameters.
2. The method of claim 1, wherein the convolution kernel parameters are obtained by performing the pooling on the secondary feature map.
3. The method of claim 2, wherein the channel parameters are obtained by performing the pooling on the convolution kernel parameters.
4. The method of claim 1, wherein the channel parameters are obtained by performing the pooling on the secondary feature map.
5. The method of any of claims 1-4, wherein the pooling is average pooling.
6. An image processing method comprising:
an image acquisition step of acquiring an image;
a feature extraction step of extracting an image feature of the image according to the feature map processing method of any one of claims 1 to 5;
and an image recognition step of recognizing the image according to the image features.
7. A feature map processing apparatus, wherein the apparatus comprises:
a feature map obtaining module for obtaining a secondary feature map;
a parameter generation module for obtaining individual parameters by pooling based on the secondary feature map, wherein the individual parameters comprise: convolution kernel parameters and/or channel parameters;
a parameter adjusting module for performing pointwise convolution on the individual parameters and applying a nonlinear transformation through an activation function to obtain individual guidance parameters, wherein the individual guidance parameters comprise: a convolution kernel guidance parameter and/or a channel guidance parameter;
a weight adjusting module for adjusting the weight according to the individual guidance parameters to obtain an individual weight, wherein the convolution kernel guidance parameter adjusts the convolution kernel dimensions of the weight, the channel guidance parameter adjusts the channel dimension of the weight, and the adjustment is performed by reshaping followed by multiplication; and
a feature extraction module for obtaining a high-level feature map based on the secondary feature map and the individual weight;
wherein the parameter adjusting module is further configured to perform the multilayer pointwise convolution on the individual parameters to obtain the individual guidance parameters.
8. An image processing apparatus, wherein the image processing apparatus comprises:
an image acquisition module for acquiring an image;
a feature extraction module for extracting image features of the image according to the feature map processing method of any one of claims 1 to 5; and
an image recognition module for performing image recognition according to the image features.
9. An electronic device, wherein the electronic device comprises:
a memory to store instructions; and
a processor for invoking the memory-stored instructions to perform the feature map processing method of any of claims 1 to 5 or the image processing method of claim 6.
10. A computer-readable storage medium having stored therein instructions which, when executed by a processor, perform the feature map processing method of any one of claims 1 to 5 or the image processing method of claim 6.
CN201910600515.8A 2019-07-04 2019-07-04 Feature map processing method, image processing method and device Active CN110334716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910600515.8A CN110334716B (en) 2019-07-04 2019-07-04 Feature map processing method, image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910600515.8A CN110334716B (en) 2019-07-04 2019-07-04 Feature map processing method, image processing method and device

Publications (2)

Publication Number Publication Date
CN110334716A CN110334716A (en) 2019-10-15
CN110334716B true CN110334716B (en) 2022-01-11

Family

ID=68143150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910600515.8A Active CN110334716B (en) 2019-07-04 2019-07-04 Feature map processing method, image processing method and device

Country Status (1)

Country Link
CN (1) CN110334716B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852348B (en) * 2019-10-18 2022-09-30 北京迈格威科技有限公司 Feature map processing method, image processing method and device
CN111210017B (en) * 2019-12-24 2023-09-26 北京迈格威科技有限公司 Method, device, equipment and storage medium for determining layout sequence and data processing
CN113239899B (en) * 2021-06-17 2024-05-28 阿波罗智联(北京)科技有限公司 Method for processing image and generating convolution kernel, road side equipment and cloud control platform

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN107766894A (en) * 2017-11-03 2018-03-06 吉林大学 Remote sensing image natural language generation method based on attention mechanism and deep learning
CN108182452A (en) * 2017-12-29 2018-06-19 哈尔滨工业大学(威海) Aero-engine fault detection method and system based on grouped convolution autoencoder
CN108364023A (en) * 2018-02-11 2018-08-03 北京达佳互联信息技术有限公司 Image-recognizing method based on attention model and system
CN108388900A (en) * 2018-02-05 2018-08-10 华南理工大学 The video presentation method being combined based on multiple features fusion and space-time attention mechanism
US10242289B2 (en) * 2015-12-21 2019-03-26 Nokia Technologies Oy Method for analysing media content
CN109753903A (en) * 2019-02-27 2019-05-14 北航(四川)西部国际创新港科技有限公司 UAV detection method based on deep learning
CN109858419A (en) * 2019-01-23 2019-06-07 广州智慧城市发展研究院 Bottom-up and top-down activity recognition system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101690297B1 (en) * 2010-04-12 2016-12-28 삼성디스플레이 주식회사 Image converting device and three dimensional image display device including the same
CN108510012B (en) * 2018-05-04 2022-04-01 四川大学 Target rapid detection method based on multi-scale feature map

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10242289B2 (en) * 2015-12-21 2019-03-26 Nokia Technologies Oy Method for analysing media content
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN107766894A (en) * 2017-11-03 2018-03-06 吉林大学 Remote sensing image natural language generation method based on attention mechanism and deep learning
CN108182452A (en) * 2017-12-29 2018-06-19 哈尔滨工业大学(威海) Aero-engine fault detection method and system based on grouped convolution autoencoder
CN108388900A (en) * 2018-02-05 2018-08-10 华南理工大学 The video presentation method being combined based on multiple features fusion and space-time attention mechanism
CN108364023A (en) * 2018-02-11 2018-08-03 北京达佳互联信息技术有限公司 Image-recognizing method based on attention model and system
CN109858419A (en) * 2019-01-23 2019-06-07 广州智慧城市发展研究院 Bottom-up and top-down activity recognition system
CN109753903A (en) * 2019-02-27 2019-05-14 北航(四川)西部国际创新港科技有限公司 UAV detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sanghyun Woo et al.; "CBAM: Convolutional Block Attention Module"; arXiv:1807.06521v2; 18 Jul 2018; pp. 1-14 *
Xiang Li; "Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks"; arXiv:1905.09646v2; 25 May 2019; pp. 1-8 *

Also Published As

Publication number Publication date
CN110334716A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
US11886998B2 (en) Attention-based decoder-only sequence transduction neural networks
CN110334716B (en) Feature map processing method, image processing method and device
WO2020042895A1 (en) Device and method of tracking poses of multiple objects based on single-object pose estimator
CN111063410B (en) Method and device for generating medical image text report
WO2018105194A1 (en) Method and system for generating multi-relevant label
KR20200022739A (en) Method and device to recognize image and method and device to train recognition model based on data augmentation
AU2014350727A1 (en) Face positioning method and device
CN114781272A (en) Carbon emission prediction method, device, equipment and storage medium
CN111260032A (en) Neural network training method, image processing method and device
KR102129161B1 (en) Terminal device and Method for setting hyperparameter of convolutional neural network
CN110059804B (en) Data processing method and device
JP2020522773A (en) Detection and representation of objects in images
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
KR20220059194A (en) Method and apparatus of object tracking adaptive to target object
CN112668607A (en) Multi-label learning method for recognizing tactile attributes of target object
Dai Real-time and accurate object detection on edge device with TensorFlow Lite
CN111242158A (en) Neural network training method, image processing method and device
Mao et al. IffDetector: Inference-aware feature filtering for object detection
CN110852348B (en) Feature map processing method, image processing method and device
KR102534936B1 (en) Apparatus and method for classifying image
CN110188773B (en) Image processing method and device
US20230108177A1 (en) Hardware-Aware Progressive Training Of Machine Learning Models
US20230056735A1 (en) Method of performing classification processing using machine learning model, information processing device, and computer program
Xie et al. ViT-MVT: A Unified Vision Transformer Network for Multiple Vision Tasks
CN116229095A (en) Model training method, visual task processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant