CN113255700A - Image feature map processing method and device, storage medium and terminal

Image feature map processing method and device, storage medium and terminal

Info

Publication number
CN113255700A
CN113255700A
Authority
CN
China
Prior art keywords
feature map
coordinate
pixel point
initial
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110645169.2A
Other languages
Chinese (zh)
Other versions
CN113255700B (en)
Inventor
李明蹊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN202110645169.2A priority Critical patent/CN113255700B/en
Publication of CN113255700A publication Critical patent/CN113255700A/en
Application granted granted Critical
Publication of CN113255700B publication Critical patent/CN113255700B/en
Priority to PCT/CN2021/141468 priority patent/WO2022257433A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

A method and apparatus for processing a feature map of an image, a storage medium, and a terminal are provided. The method includes: acquiring an initial feature map of an image; performing coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map; and updating the initial feature map according to the coordinate value of each pixel point to obtain an updated feature map, wherein the updated feature map includes the coordinate value of each pixel point. The scheme of the invention enriches the information carried by the feature map.

Description

Image feature map processing method and device, storage medium and terminal
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing a feature map of an image, a storage medium, and a terminal.
Background
In recent years, neural network technology has found widespread use in Computer Vision tasks. For example, trained neural networks are used for image classification, target detection, face key point detection, and the like. When a computer vision task is processed with a neural network, it is generally necessary to extract a Feature Map of the image and then perform detection, classification, etc. based on the extracted feature map. Understandably, the richer and more comprehensive the information in the feature map, the more accurate the results produced by the neural network.
However, the feature map extracted in the prior art carries relatively limited information, usually only the pixel values of the pixel points in the image, so the accuracy of the neural network in processing computer vision tasks still needs to be improved.
Therefore, a method for processing a feature map of an image is needed that can enrich the information in the feature map and thereby improve the accuracy of a neural network in processing computer vision tasks.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for processing a feature map of an image that can enrich the information in the feature map and thereby improve the accuracy of processing computer vision tasks.
In order to solve the above technical problem, an embodiment of the present invention provides a method for processing a feature map of an image, the method including: acquiring an initial feature map of an image; performing coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map; and updating the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, wherein the updated feature map includes the coordinate value of each pixel point.
Optionally, the coordinate values include a row coordinate value and a column coordinate value, where the row coordinate value is the coordinate value of the pixel point in the row direction and the column coordinate value is the coordinate value of the pixel point in the column direction; for the pixel point in the ith row and jth column, the row coordinate value of the pixel point is i and the column coordinate value is j, where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H are positive integers, W is the number of pixel points in the row direction of the initial feature map, and H is the number of pixel points in the column direction of the initial feature map.
Optionally, the coordinate values include a row coordinate value and a column coordinate value, the row coordinate value being the coordinate value of the pixel point in the row direction and the column coordinate value being the coordinate value of the pixel point in the column direction; for the pixel point in the ith row and jth column, the row coordinate value of the pixel point is (i-1)/(W-1) and the column coordinate value is (j-1)/(H-1), where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H are positive integers greater than 1, W is the number of pixel points in the row direction of the initial feature map, and H is the number of pixel points in the column direction of the initial feature map.
Optionally, updating the initial feature map according to the coordinate value of each pixel point in the initial feature map includes: generating a coordinate feature map from the coordinate values of the pixel points, where the value of each pixel point in the coordinate feature map is determined according to that pixel point's coordinate values; and performing feature fusion processing on the coordinate feature map and the initial feature map to obtain the updated feature map.
Optionally, the coordinate feature map includes a first coordinate feature sub-map and a second coordinate feature sub-map, where the value of each pixel point in the first coordinate feature sub-map is the row coordinate value of that pixel point, and the value of each pixel point in the second coordinate feature sub-map is its column coordinate value.
Optionally, performing feature fusion processing on the coordinate feature map and the initial feature map includes: stitching (concatenating) the coordinate feature map and the initial feature map in the channel direction to obtain the updated feature map, where the number of channels of the updated feature map is greater than that of the initial feature map.
Optionally, the method further includes: and processing the updated feature map based on an attention mechanism to obtain a processed feature map.
Optionally, before processing the updated feature map based on the attention mechanism, the method further includes: performing a convolution operation on the updated feature map with a plurality of convolution kernels, so that the number of channels of the updated feature map is the same as that of the initial feature map.
Optionally, processing the updated feature map based on the attention mechanism includes: performing attention extraction on the updated feature map based on the attention mechanism to obtain an attention map; and performing feature fusion processing on the initial feature map and the attention map to obtain the processed feature map.
Optionally, the pixel points in the initial feature map correspond one-to-one to the pixel points in the attention map, and performing feature fusion processing on the initial feature map and the attention map includes: for each pixel point, calculating the sum of its value in the attention map and its value in the initial feature map, and taking the sum as the value of that pixel point in the processed feature map.
Optionally, processing the updated feature map based on the attention mechanism includes: performing attention extraction on the updated feature map based on the attention mechanism to obtain an attention map; performing a convolution operation on the attention map with a plurality of convolution kernels to obtain a transformed attention map; and performing feature fusion processing on the initial feature map and the transformed attention map to obtain the processed feature map, where the number of convolution kernels is the same as the number of channels of the initial feature map.
In order to solve the above technical problem, an embodiment of the present invention further provides an apparatus for processing a feature map of an image, the apparatus including: an acquisition module configured to acquire an initial feature map of an image; a coordinate encoding module configured to perform coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map; and a feature updating module configured to update the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate value of each pixel point.
Embodiments of the present invention further provide a storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the above method for processing a feature map of an image.
An embodiment of the present invention further provides a terminal including a memory and a processor, the memory storing a computer program executable on the processor, where the processor, when executing the computer program, performs the steps of the above method for processing a feature map of an image.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
in the scheme of the embodiment of the invention, after the initial feature map of the image is acquired, coordinate encoding is performed on each pixel point in the initial feature map to obtain the coordinate value of each pixel point, and the initial feature map is then updated according to these coordinate values. Because the coordinate values are obtained by coordinate-encoding the pixel points of the initial feature map, they carry the coordinate information of every pixel point in the initial feature map. Since the updated feature map is obtained by updating the initial feature map according to these coordinate values, it contains, in addition to the content of the initial feature map, the coordinate information of each pixel point, and accuracy can be effectively improved when a computer vision task is performed based on the updated feature map.
Further, in the solution of the embodiment of the present invention, for the pixel point in the ith row and jth column, the row coordinate value is (i-1)/(W-1) and the column coordinate value is (j-1)/(H-1); that is, the coordinate values of the pixel points are normalized, and the initial feature map is updated with the normalized coordinate values, so that the coordinate information in the updated feature map is better conditioned, further improving the accuracy of executing computer vision tasks.
Further, in the solution of the embodiment of the present invention, the updated feature map is processed based on an attention mechanism to obtain a processed feature map. Because the updated feature map contains the coordinate values of the pixel points, processing it with the attention mechanism enhances the coordinate information of the pixel points; that is, compared with the updated feature map, the coordinate information in the processed feature map is more prominent, and accuracy can be further improved when a computer vision task is performed based on the processed feature map.
Drawings
FIG. 1 is a schematic view of a scene of a method for processing a feature map of an image according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for processing a feature map of an image according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of an initial feature map in an embodiment of the invention;
FIG. 4 is a flowchart illustrating an embodiment of step S203 in FIG. 2;
FIG. 5 is a flow chart illustrating a method for processing a feature map of an image according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for processing a feature map of an image according to an embodiment of the present invention.
Detailed Description
As described in the background, there is a need for a method for processing a feature map of an image, which can enrich information in the feature map of the image, thereby improving accuracy of processing a computer vision task.
The inventor of the present invention has found through research that existing computer vision tasks mainly include target detection, image recognition, and the like, and that determining the position of a target in an image is a vitally important part of performing these tasks; for example, face key point detection is an important step in tasks such as facial expression analysis or facial pose estimation. In the prior art, position information in an image is determined mainly from the pixel values of the pixel points in the feature map; for example, edge positions in the image can be determined from pixel values. However, the position information obtained in this way is very limited and of low accuracy.
In order to solve the above technical problem, an embodiment of the present invention provides a method for processing a feature map of an image. In the scheme of the embodiment, after the initial feature map of the image is acquired, coordinate encoding is performed on each pixel point in the initial feature map to obtain the coordinate value of each pixel point, and the initial feature map is then updated according to these coordinate values. Because the coordinate values are obtained by coordinate-encoding the pixel points of the initial feature map, they carry the coordinate information of every pixel point in the initial feature map. Since the updated feature map is obtained by updating the initial feature map according to these coordinate values, it contains, in addition to the content of the initial feature map, the coordinate information of each pixel point, and accuracy can be effectively improved when the neural network performs a computer vision task based on the updated feature map.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a scene schematic diagram of a method for processing a feature map of an image according to an embodiment of the present invention. The method may be performed by a terminal, which may be any appropriate terminal, such as, but not limited to, a mobile phone, an internet of things device, a computer, and the like. The method may be applied to a training phase of the neural network, and may also be applied to a use phase of the trained neural network, but is not limited thereto. In other words, the embodiment of the present invention may be used to process the feature map of the training image for training the neural network, and may also be used to process the feature map of the image to be detected, where the image to be detected is the image input into the trained neural network.
A non-limiting description of a scenario for the method for processing a feature map of an image according to an embodiment of the present invention is given below with reference to fig. 1.
As shown in fig. 1, the neural network 20 may be used to perform computer vision tasks, which may be object detection, image classification, etc., and the specific type of computer vision task performed by the neural network 20 is not limited in any way by the embodiments of the present invention.
The neural network 20 may include a first feature extraction module 21, a first feature map processing module 22, a second feature extraction module 23, and a classifier 24. The first feature extraction module 21 is a neural network for extracting a feature map of the input image 10 and may include one or more intermediate layers, such as a Convolutional layer and a Pooling layer, but is not limited thereto. The first feature extraction module 21 may be any of various existing neural networks for extracting feature maps, such as, but not limited to, residual networks (ResNet) and Visual Geometry Group (VGG) networks; the embodiment of the present invention places no limitation on the specific type and structure of the first feature extraction module 21.
Further, the first feature map processing module 22 is connected to the first feature extraction module 21, and may transmit the feature map output by the first feature extraction module 21 to the first feature map processing module 22, where the first feature map processing module 22 may be configured to execute the method for processing the feature map of the image according to the embodiment of the present invention on the input feature map.
Specifically, the feature map input to the first feature map processing module 22 may be coordinate-encoded to obtain coordinate values of each pixel in the feature map, and then the feature map input to the first feature map processing module 22 is updated according to the coordinate values of each pixel to obtain an updated feature map, where the updated feature map may include the coordinate values of each pixel, and the output of the first feature map processing module 22 may be the updated feature map, but is not limited thereto. More specific contents of the processing method for the feature map of the image will be described in detail below.
Further, the first feature map processing module 22 may be connected to the second feature extraction module 23, and may transmit the updated feature map output by the first feature map processing module 22 to the second feature extraction module 23, and the second feature extraction module 23 may further extract the feature map of the input image 10 based on the updated feature map to obtain the feature map output by the second feature extraction module 23. For the detailed description of the second feature extraction module 23, reference may be made to the above description related to the first feature extraction module 21, and details are not repeated here.
Further, the second feature extraction module 23 may be connected to the classifier 24, and the classifier 24 may be configured to calculate a prediction result of the neural network 20 on the input image according to the feature map input to the classifier 24. The classifier 24 may include a fully connected layer, the classifier 24 may be any suitable classifier, and the type and structure of the classifier 24 are not limited in any way by the embodiment of the present invention.
It should be noted that the first feature map processing module 22 may also be directly connected to the classifier 24; that is, the prediction result for the input image 10 may be calculated directly from the updated feature map output by the first feature map processing module 22.
It should be further noted that the neural network 20 may further include a second feature map processing module (not shown), whose input may be connected to the output of the second feature extraction module 23, so that the feature map output by the second feature extraction module 23 may be transmitted to the second feature map processing module and the method for processing a feature map of an image according to the embodiment of the present invention may be performed on it. In other words, the method may be applied to the feature map output by the first feature extraction module 21 or to the feature map output by the second feature extraction module 23, but is not limited thereto. For more details of the second feature map processing module, reference may be made to the above description of the first feature map processing module 22, which is not repeated here.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for processing a feature map of an image according to an embodiment of the present invention. The processing method of the feature map of the image shown in fig. 2 may include the steps of:
step S201: acquiring an initial feature map of an image;
step S202: performing coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map;
step S203: updating the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate value of each pixel point.
In a specific implementation of step S201, the image may be acquired, and then feature extraction may be performed on the image to obtain an initial feature map of the image.
Specifically, the image may be a training image for training a neural network, or may be an image to be tested for performing computer vision task processing by using the trained neural network, which is not limited in this embodiment of the present invention.
Further, the image may be acquired in real time, may be obtained from the outside, or may be pre-stored in a local data set, but is not limited thereto. The image may be a face image or an image including other preset targets, and the embodiment of the present invention does not limit the type of the image.
Further, the image may also be preprocessed before the initial feature map of the image is acquired. The preprocessing may include image denoising and the like, but is not limited thereto.
Further, feature extraction may be performed on the image to obtain an initial feature map of the image. Specifically, the image may be input to a feature extraction module to obtain an initial feature map. The feature extraction module may be various existing neural networks for extracting feature maps of images, and specific contents of the feature extraction module may refer to specific descriptions about the first feature extraction module 21 and the second feature extraction module 23 in fig. 1, which are not described herein again.
Referring to fig. 3, fig. 3 is a schematic diagram of an initial characteristic diagram in an embodiment of the invention.
Specifically, the initial feature map 11 shown in fig. 3 may have a plurality of Channels in the channel direction (i.e., the z-axis direction shown in fig. 3), each channel having a corresponding feature sub-map 110. In other words, the initial feature map 11 may include a plurality of feature sub-maps 110 in one-to-one correspondence with its channels; each feature sub-map 110 describes the features of the image on its channel, and the initial feature map 11 may be obtained by stacking the feature sub-maps in the channel direction.
Further, the feature sub-maps 110 have the same width W and height H, where the width is the number of pixel points in the row direction (i.e., the x-axis direction shown in fig. 3) and the height is the number of pixel points in the column direction (i.e., the y-axis direction shown in fig. 3). That is, the feature sub-maps 110 contain the same number of pixel points, with the same number in the row direction and the same number in the column direction, so there is a correspondence between the pixel points of the feature sub-maps; specifically, the pixel points in the ith row and jth column of the respective feature sub-maps correspond to one another, where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W is the number of pixel points in the row direction of a feature sub-map 110, and H is the number of pixel points in the column direction. It should be noted that the x-axis, y-axis, and z-axis directions in fig. 3 are mutually perpendicular.
Further, since the initial feature map 11 is obtained by stacking a plurality of feature sub-maps in the channel direction, the width of the initial feature map is the width W of the feature sub-maps, the height of the initial feature map is the height H of the feature sub-maps, and the depth of the initial feature map 11 is its number of channels C.
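This layout corresponds to the usual tensor representation of a feature map. The following is a minimal sketch, assuming PyTorch and hypothetical sizes (neither the framework nor the sizes are prescribed by this embodiment):

```python
import torch

# Hypothetical sizes, for illustration only.
C, H, W = 64, 32, 32

# An initial feature map: one (H, W) feature sub-map per channel,
# stacked along the channel direction (a leading batch dimension is
# added, as is conventional).
initial_map = torch.randn(1, C, H, W)
print(initial_map.shape)  # torch.Size([1, 64, 32, 32])
```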
With reference to fig. 2, in the specific implementation of step S202, coordinate encoding may be performed on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map.
Specifically, the coordinate values may include a row coordinate value and a column coordinate value, where the row coordinate value is a coordinate value of the pixel in a row direction, and the column coordinate value is a coordinate value of the pixel in a column direction. For example, for each pixel, the coordinate value of the pixel can be represented as (x, y), where x is the row coordinate value of the pixel and y is the column coordinate value of the pixel.
It should be noted that, since the width and height of each feature sub-map are the same as those of the initial feature map, for each pixel point in the initial feature map, the row coordinate value of the pixel point is its row coordinate value in the feature sub-map, and the column coordinate value of the pixel point is its column coordinate value in the feature sub-map.
In a specific embodiment, for the pixel point in the ith row and jth column, the row coordinate value of the pixel point may be i and the column coordinate value may be j, where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H are positive integers, W is the number of pixel points in the row direction of the initial feature map, and H is the number of pixel points in the column direction; in other words, W is the width of the initial feature map and H is its height.
In a non-limiting example, for the pixel point in the ith row and jth column, the row coordinate value of the pixel point may be (i-1)/(W-1) and the column coordinate value may be (j-1)/(H-1), where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H are positive integers greater than 1, W is the number of pixel points in the row direction of the initial feature map, and H is the number of pixel points in the column direction. The coordinate values of each pixel point then fall in the range 0 to 1; that is, the coordinate values are normalized, and updating the initial feature map with the normalized coordinate values yields better-conditioned coordinate information in the updated feature map.
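As a sketch of this normalized encoding (again assuming PyTorch, which is not part of the embodiment), torch.linspace produces exactly the values (i-1)/(W-1) for i = 1..W and (j-1)/(H-1) for j = 1..H:

```python
import torch

H, W = 4, 5  # hypothetical height and width of the initial feature map

row_vals = torch.linspace(0.0, 1.0, steps=W)  # (i-1)/(W-1): 0, 0.25, ..., 1
col_vals = torch.linspace(0.0, 1.0, steps=H)  # (j-1)/(H-1): 0, 1/3, 2/3, 1

# Broadcast to one value per pixel point of the (H, W) grid.
x = row_vals.view(1, W).expand(H, W)  # row coordinate value of every pixel
y = col_vals.view(H, 1).expand(H, W)  # column coordinate value of every pixel
assert float(x.min()) == 0.0 and float(x.max()) == 1.0  # normalized to [0, 1]
```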
In another embodiment, the coordinate values may further include the coordinate value of the pixel point in the channel direction; that is, for each pixel point, the coordinate value may be represented as (x, y, z), where x is the row coordinate value of the pixel point, y is its column coordinate value, and z is its coordinate value in the channel direction, z being a positive integer with z ≤ C, where C is the number of channels of the initial feature map.
In the specific implementation of step S203, the initial feature map may be updated according to the coordinate value of each pixel point in the initial feature map, so as to obtain an updated feature map, where the updated feature map includes the coordinate value of each pixel point.
Specifically, the coordinate value of each pixel point in the initial feature map may be appended to the value of that pixel point to obtain the updated feature map. For example, for a certain pixel point whose value in the initial feature map is the pixel value 80, its value in the updated feature map may be (80, x, y) or (80, x, y, z), but is not limited thereto. That is, the value of each pixel point in the initial feature map can be supplemented with its coordinate value to obtain the updated feature map.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of step S203. Step S203 shown in fig. 4 may include the steps of:
step S2031: generating a coordinate feature map according to the coordinate values of all the pixel points, wherein the value of each pixel point in the coordinate feature map is determined according to the coordinate value of the pixel point;
step S2032: performing feature fusion processing on the coordinate feature map and the initial feature map to obtain the updated feature map.
In the specific implementation of step S2031, for each pixel point, the value of the pixel point in the coordinate feature map may be determined according to its row coordinate value and column coordinate value, thereby obtaining the coordinate feature map. The coordinate feature map has the same number of pixel points as the initial feature map, and its width and height are the same as those of the initial feature map.
In a specific embodiment, the coordinate feature map may include a first coordinate feature sub-map and a second coordinate feature sub-map; that is, the number of channels of the coordinate feature map is 2. The value of each pixel point in the first coordinate feature sub-map may be the row coordinate value of that pixel point, and the value of each pixel point in the second coordinate feature sub-map may be its column coordinate value, whereby the coordinate feature map is obtained.
In another embodiment, the number of channels of the coordinate feature map may be 1; that is, the coordinate feature map includes only a single coordinate feature sub-map, and the value of a pixel point in that sub-map may be calculated from its row coordinate value and column coordinate value. For example, if the row coordinate value of a certain pixel point is 1 and its column coordinate value is 0.5, the value of that pixel point in the coordinate feature map is a single value computed from the pair (1, 0.5) by a preset formula, but is not limited thereto.
In the specific implementation of step S2032, feature fusion processing may be performed on the coordinate feature map and the initial feature map to obtain the updated feature map.
Specifically, the coordinate feature map and the initial feature map may be stitched (concatenated) in the channel direction to obtain the updated feature map, where the number of channels of the updated feature map is greater than that of the initial feature map.
More specifically, the coordinate feature map and the initial feature map may be superimposed in the channel direction to obtain an updated feature map, where the number of channels of the updated feature map is C + N, where C is the number of channels of the initial feature map, and N is the number of channels of the coordinate feature map. For example, if the number of channels of the coordinate feature map is 2, that is, the coordinate feature map includes the first coordinate feature sub-map and the second coordinate feature sub-map, the number of channels of the updated feature map is C + 2; if the number of channels of the coordinate feature map is 1, that is, the coordinate feature map only includes a single coordinate feature sub-map, the number of channels of the updated feature map is C + 1.
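A minimal sketch of this stitching step, assuming PyTorch and the two-channel coordinate feature map of the embodiment above (all sizes are hypothetical):

```python
import torch

C, H, W = 64, 32, 32
initial_map = torch.randn(1, C, H, W)           # initial feature map

# First and second coordinate feature sub-maps (normalized row and
# column coordinate values), built as in the earlier sketch.
x = torch.linspace(0, 1, W).view(1, 1, 1, W).expand(1, 1, H, W)
y = torch.linspace(0, 1, H).view(1, 1, H, 1).expand(1, 1, H, W)
coord_map = torch.cat([x, y], dim=1)            # N = 2 channels

# Stitching (concat) in the channel direction gives C + N channels.
updated_map = torch.cat([initial_map, coord_map], dim=1)
print(updated_map.shape)                        # torch.Size([1, 66, 32, 32])
```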
In the scheme of the embodiment of the invention, after the initial feature map of the image is acquired, coordinate encoding is performed on each pixel point in the initial feature map to obtain the coordinate value of each pixel point, and the initial feature map is then updated according to these coordinate values. Because the coordinate values are obtained by coordinate-encoding the pixel points of the initial feature map, they carry the coordinate information of every pixel point in the initial feature map. Since the updated feature map is obtained by updating the initial feature map according to these coordinate values, it contains, in addition to the content of the initial feature map, the coordinate information of each pixel point, and accuracy can be effectively improved when a computer vision task is performed based on the updated feature map.
Referring to fig. 5, fig. 5 is a flowchart of another method for processing a feature map of an image according to an embodiment of the present invention. The method shown in fig. 5 may include the following steps:
step S501: acquiring an initial feature map of an image;
step S502: performing coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map;
step S503: updating the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate value of each pixel point;
step S504: and processing the updated feature map based on an attention mechanism to obtain a processed feature map.
For specific contents of step S501 to step S503, reference may be made to the above description related to fig. 3 and fig. 4, which is not described herein again.
In the specific implementation of step S504, attention extraction may be performed on the updated feature map based on an attention mechanism to obtain an attention map. Any suitable attention-extraction method may be used; for example, a Convolutional Block Attention Module (CBAM) may be used to perform attention extraction on the updated feature map to obtain the attention map, but is not limited thereto. Note that the number of channels of the attention map may be the same as or greater than that of the initial feature map.
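The embodiment does not fix the attention-extraction method; the following is a simplified spatial-attention sketch in the spirit of CBAM (an assumption for illustration, not the module's reference implementation), again in PyTorch:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Simplified CBAM-style spatial attention: pool along the channel
    # direction, convolve, and gate the input with a sigmoid weight map.
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        avg = feat.mean(dim=1, keepdim=True)   # channel-wise average
        mx = feat.amax(dim=1, keepdim=True)    # channel-wise maximum
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return feat * w                        # the attention map

# The attention map keeps the channel count of its input.
attention_map = SpatialAttention()(torch.randn(1, 64, 32, 32))
```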
In a specific embodiment, a plurality of convolution kernels may first be used to perform a convolution operation on the updated feature map, so that the number of channels of the updated feature map becomes the same as that of the initial feature map; the attention map then also has the same number of channels as the initial feature map, which allows the subsequent feature fusion of the initial feature map and the attention map. It should be noted that the number of convolution kernels is the same as the number of channels of the initial feature map and that the kernels have the same size; the embodiment of the present invention places no limitation on the specific value of the kernel size.
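One plausible realization of this channel-restoring convolution is a bank of C kernels of size 1×1 (the 1×1 size is an assumption for illustration; the embodiment leaves the kernel size open):

```python
import torch
import torch.nn as nn

C = 64  # number of channels of the initial feature map (hypothetical)

# C convolution kernels applied to the (C + 2)-channel updated feature
# map, so that the result again has C channels.
channel_restore = nn.Conv2d(in_channels=C + 2, out_channels=C, kernel_size=1)
restored = channel_restore(torch.randn(1, C + 2, 32, 32))
print(restored.shape)  # torch.Size([1, 64, 32, 32])
```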
Further, feature fusion processing may be performed on the initial feature map and the attention map to obtain a processed feature map.
Specifically, the pixel points in the initial feature map correspond one-to-one to the pixel points in the attention map, and performing feature fusion processing on the initial feature map and the attention map includes: for each pixel point, calculating the sum of its value in the attention map and its value in the initial feature map, and taking the sum as the value of that pixel point in the processed feature map.
More specifically, the attention map may include a plurality of attention sub-maps in one-to-one correspondence with the feature sub-maps, and the processed feature map includes a plurality of processed feature sub-maps; for each pixel point in each feature sub-map, the sum of its value and the value of the corresponding pixel point in the corresponding attention sub-map may be calculated and taken as the value of that pixel point in the processed feature sub-map.
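Because the pixel points correspond one-to-one, this fusion reduces to a value-wise (residual-style) sum; continuing the earlier sketches, under the assumption that both maps already have matching shapes:

```python
# initial_map and attention_map must have identical shapes (same
# channels, height and width) for the one-to-one correspondence to hold.
assert initial_map.shape == attention_map.shape
processed_map = initial_map + attention_map  # per-pixel sum of the two values
```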
In a non-limiting example, if the number of channels of the attention map is greater than that of the initial feature map, a plurality of convolution kernels may be used to perform a convolution operation on the attention map to obtain a transformed attention map, where the number of convolution kernels is the same as the number of channels of the initial feature map, so that the transformed attention map has the same number of channels as the initial feature map.
Further, feature fusion processing may be performed on the initial feature map and the transformed attention map to obtain a processed feature map. For specific contents of the feature fusion processing performed on the initial feature map and the transformed attention map, reference may be made to the above description of the feature fusion processing performed on the initial feature map and the attention map, and details thereof are not repeated herein.
Compared with the above scheme of performing the convolution operation on the updated feature map, the scheme of performing the convolution operation on the attention map to obtain the transformed attention map and then obtaining the processed feature map from the transformed attention map and the initial feature map can better highlight the coordinate information of the pixel points.
In view of the above, in the embodiment of the present invention, the updated feature map is processed based on the attention mechanism to obtain a processed feature map. Because the updated feature map contains the coordinate values of the pixel points, processing it with the attention mechanism enhances the coordinate information of the pixel points; that is, compared with the updated feature map, the coordinate information in the processed feature map is more prominent, and accuracy can be further improved when a computer vision task is performed based on the processed feature map.
Referring to fig. 6, fig. 6 shows an apparatus for processing a feature map of an image according to an embodiment of the present invention. The apparatus may include: an acquisition module 61 configured to acquire an initial feature map of an image; a coordinate encoding module 62 configured to perform coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map; and a feature updating module 63 configured to update the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate value of each pixel point.
Further, the apparatus may further include an attention processing module (not shown), and the attention processing module may be configured to process the updated feature map based on an attention mechanism to obtain a processed feature map.
In a specific implementation, the processing device of the feature map of the image may correspond to a chip having a data processing function in the terminal, for example, an image processing chip; or to a chip module having a data processing function within the terminal, or to the terminal.
It should be noted that the apparatus for processing a feature map of an image may also correspond to a module of a neural network, for example, the first feature map processing module 22 in fig. 1, but is not limited thereto.
For more details of the operation principle, the operation mode, the beneficial effects, and the like of the processing apparatus related to the feature diagram of the image shown in fig. 6, reference may be made to the above description related to fig. 1 to fig. 5, and details are not repeated here.
Embodiments of the present invention further provide a storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the above method for processing a feature map of an image. The storage medium may include a ROM, a RAM, a magnetic disk, an optical disk, or the like, and may further include a non-volatile memory or a non-transitory memory, and the like.
The embodiment of the present invention further provides a terminal, which includes a memory and a processor, where the memory stores a computer program that can be executed on the processor, and the processor executes the steps of the processing method for the feature map of the image when executing the computer program. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
It should be understood that, in the embodiment of the present application, the processor may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the subject application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example and not limitation, many forms of Random Access Memory (RAM) are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (enhanced SDRAM), SDRAM (SLDRAM), synchlink DRAM (SLDRAM), and direct bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer program may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; for example, the division of the unit is only a logic function division, and there may be another division manner in actual implementation; for example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit. For example, for each device or product applied to or integrated into a chip, each module/unit included in the device or product may be implemented by hardware such as a circuit, or at least a part of the module/unit may be implemented by a software program running on a processor integrated within the chip, and the rest (if any) part of the module/unit may be implemented by hardware such as a circuit; for each device or product applied to or integrated with the chip module, each module/unit included in the device or product may be implemented by using hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components of the chip module, or at least some of the modules/units may be implemented by using a software program running on a processor integrated within the chip module, and the rest (if any) of the modules/units may be implemented by using hardware such as a circuit; for each device and product applied to or integrated in the terminal, each module/unit included in the device and product may be implemented by using hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal, or at least part of the modules/units may be implemented by using a software program running on a processor integrated in the terminal, and the rest (if any) part of the modules/units may be implemented by using hardware such as a circuit.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this document indicates that the former and latter related objects are in an "or" relationship.
The "plurality" appearing in the embodiments of the present application means two or more.
The descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent the order or the particular limitation of the number of the devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (14)

1. A method for processing a feature map of an image, the method comprising:
acquiring an initial feature map of an image;
performing coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map;
and updating the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, wherein the updated feature map comprises the coordinate value of each pixel point.
2. The method according to claim 1, wherein the coordinate values include a row coordinate value and a column coordinate value, the row coordinate value is a coordinate value of the pixel point in a row direction, and the column coordinate value is a coordinate value of the pixel point in a column direction,
for the pixel point of the ith row and the jth column, the row coordinate value of the pixel point is i, and the column coordinate value is j;
wherein i and j are positive integers, i is greater than or equal to 1 and less than or equal to W, j is greater than or equal to 1 and less than or equal to H, W and H are positive integers, W is the number of pixel points in the row direction in the initial feature map, and H is the number of pixel points in the column direction in the initial feature map.
3. The method according to claim 1, wherein the coordinate values include a row coordinate value and a column coordinate value, the row coordinate value is a coordinate value of the pixel point in a row direction, and the column coordinate value is a coordinate value of the pixel point in a column direction,
for the pixel point of the ith row and the jth column, the row coordinate value of the pixel point is (i-1)/(W-1), and the column coordinate value is (j-1)/(H-1);
wherein i and j are positive integers, i is greater than or equal to 1 and less than or equal to W, j is greater than or equal to 1 and less than or equal to H, W and H are positive integers greater than 1, W is the number of pixel points in the row direction in the initial feature map, and H is the number of pixel points in the column direction in the initial feature map.
4. The method for processing the feature map of the image according to claim 1, wherein updating the initial feature map according to the coordinate values of the respective pixel points in the initial feature map comprises:
generating a coordinate feature map according to the coordinate values of all the pixel points, wherein the value of each pixel point in the coordinate feature map is determined according to the coordinate value of the pixel point;
and carrying out feature fusion processing on the coordinate feature map and the initial feature map to obtain the updated feature map.
5. The method according to claim 4, wherein the coordinate feature map comprises a first coordinate feature sub-map and a second coordinate feature sub-map,
and the value of each pixel point in the first coordinate feature sub-map is the row coordinate value of the pixel point, and the value of each pixel point in the second coordinate feature sub-map is the column coordinate value of the pixel point.
6. The method for processing the feature map of the image according to claim 4, wherein performing feature fusion processing on the coordinate feature map and the initial feature map comprises:
and splicing the coordinate feature map and the initial feature map in a channel direction to obtain the updated feature map, wherein the number of channels of the updated feature map is greater than that of the initial feature map.
7. The method for processing the feature map of the image according to claim 1, further comprising:
and processing the updated feature map based on an attention mechanism to obtain a processed feature map.
8. The method for processing the feature map of the image according to claim 7, wherein before processing the updated feature map based on an attention mechanism, the method further comprises:
and performing convolution operation on the updated feature map by adopting a plurality of convolution cores so as to enable the number of channels of the updated feature map to be the same as that of the initial feature map.
9. The method according to claim 7, wherein processing the updated feature map based on an attention mechanism comprises:
performing attention extraction on the updated feature map based on the attention mechanism to obtain an attention map;
and performing feature fusion processing on the initial feature map and the attention map to obtain the processed feature map.
10. The method for processing the feature map of the image according to claim 9, wherein pixel points in the initial feature map and pixel points in the attention map are in one-to-one correspondence, and performing feature fusion processing on the initial feature map and the attention map comprises:
and for each pixel point, calculating the sum of the value of the pixel point in the attention diagram and the value in the initial feature diagram, and taking the sum as the value of the pixel point in the processed feature diagram.
11. The method according to claim 7, wherein processing the updated feature map based on an attention mechanism comprises:
performing attention extraction on the updated feature map based on the attention mechanism to obtain an attention map;
performing a convolution operation on the attention map using a plurality of convolution kernels to obtain a transformed attention map;
performing feature fusion processing on the initial feature map and the transformed attention map to obtain the processed feature map;
wherein the number of the convolution kernels is the same as the number of channels of the initial feature map.
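A sketch of the claim 11 variant, in which the attention map is first transformed by a convolution whose kernel count equals the channel count of the initial feature map. The stand-in attention block and the fusion by addition are assumptions; the claim only names "feature fusion processing":

```python
import torch
import torch.nn as nn

class TransformedAttentionFusion(nn.Module):
    """Illustrative sketch of claim 11."""

    def __init__(self, c_updated: int, c_initial: int):
        super().__init__()
        self.attend = nn.Sequential(            # stand-in attention block
            nn.Conv2d(c_updated, c_updated, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # c_initial kernels yield a transformed attention map whose channel
        # count matches the initial feature map.
        self.transform = nn.Conv2d(c_updated, c_initial, kernel_size=1)

    def forward(self, initial: torch.Tensor, updated: torch.Tensor) -> torch.Tensor:
        attention_map = self.attend(updated)
        transformed = self.transform(attention_map)
        return initial + transformed            # feature fusion (assumed additive)
```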
12. An apparatus for processing a feature map of an image, the apparatus comprising:
an acquisition module, configured to acquire an initial feature map of the image;
a coordinate encoding module, configured to perform coordinate encoding on each pixel point in the initial feature map to obtain a coordinate value of each pixel point in the initial feature map;
and a feature updating module, configured to update the initial feature map according to the coordinate value of each pixel point in the initial feature map to obtain an updated feature map, the updated feature map including the coordinate value of each pixel point.
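For illustration only, the three modules of claim 12 could be arranged as a single PyTorch module; the injected backbone, class name, and method names are assumptions:

```python
import torch
import torch.nn as nn

class FeatureMapUpdater(nn.Module):
    """Illustrative arrangement of claim 12: acquisition (a backbone),
    coordinate encoding, and feature updating."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone                       # acquisition module

    def encode_coordinates(self, feat: torch.Tensor) -> torch.Tensor:
        # coordinate encoding module: per-pixel normalized coordinate values
        n, _, h, w = feat.shape
        row = torch.linspace(0, 1, w, device=feat.device).view(1, 1, 1, w)
        col = torch.linspace(0, 1, h, device=feat.device).view(1, 1, h, 1)
        return torch.cat([row.expand(n, 1, h, w), col.expand(n, 1, h, w)], dim=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(image)                    # initial feature map
        coords = self.encode_coordinates(feat)         # coordinate values
        return torch.cat([feat, coords], dim=1)        # feature updating module
```

For example, FeatureMapUpdater(nn.Conv2d(3, 16, 3, padding=1)) applied to a (1, 3, 32, 32) image returns an 18-channel updated feature map.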
13. A storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, performs the steps of the method for processing a feature map of an image according to any one of claims 1 to 11.
14. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, performs the steps of the method for processing a feature map of an image according to any one of claims 1 to 11.
CN202110645169.2A 2021-06-10 2021-06-10 Image feature map processing method and device, storage medium and terminal Active CN113255700B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110645169.2A CN113255700B (en) 2021-06-10 2021-06-10 Image feature map processing method and device, storage medium and terminal
PCT/CN2021/141468 WO2022257433A1 (en) 2021-06-10 2021-12-27 Processing method and apparatus for feature map of image, storage medium, and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110645169.2A CN113255700B (en) 2021-06-10 2021-06-10 Image feature map processing method and device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN113255700A true CN113255700A (en) 2021-08-13
CN113255700B CN113255700B (en) 2021-11-02

Family

ID=77187306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645169.2A Active CN113255700B (en) 2021-06-10 2021-06-10 Image feature map processing method and device, storage medium and terminal

Country Status (2)

Country Link
CN (1) CN113255700B (en)
WO (1) WO2022257433A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679085B2 (en) * 2017-10-31 2020-06-09 University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image
CN109558844A (en) * 2018-11-30 2019-04-02 厦门商集网络科技有限责任公司 Method and apparatus for improving self-defined template recognition based on image normalization
CN111680544B (en) * 2020-04-24 2023-07-21 北京迈格威科技有限公司 Face recognition method, device, system, equipment and medium
CN112116074B (en) * 2020-09-18 2022-04-15 西北工业大学 Image description method based on two-dimensional space coding
CN113255700B (en) * 2021-06-10 2021-11-02 展讯通信(上海)有限公司 Image feature map processing method and device, storage medium and terminal

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397A (en) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 Image processing method, device and electronic equipment
WO2020059446A1 (en) * 2018-09-20 2020-03-26 富士フイルム株式会社 Learning device and learning method
CN109583584A (en) * 2018-11-14 2019-04-05 中山大学 Method and system for enabling a CNN with fully connected layers to receive input of indefinite shape
CN111723829A (en) * 2019-03-18 2020-09-29 四川大学 Full-convolution target detection method based on attention mask fusion
CN110287846A (en) * 2019-06-19 2019-09-27 南京云智控产业技术研究院有限公司 A kind of face critical point detection method based on attention mechanism
CN111192277A (en) * 2019-12-31 2020-05-22 华为技术有限公司 Instance segmentation method and device
CN111461213A (en) * 2020-03-31 2020-07-28 华中科技大学 Training method of target detection model and target rapid detection method
CN112307978A (en) * 2020-10-30 2021-02-02 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and readable storage medium
CN112241731A (en) * 2020-12-03 2021-01-19 北京沃东天骏信息技术有限公司 Attitude determination method, device, equipment and storage medium
CN112464851A (en) * 2020-12-08 2021-03-09 国网陕西省电力公司电力科学研究院 Smart power grid foreign matter intrusion detection method and system based on visual perception

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUE L. et al.: "Channel Attention and Multi-level Features Fusion for Single Image Super-Resolution", 2018 IEEE Visual Communications and Image Processing (VCIP) *
ZEYNAB A. et al.: "A novel feature fusion technique in Saliency-Based Visual Attention", 2009 International Conference on Advances in Computational Tools for Engineering Applications *
CHU Jinghui et al.: "A Facial Expression Recognition Algorithm Based on an Attention Model", Laser & Optoelectronics Progress *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022257433A1 (en) * 2021-06-10 2022-12-15 展讯通信(上海)有限公司 Processing method and apparatus for feature map of image, storage medium, and terminal

Also Published As

Publication number Publication date
WO2022257433A1 (en) 2022-12-15
CN113255700B (en) 2021-11-02

Similar Documents

Publication Publication Date Title
CN109389030B (en) Face characteristic point detection method and device, computer equipment and storage medium
CN111242088B (en) Target detection method and device, electronic equipment and storage medium
WO2020098250A1 (en) Character recognition method, server, and computer readable storage medium
CN108876792B (en) Semantic segmentation method, device and system and storage medium
CN108875523B (en) Human body joint point detection method, device, system and storage medium
CN109840477B (en) Method and device for recognizing shielded face based on feature transformation
CN110489951B (en) Risk identification method and device, computer equipment and storage medium
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN109960742B (en) Local information searching method and device
CN112733797B (en) Method, device and equipment for correcting sight of face image and storage medium
US10832032B2 (en) Facial recognition method, facial recognition system, and non-transitory recording medium
CN110610154A (en) Behavior recognition method and apparatus, computer device, and storage medium
CN108875492B (en) Face detection and key point positioning method, device, system and storage medium
CN110008997B (en) Image texture similarity recognition method, device and computer readable storage medium
CN110008961B (en) Text real-time identification method, text real-time identification device, computer equipment and storage medium
CN111160288A (en) Gesture key point detection method and device, computer equipment and storage medium
CN113159143A Infrared and visible light image fusion method and device based on skip-connection convolution layers
CN111598087B (en) Irregular character recognition method, device, computer equipment and storage medium
US10789515B2 (en) Image analysis device, neural network device, learning device and computer program product
CN110210480B (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN113343982A (en) Entity relationship extraction method, device and equipment for multi-modal feature fusion
CN111783797B (en) Target detection method, device and storage medium
CN113255700B (en) Image feature map processing method and device, storage medium and terminal
CN109523570B (en) Motion parameter calculation method and device
CN109063601B (en) Lip print detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant