WO2022257433A1 - Method and apparatus for processing a feature map of an image, storage medium and terminal - Google Patents

Method and apparatus for processing a feature map of an image, storage medium and terminal

Info

Publication number
WO2022257433A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
coordinate
map
pixel point
initial
Prior art date
Application number
PCT/CN2021/141468
Other languages
English (en)
Chinese (zh)
Inventor
李明蹊
Original Assignee
展讯通信(上海)有限公司
Priority date
Filing date
Publication date
Application filed by 展讯通信(上海)有限公司
Publication of WO2022257433A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Definitions

  • the present invention relates to the technical field of image processing, and in particular to a method and apparatus for processing a feature map of an image, a storage medium, and a terminal.
  • the information in the feature map extracted in the prior art is relatively limited: it usually contains only the pixel values of the pixels in the image, so the accuracy of neural networks in processing computer vision tasks still needs to be improved.
  • the technical problem solved by the present invention is to provide a method for processing feature maps of images that can enrich the information in the feature maps, thereby improving the accuracy of processing computer vision tasks.
  • an embodiment of the present invention provides a method for processing a feature map of an image.
  • the method includes: obtaining an initial feature map of an image; performing coordinate encoding on each pixel in the initial feature map to obtain the coordinate value of each pixel in the initial feature map; and updating the initial feature map according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate value of each pixel.
  • the coordinate value includes a row coordinate value and a column coordinate value; the row coordinate value is the coordinate value of the pixel in the row direction, and the column coordinate value is the coordinate value of the pixel in the column direction
  • for the pixel in the i-th position in the row direction and the j-th position in the column direction, the row coordinate value of the pixel is i and the column coordinate value is j
  • i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H
  • W and H are positive integers; W is the number of pixels in the row direction in the initial feature map, and H is the number of pixels in the column direction in the initial feature map.
  • alternatively, the coordinate value includes a row coordinate value and a column coordinate value defined as above, with the row coordinate value of the pixel being (i-1)/(W-1) and the column coordinate value being (j-1)/(H-1)
  • i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H
  • W and H are positive integers greater than 1; W is the number of pixels in the row direction in the initial feature map, and H is the number of pixels in the column direction in the initial feature map.
  • updating the initial feature map according to the coordinate values of each pixel in the initial feature map includes: generating a coordinate feature map according to the coordinate values of each pixel, wherein the value of each pixel in the coordinate feature map is determined according to the coordinate value of that pixel; and performing feature fusion processing on the coordinate feature map and the initial feature map to obtain the updated feature map.
  • the coordinate feature map includes a first coordinate feature submap and a second coordinate feature submap
  • the value of each pixel in the first coordinate feature submap is the row coordinate value of that pixel, and the value of each pixel in the second coordinate feature submap is the column coordinate value of that pixel.
  • performing feature fusion processing on the coordinate feature map and the initial feature map includes: splicing the coordinate feature map and the initial feature map in the channel direction to obtain the updated feature map, where the number of channels of the updated feature map is greater than the number of channels of the initial feature map.
  • the method further includes: processing the updated feature map based on an attention mechanism to obtain a processed feature map.
  • before processing the updated feature map based on the attention mechanism, the method further includes: performing a convolution operation on the updated feature map with a plurality of convolution kernels, so that the number of channels of the updated feature map is the same as that of the initial feature map.
  • processing the updated feature map based on the attention mechanism includes: performing attention extraction on the updated feature map based on the attention mechanism to obtain an attention map; and performing feature fusion processing on the initial feature map and the attention map to obtain the processed feature map.
  • the pixels in the initial feature map correspond one-to-one to the pixels in the attention map
  • performing feature fusion processing on the initial feature map and the attention map includes: for each pixel, calculating the sum of the value of the pixel in the attention map and its value in the initial feature map, and using the sum as the value of the pixel in the processed feature map.
  • alternatively, processing the updated feature map based on the attention mechanism includes: performing attention extraction on the updated feature map based on the attention mechanism to obtain an attention map; performing a convolution operation on the attention map with a plurality of convolution kernels to obtain a transformed attention map; and performing feature fusion processing on the initial feature map and the transformed attention map to obtain the processed feature map; where the number of convolution kernels is the same as the number of channels of the initial feature map.
  • an embodiment of the present invention also provides a device for processing a feature map of an image.
  • the device includes: an acquisition module for acquiring an initial feature map of an image; a coordinate encoding module for performing coordinate encoding on each pixel in the initial feature map to obtain the coordinate value of each pixel in the initial feature map; and a feature update module for updating the initial feature map according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate values of each pixel.
  • An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned method for processing a feature map of an image are executed.
  • An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory storing a computer program that can run on the processor; when the processor runs the computer program, the steps of the above method for processing a feature map of an image are executed.
  • the coordinates of each pixel in the initial feature map are encoded to obtain the coordinate values of each pixel in the initial feature map, and the initial feature map is then updated according to those coordinate values. Since the coordinate value of a pixel is obtained by encoding its coordinates in the initial feature map, it carries the coordinate information of that pixel in the initial feature map.
  • because the updated feature map is obtained by updating the initial feature map according to the coordinate values of each pixel, it contains the coordinate information of each pixel in addition to what the initial feature map contains, so accuracy can be effectively improved when performing computer vision tasks based on the updated feature map.
  • furthermore, the embodiment of the present invention normalizes the coordinate values of each pixel and uses the normalized coordinate values to update the initial feature map, so that the coordinate information in the updated feature map is better conditioned, further improving the accuracy of computer vision tasks.
  • further, the updated feature map is processed based on the attention mechanism to obtain a processed feature map. Since the updated feature map contains the coordinate values of each pixel, processing it with the attention mechanism can enhance the coordinate information of the pixels; that is, the coordinate information is more prominent in the processed feature map than in the updated feature map, so accuracy can be further improved when performing computer vision tasks based on the processed feature map.
  • FIG. 1 is a schematic diagram of a scene of a method for processing a feature map of an image in an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for processing a feature map of an image in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of an initial feature map in an embodiment of the present invention;
  • FIG. 4 is a schematic flow chart of a specific implementation of step S203 in FIG. 2;
  • FIG. 5 is a schematic flowchart of another method for processing a feature map of an image in an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for processing a feature map of an image in an embodiment of the present invention.
  • the inventors of the present invention have found through research that the existing computer vision processing tasks mainly include target detection, image recognition, etc., and determining the position of the target in the image is a crucial part of performing these computer vision processing tasks.
  • Face key point detection is one of the most important steps in tasks such as facial expression analysis or face pose estimation.
  • in the prior art, the position information in the image is mainly determined according to the pixel value of each pixel in the feature map; for example, edge positions in the image can be determined from pixel values. However, position information obtained in this way is very limited and less accurate.
  • an embodiment of the present invention provides a method for processing a feature map of an image.
  • in the method, the coordinates of each pixel in the initial feature map are encoded to obtain the coordinate value of each pixel in the initial feature map, and the initial feature map is then updated according to those coordinate values. Since the coordinate value of a pixel is obtained by encoding its coordinates in the initial feature map, it carries the coordinate information of that pixel in the initial feature map.
  • because the updated feature map is obtained by updating the initial feature map according to the coordinate values of each pixel, it contains the coordinate information of each pixel in addition to what the initial feature map contains, so accuracy can be effectively improved when the neural network performs computer vision tasks based on the updated feature map.
  • FIG. 1 is a schematic diagram of a scene of a method for processing a feature map of an image in an embodiment of the present invention.
  • the method can be executed by a terminal, and the terminal can be various appropriate terminals, for example, a mobile phone, an Internet of Things device, a computer, etc., but is not limited thereto.
  • the method can be applied to the training phase of the neural network, and can also be applied to the use phase of the trained neural network, but is not limited thereto.
  • the embodiment of the present invention can be used to process the feature map of the training image for training the neural network, and can also be used to process the feature map of the image to be tested, and the image to be tested refers to the image input to the trained neural network.
  • the neural network 20 can be used to perform computer vision tasks such as object detection and image classification; the embodiment of the present invention does not impose any restriction on the specific type of computer vision task performed by the neural network 20.
  • the neural network 20 may include a first feature extraction module 21 , a first feature map processing module 22 , a second feature extraction module 23 and a classifier 24 .
  • the first feature extraction module 21 is a neural network for extracting the feature map of the input image 10
  • the first feature extraction module 21 may include one or more intermediate layers, and the one or more intermediate layers may include a convolutional layer, a pooling layer, etc., but are not limited thereto.
  • the first feature extraction module 21 can be any of various existing appropriate neural networks for extracting feature maps, for example residual networks (ResNets) or Visual Geometry Group (VGG) networks, but is not limited thereto; the embodiment of the present invention does not impose any limitation on the specific type and structure of the first feature extraction module 21.
  • the first feature map processing module 22 is connected with the first feature extraction module 21; the feature map output by the first feature extraction module 21 can be transmitted to the first feature map processing module 22, which can execute the image feature map processing method described in the embodiment of the present invention on the input feature map.
  • specifically, coordinate encoding can be performed on the feature map input to the first feature map processing module 22 to obtain the coordinate values of each pixel in the feature map, and the input feature map can then be updated according to those coordinate values to obtain an updated feature map that includes the coordinate values of each pixel.
  • the output of the first feature map processing module 22 can be the updated feature map, but is not limited thereto. More specific content of the method for processing the feature map of the image will be described in detail below.
  • the first feature map processing module 22 can be connected with the second feature extraction module 23; the updated feature map output by the first feature map processing module 22 can be transmitted to the second feature extraction module 23, and the second feature extraction module 23 may further extract features of the input image 10 based on the updated feature map to obtain the feature map output by the second feature extraction module 23.
  • the second feature extraction module 23 can be connected with the classifier 24, and the classifier 24 can be used to calculate the prediction result of the neural network 20 for the input image according to the feature map input to the classifier 24.
  • the classifier 24 may include a fully connected layer, and the classifier 24 may be various appropriate existing classifiers.
  • the embodiment of the present invention does not impose any limitation on the type and structure of the classifier 24 .
  • the first feature map processing module 22 can also be directly connected to the classifier 24; that is, the prediction result for the input image 10 can be directly calculated according to the updated feature map output by the first feature map processing module 22.
  • the neural network 20 can also include a second feature map processing module (not shown); the input of the second feature map processing module can be connected with the output of the second feature extraction module 23, and the feature map output by the second feature extraction module 23 can be transmitted to the second feature map processing module, so that the image feature map processing method described in the embodiment of the present invention is executed on that feature map.
  • the application object of the image feature map processing method described in the embodiment of the present invention may be the feature map output by the first feature extraction module 21 or the feature map output by the second feature extraction module 23, but is not limited thereto.
  • for the second feature map processing module, please refer to the related description of the first feature map processing module 22 above, which will not be repeated here.
  • FIG. 2 is a schematic flowchart of a method for processing a feature map of an image in an embodiment of the present invention.
  • the method for processing the feature map of the image shown in FIG. 2 may comprise the following steps:
  • Step S201: Obtain the initial feature map of the image;
  • Step S202: Perform coordinate encoding on each pixel in the initial feature map to obtain the coordinate value of each pixel in the initial feature map;
  • Step S203: Update the initial feature map according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, the updated feature map including the coordinate values of each pixel.
  • the image may be acquired, and then feature extraction is performed on the image to obtain an initial feature map of the image.
  • the image may be a training image for training a neural network, or an image to be tested using a trained neural network for computer vision task processing, which is not limited in this embodiment of the present invention.
  • the image may be collected in real time, may also be obtained from outside, and may also be pre-stored in a local data set, but it is not limited thereto.
  • the image may be a face image, or an image containing other preset objects, and the embodiment of the present invention does not impose any limitation on the type of the image.
  • the image may also be preprocessed.
  • the preprocessing may include: performing image denoising processing on the image, etc., but is not limited thereto.
  • feature extraction may be performed on the image to obtain an initial feature map of the image.
  • the image can be input to a feature extraction module to obtain an initial feature map.
  • the feature extraction module can be various existing neural networks for extracting image feature maps, and the specific content of the feature extraction module can refer to the specific description about the first feature extraction module 21 and the second feature extraction module 23 in Fig. 1 , which will not be repeated here.
  • FIG. 3 is a schematic diagram of an initial feature map in an embodiment of the present invention.
  • the initial feature map 11 shown in FIG. 3 can have multiple channels in the channel direction (that is, the z-axis direction shown in FIG. 3), and each channel has a corresponding feature submap 110.
  • that is, the initial feature map 11 may include multiple feature submaps 110; the feature submaps 110 correspond to the channels of the initial feature map 11, and each feature submap 110 describes the features of the image on its corresponding channel.
  • the initial feature map 11 can be obtained by superimposing the multiple feature submaps in the channel direction.
  • the plurality of feature submaps 110 have the same width W and height H.
  • the width may be the number of pixels in the row direction (that is, the x-axis direction shown in FIG. 3 ), and the height may be the pixels in the column direction (that is, the y-axis direction shown in FIG. 3 ). count.
  • the multiple feature submaps 110 contain the same number of pixels: the number of pixels in the row direction is the same in each feature submap 110, and the number of pixels in the column direction is the same. Thus, the pixels in the i-th position in the row direction and the j-th position in the column direction of the feature submaps correspond to one another across the multiple feature submaps.
  • i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H, where W is the number of pixels in the row direction of a feature submap 110 and H is the number of pixels in the column direction of a feature submap 110.
  • the x-axis direction, the y-axis direction and the z-axis direction in FIG. 3 are perpendicular to each other.
  • the width of the initial feature map 11 is the width W of the feature submaps, the height of the initial feature map 11 is the height H of the feature submaps, and the depth of the initial feature map 11 is the number of channels C of the initial feature map.
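  • As a toy illustration (not from the patent itself; the channels-first tensor layout and all names below are assumptions of this sketch), the relationship between the initial feature map, its channels, and the feature submaps can be pictured as follows:

```python
import torch

C, H, W = 64, 32, 32                      # channels (depth), height, width
initial_feature_map = torch.randn(C, H, W)

# Each channel is one feature submap of width W (pixels in the row
# direction) and height H (pixels in the column direction).
feature_submap = initial_feature_map[0]   # submap for channel 0
assert feature_submap.shape == (H, W)
```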
  • In step S202, coordinate encoding may be performed on each pixel in the initial feature map to obtain the coordinate value of each pixel in the initial feature map.
  • the coordinate value may include a row coordinate value and a column coordinate value
  • the row coordinate value is the coordinate value of the pixel point in the row direction
  • the column coordinate value is the coordinate value of the pixel point in the column direction.
  • the coordinate value of the pixel point can be expressed as (x, y), where x is the row coordinate value of the pixel point, and y is the column coordinate value of the pixel point.
  • since the width and height of each feature submap are the same as those of the initial feature map, for each pixel in the initial feature map, the row coordinate value of the pixel is its row coordinate value in the feature submap, and the column coordinate value of the pixel is its column coordinate value in the feature submap.
  • in a non-limiting embodiment, for the pixel in the i-th position in the row direction and the j-th position in the column direction, the row coordinate value of the pixel can be i and the column coordinate value can be j, where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H; W and H are positive integers, W is the number of pixels in the row direction in the initial feature map, and H is the number of pixels in the column direction in the initial feature map. In other words, W is the width of the initial feature map and H is its height.
  • in another non-limiting embodiment, the row coordinate value of the pixel can be (i-1)/(W-1) and the column coordinate value can be (j-1)/(H-1), where i and j are positive integers, 1 ≤ i ≤ W, 1 ≤ j ≤ H; W and H are positive integers greater than 1, W is the number of pixels in the row direction in the initial feature map, and H is the number of pixels in the column direction in the initial feature map.
  • with this encoding, the coordinate values of the pixels range from 0 to 1; that is, the embodiment of the present invention normalizes the coordinate values of each pixel and uses the normalized coordinate values to update the initial feature map, which optimizes the coordinate information in the updated feature map.
  • further, the coordinate value may also include the coordinate value of the pixel in the channel direction; that is, for each pixel, the coordinate value may be expressed as (x, y, z), where x is the row coordinate value of the pixel, y is the column coordinate value, and z is the coordinate value in the channel direction. z is a positive integer, z ≤ C, where C is the number of channels of the initial feature map.
  • in step S203, the initial feature map can be updated according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, and the updated feature map includes the coordinate value of each pixel.
  • in a specific embodiment, the coordinate values of each pixel in the initial feature map can be appended to the value of that pixel to obtain the updated feature map. For example, if the value of a pixel in the initial feature map is 80, the value of that pixel in the updated feature map can be (80, x, y) or (80, x, y, z), etc., but it is not limited thereto. That is, the value of each pixel in the initial feature map can be supplemented with its coordinate values to obtain the updated feature map.
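  • A toy illustration of this supplementing, assuming normalized coordinates and a channels-first layout (the concrete numbers other than 80 are this sketch's own):

```python
import torch

# A single-channel 2 x 2 initial feature map.
feat = torch.tensor([[[80., 10.],
                      [20., 30.]]])        # shape (C=1, H=2, W=2)

x = torch.tensor([[0., 1.], [0., 1.]])     # row coordinate values
y = torch.tensor([[0., 0.], [1., 1.]])     # column coordinate values

updated = torch.cat([feat, x[None], y[None]], dim=0)   # shape (3, 2, 2)
# The pixel whose value was 80 now carries (80, x, y) = (80, 0, 0).
print(updated[:, 0, 0])   # tensor([80., 0., 0.])
```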
  • FIG. 4 is a schematic flowchart of a specific implementation manner of step S203.
  • Step S203 shown in FIG. 4 may include the following steps:
  • Step S2031: Generate a coordinate feature map according to the coordinate value of each pixel, wherein the value of each pixel in the coordinate feature map is determined according to the coordinate value of that pixel;
  • Step S2032: Perform feature fusion processing on the coordinate feature map and the initial feature map to obtain the updated feature map.
  • the value of the pixel point in the coordinate feature map can be determined according to the row coordinate value and column coordinate value of the pixel point, so as to obtain the coordinate feature map.
  • the number of pixels in the coordinate feature map is the same as in the initial feature map: the coordinate feature map has the same width and the same height as the initial feature map.
  • in a specific embodiment, the coordinate feature map may include a first coordinate feature submap and a second coordinate feature submap; that is, the number of channels of the coordinate feature map is 2. The value of each pixel in the first coordinate feature submap can be the row coordinate value of that pixel, and the value of each pixel in the second coordinate feature submap can be the column coordinate value of that pixel; the coordinate feature map is thus obtained.
  • in another specific embodiment, the number of channels of the coordinate feature map can be 1; that is, the coordinate feature map includes only a single coordinate feature submap, and the value of each pixel in that submap can be calculated from the row coordinate value and the column coordinate value of the pixel. For example, if the row coordinate value of a pixel is 1 and the column coordinate value is 0.5, the value of that pixel in the coordinate feature map is calculated from these two values, but it is not limited thereto.
  • In step S2032, feature fusion processing may be performed on the coordinate feature map and the initial feature map to obtain an updated feature map.
  • in a specific embodiment, the coordinate feature map and the initial feature map can be spliced (concatenated) in the channel direction to obtain the updated feature map, and the number of channels of the updated feature map is greater than that of the initial feature map.
  • more specifically, the coordinate feature map and the initial feature map can be superimposed in the channel direction to obtain the updated feature map. The number of channels of the updated feature map is C+N, where C is the number of channels of the initial feature map and N is the number of channels of the coordinate feature map.
  • if the number of channels of the coordinate feature map is 2, that is, the coordinate feature map includes the first coordinate feature submap and the second coordinate feature submap, the number of channels of the updated feature map is C+2; if the number of channels of the coordinate feature map is 1, that is, the coordinate feature map includes only a single coordinate feature submap, the number of channels of the updated feature map is C+1.
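  • The following sketch shows the two-channel variant of this splicing (a CoordConv-style concatenation; the batch dimension and all names are assumptions of this sketch rather than the patent's notation):

```python
import torch

def update_with_coordinates(feat: torch.Tensor) -> torch.Tensor:
    """Splice a 2-channel coordinate feature map onto `feat` in the
    channel direction: (N, C, H, W) -> (N, C + 2, H, W)."""
    n, c, h, w = feat.shape
    # Normalized coordinate values, equal to (i-1)/(W-1) and (j-1)/(H-1).
    row_vals = torch.linspace(0.0, 1.0, w, device=feat.device)
    col_vals = torch.linspace(0.0, 1.0, h, device=feat.device)
    # First coordinate feature submap: each pixel holds its row coordinate value.
    first = row_vals.view(1, 1, 1, w).expand(n, 1, h, w)
    # Second coordinate feature submap: each pixel holds its column coordinate value.
    second = col_vals.view(1, 1, h, 1).expand(n, 1, h, w)
    return torch.cat([feat, first, second], dim=1)

initial = torch.randn(8, 64, 32, 32)       # C = 64
updated = update_with_coordinates(initial)
assert updated.shape == (8, 66, 32, 32)    # C + 2 channels
```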
  • from the above, the coordinates of each pixel in the initial feature map are encoded to obtain the coordinate values of each pixel, and the initial feature map is then updated according to those coordinate values. Since the coordinate value of a pixel is obtained by encoding its coordinates in the initial feature map, it carries the coordinate information of that pixel in the initial feature map.
  • because the updated feature map is obtained by updating the initial feature map according to the coordinate values of each pixel, it contains the coordinate information of each pixel in addition to what the initial feature map contains, so accuracy can be effectively improved when performing computer vision tasks based on the updated feature map.
  • FIG. 5 shows another method for processing a feature map of an image in an embodiment of the present invention; the method shown in FIG. 5 may include the following steps:
  • Step S501: Obtain the initial feature map of the image;
  • Step S502: Perform coordinate encoding on each pixel in the initial feature map to obtain the coordinate value of each pixel in the initial feature map;
  • Step S503: Update the initial feature map according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, the updated feature map including the coordinate values of each pixel;
  • Step S504: Process the updated feature map based on the attention mechanism to obtain a processed feature map.
  • for the specific content of steps S501 to S503, reference may be made to the relevant descriptions of FIG. 3 and FIG. 4 above, and details are not repeated here.
  • In step S504, attention extraction may be performed on the updated feature map based on the attention mechanism to obtain the attention map.
  • the attention extraction method based on the attention mechanism can be any of various existing appropriate methods; for example, a Convolutional Block Attention Module (CBAM) can be used to perform attention extraction on the updated feature map to obtain the attention map, but it is not limited thereto.
  • the number of channels of the attention map can be the same as the number of channels of the initial feature map, or it can be larger than the number of channels of the initial feature map.
  • multiple convolution kernels may be used to perform a convolution operation on the updated feature map, so that the number of channels of the updated feature map is the same as the number of channels of the initial feature map; the number of channels of the attention map can thereby also be made the same as that of the initial feature map, enabling the subsequent feature fusion processing of the initial feature map and the attention map.
  • the number of convolution kernels is the same as the number of channels of the initial feature map, and the multiple convolution kernels have the same size; the embodiment of the present invention does not impose any restriction on the specific value of the convolution kernel size.
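  • A sketch of this channel-restoring convolution (the 1x1 kernel size is this sketch's choice; as stated above, the kernel size is not restricted):

```python
import torch
import torch.nn as nn

C = 64
# C convolution kernels, all of the same size, mapping the (C + 2)-channel
# updated feature map back to C channels before attention extraction.
restore_channels = nn.Conv2d(in_channels=C + 2, out_channels=C, kernel_size=1)

updated = torch.randn(8, C + 2, 32, 32)
restored = restore_channels(updated)
assert restored.shape == (8, C, 32, 32)   # same channel count as the initial feature map
```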
  • feature fusion processing can be performed on the initial feature map and the attention map to obtain the processed feature map.
  • the pixels in the initial feature map correspond one-to-one to the pixels in the attention map
  • performing feature fusion processing on the initial feature map and the attention map includes: for each pixel, calculating the sum of the value of the pixel in the attention map and its value in the initial feature map, and using the sum as the value of the pixel in the processed feature map.
  • more specifically, the attention map can include multiple attention submaps, and the attention submaps correspond one-to-one to the feature submaps.
  • correspondingly, the processed feature map includes multiple processed feature submaps. For each pixel of each feature submap, the sum of the value of the pixel and the value of the corresponding pixel in the corresponding attention submap may be calculated, and the sum used as the value of the pixel in the processed feature submap.
  • if the number of channels of the attention map is greater than the number of channels of the initial feature map, multiple convolution kernels can be used to perform a convolution operation on the attention map to obtain a transformed attention map; the number of convolution kernels is the same as the number of channels of the initial feature map, so that the number of channels of the transformed attention map is the same as that of the initial feature map.
  • further, feature fusion processing can be performed on the initial feature map and the transformed attention map to obtain the processed feature map.
  • compared with directly fusing the initial feature map with the attention map, the scheme of performing a convolution operation on the attention map to obtain the transformed attention map and then obtaining the processed feature map from the transformed attention map and the initial feature map can highlight the coordinate information of the pixels more.
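  • A minimal end-to-end sketch of step S504 covering the fusion variants described above. The attention extractor below is a simple convolution-plus-sigmoid stand-in rather than the CBAM named earlier, and every name here is an assumption of this sketch, not the patent's module:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, c_init: int, c_updated: int, c_attn: int):
        super().__init__()
        # Stand-in attention extractor (e.g. CBAM could be used instead).
        self.attend = nn.Sequential(
            nn.Conv2d(c_updated, c_attn, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # If the attention map has more channels than the initial feature
        # map, transform it with c_init convolution kernels.
        self.transform = (nn.Conv2d(c_attn, c_init, kernel_size=1)
                          if c_attn != c_init else nn.Identity())

    def forward(self, initial: torch.Tensor, updated: torch.Tensor) -> torch.Tensor:
        attn = self.attend(updated)    # attention map
        attn = self.transform(attn)    # (transformed) attention map, c_init channels
        return initial + attn          # per-pixel sum fusion -> processed feature map

fuse = AttentionFusion(c_init=64, c_updated=66, c_attn=66)
processed = fuse(torch.randn(8, 64, 32, 32), torch.randn(8, 66, 32, 32))
assert processed.shape == (8, 64, 32, 32)
```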
  • from the above, the updated feature map is processed based on the attention mechanism to obtain the processed feature map. Since the updated feature map contains the coordinate values of each pixel, processing it with the attention mechanism can enhance the coordinate information of the pixels; that is, the coordinate information is more prominent in the processed feature map than in the updated feature map, so accuracy can be further improved when performing computer vision tasks based on the processed feature map.
  • FIG. 6 shows an apparatus for processing a feature map of an image in an embodiment of the present invention. The apparatus may include: an acquisition module 61 for acquiring an initial feature map of an image; a coordinate encoding module 62 for performing coordinate encoding on each pixel in the initial feature map to obtain the coordinate value of each pixel in the initial feature map; and a feature update module 63 for updating the initial feature map according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, where the updated feature map includes the coordinate values of each pixel.
  • the device may also include an attention processing module (not shown in the figure), and the attention processing module may be used to process the updated feature map based on an attention mechanism to obtain a processed feature map .
  • the device for processing the feature map of the image may correspond to a chip with a data processing function in the terminal, for example an image processing chip; or to a chip module with a data processing function in the terminal; or to the terminal itself.
  • the above-mentioned device for processing the feature map of the image may also correspond to a neural network module, for example the first feature map processing module 22 in FIG. 1, but is not limited thereto.
  • An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned method for processing a feature map of an image are executed.
  • the storage medium may include ROM, RAM, magnetic or optical disks, and the like.
  • the storage medium may also include a non-volatile memory (non-volatile) or a non-transitory (non-transitory) memory, and the like.
  • An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory storing a computer program that can run on the processor; when the processor runs the computer program, the steps of the above method for processing a feature map of an image are executed.
  • the terminals include but are not limited to terminal devices such as mobile phones, computers, and tablet computers.
  • the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • the volatile memory can be random access memory (RAM), which acts as external cache memory.
  • by way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product comprises one or more computer instructions or computer programs.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer program can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program can be transmitted from one website, computer, server, or data center to another by wired or wireless means.
  • the disclosed methods, devices and systems can be implemented in other ways.
  • the device embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may be physically included separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • for each device or product applied to or integrated in a chip, each module/unit contained therein may be implemented by hardware such as circuits, or at least some modules/units may be implemented by a software program running on the processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a chip module, each module/unit may likewise be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; or at least some modules/units may be implemented by a software program running on the processor integrated in the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a terminal, each module/unit may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the terminal; or at least some modules/units may be implemented by a software program running on a processor integrated in the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • "Multiple" appearing in the embodiments of the present application means two or more.


Abstract

Method and apparatus for processing a feature map of an image, storage medium, and terminal. The method comprises: obtaining an initial feature map of an image; performing coordinate encoding on each pixel in the initial feature map to obtain the coordinate values of each pixel in the initial feature map; and updating the initial feature map according to the coordinate values of each pixel in the initial feature map to obtain an updated feature map, the updated feature map comprising the coordinate values of each pixel. By means of the solution of the present invention, the information in the feature map can be enriched.
PCT/CN2021/141468 2021-06-10 2021-12-27 Method and apparatus for processing a feature map of an image, storage medium and terminal WO2022257433A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110645169.2A CN113255700B (zh) 2021-06-10 2021-06-10 Method and apparatus for processing a feature map of an image, storage medium, and terminal
CN202110645169.2 2021-06-10

Publications (1)

Publication Number Publication Date
WO2022257433A1 (fr)

Family

ID=77187306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141468 WO2022257433A1 (fr) 2021-06-10 2021-12-27 Procédé et appareil de traitement de carte de caractéristiques d'image, support de stockage et terminal

Country Status (2)

Country Link
CN (1) CN113255700B (fr)
WO (1) WO2022257433A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255700B (zh) * 2021-06-10 2021-11-02 展讯通信(上海)有限公司 Method and apparatus for processing a feature map of an image, storage medium, and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558844A (zh) * 2018-11-30 2019-04-02 厦门商集网络科技有限责任公司 Method and device for improving the recognition rate of custom templates based on image normalization
US20190130204A1 (en) * 2017-10-31 2019-05-02 The University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image
CN111680544A (zh) * 2020-04-24 2020-09-18 北京迈格威科技有限公司 Face recognition method, apparatus, system, device, and medium
CN112116074A (zh) * 2020-09-18 2020-12-22 西北工业大学 Image description method based on two-dimensional spatial encoding
CN113255700A (zh) * 2021-06-10 2021-08-13 展讯通信(上海)有限公司 Method and apparatus for processing a feature map of an image, storage medium, and terminal

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397B (zh) * 2017-03-13 2020-09-01 北京市商汤科技开发有限公司 Image processing method and apparatus, and electronic device
WO2020059446A1 (fr) * 2018-09-20 2020-03-26 富士フイルム株式会社 Learning device and learning method
CN109583584B (zh) * 2018-11-14 2020-07-10 中山大学 Method and system for enabling a CNN with fully connected layers to accept inputs of arbitrary shape
CN111723829B (zh) * 2019-03-18 2022-05-06 四川大学 Fully convolutional target detection method based on attention mask fusion
CN110287846B (zh) * 2019-06-19 2023-08-04 南京云智控产业技术研究院有限公司 Face key point detection method based on an attention mechanism
CN111192277A (zh) * 2019-12-31 2020-05-22 华为技术有限公司 Instance segmentation method and apparatus
CN111461213B (zh) * 2020-03-31 2023-06-02 华中科技大学 Training method for a target detection model and fast target detection method
CN112307978B (zh) * 2020-10-30 2022-05-24 腾讯科技(深圳)有限公司 Target detection method and apparatus, electronic device, and readable storage medium
CN112241731B (zh) * 2020-12-03 2021-03-16 北京沃东天骏信息技术有限公司 Pose determination method, apparatus, device, and storage medium
CN112464851A (zh) * 2020-12-08 2021-03-09 国网陕西省电力公司电力科学研究院 Smart grid foreign object intrusion detection method and system based on visual perception


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIN SHUJING, YANG GUAN: "Research on Visual Question Answering Task with Enhanced Visual Feature", Journal of Zhongyuan University of Technology, vol. 31, no. 1, 29 February 2020, ISSN 1671-6906, DOI: 10.3969/j.issn.1671-6906.2020.01.011 *

Also Published As

Publication number Publication date
CN113255700B (zh) 2021-11-02
CN113255700A (zh) 2021-08-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21944929

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21944929

Country of ref document: EP

Kind code of ref document: A1