CN116403064A - Picture processing method, model, basic block structure, device and medium


Info

Publication number
CN116403064A
Authority
CN
China
Prior art keywords: channel, branch, feature, weight value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310668217.9A
Other languages
Chinese (zh)
Other versions
CN116403064B (en)
Inventor
王立
范宝余
郭振华
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310668217.9A priority Critical patent/CN116403064B/en
Publication of CN116403064A publication Critical patent/CN116403064A/en
Application granted granted Critical
Publication of CN116403064B publication Critical patent/CN116403064B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771: Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, of extracted features


Abstract

The invention relates to the field of computer vision and discloses a picture processing method, a model, a basic block structure, a device and a medium. The method comprises the following steps: acquiring n branch features extracted from a target picture under n branches, where n is a positive integer; calculating a branch feature channel weight vector corresponding to each branch, each vector characterizing the channel importance of every channel of the corresponding branch feature; performing channel-level feature weighting on the corresponding branch features with the branch feature channel weight vectors to obtain n channel-weighted branch features; and fusing the n channel-weighted branch features to obtain the output features of the target picture. With the technical scheme provided by the invention, the expressive capacity of channels that improve network performance is enhanced while channels that have little influence on the final result are suppressed, thereby improving the picture processing effect.

Description

Picture processing method, model, basic block structure, device and medium
Technical Field
The invention relates to the field of computer vision, in particular to a picture processing method, a model, a basic block structure, equipment and a medium.
Background
In recent years, deep learning has been widely used in the field of computer vision for solving the problems of picture classification, image segmentation, object detection, and the like.
When learning from pictures with a deep learning model, the learning ability of the model is often enhanced by making the network deeper or wider, but this approach inevitably introduces too many picture features.
Faced with a large number of picture features, a solution is needed for picking out the features that carry discriminative power.
Disclosure of Invention
In view of this, the present invention provides a picture processing method, a model, a basic block structure, a device and a medium, and designs an attention mechanism that enhances picture processing performance by weighting the channels of picture features.
In a first aspect, the present invention provides a method for processing a picture, the method including:
acquiring n branch characteristics extracted from a target picture under n branches, wherein n is a positive integer;
calculating a branch feature channel weight vector corresponding to each branch, wherein each branch feature channel weight vector is used for characterizing the channel importance of every channel of the corresponding branch feature;
Carrying out channel-level feature weighting on the corresponding branch features by using the branch feature channel weight value vector to obtain n channel-weighted branch features;
and fusing the n channel weighted branch characteristics to obtain the output characteristics of the target picture.
In a second aspect, the present invention provides an attention model comprising:
the input module is used for acquiring n branch characteristics extracted from the target picture under n branches, wherein n is a positive integer;
the channel weight value calculation module is used for calculating a branch characteristic channel weight value vector corresponding to each branch, and each branch characteristic channel weight value vector is used for representing the channel importance degree corresponding to each channel of the corresponding branch characteristic;
the feature weighting module is used for carrying out feature weighting on channel levels on the corresponding branch features by using the branch feature channel weight value vector to obtain n channel weighted branch features;
and the output module is used for fusing the n channel weighted branch characteristics to obtain the output characteristics of the target picture.
In a third aspect, the present invention provides a first basic block structure, where the first basic block structure includes the attention model described in the above aspect and an addition module;
The attention model is used for outputting the output characteristics of the target picture under the condition that the branch characteristics of the target picture extracted under n branches are input, wherein n is a positive integer;
the adding module is configured to add the output feature of the target picture and the target picture to obtain an output result of the first basic block structure for the target picture.
In a fourth aspect, the present invention provides a second basic block structure, where the second basic block structure includes the attention model described in the above aspect and an addition module;
the attention model is used for outputting the output characteristics of the target picture under the condition that the branch characteristics of the target picture extracted under n branches are input, wherein n is a positive integer;
the adding module is configured to add the output characteristics of the target picture and the results of the target picture after batch normalization, so as to obtain an output result of the second basic block structure for the target picture.
In a fifth aspect, the present invention provides a computer device comprising: the image processing device comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, so that the image processing method of the first aspect or any corresponding implementation mode of the first aspect is executed.
In a sixth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the picture processing method of the first aspect or any one of the embodiments corresponding thereto.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
under the condition that n branch features under n branches are extracted from a target picture, a branch feature channel weight vector is calculated for each branch, each vector characterizing the channel importance of every channel of the corresponding branch feature. The branch feature channel weight vectors are used to perform channel-level feature weighting on the branch features to obtain n channel-weighted branch features, and the n channel-weighted branch features are added to obtain the output features of the target picture. The channels of the different branch features are thus weighted based on an attention mechanism: the expressive capacity of channels that improve performance is enhanced while the expressive capacity of channels that have little influence on performance is suppressed, which finally improves the performance of the output features obtained from the channel-weighted branch features and guarantees the processing effect when these output features are applied to computer vision tasks such as classification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a picture processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another picture processing method according to an embodiment of the invention;
FIG. 3 is a flow chart of another picture processing method according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the structure of an attention model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a picture processing procedure according to an embodiment of the present invention;
fig. 6 is a schematic structural view of a first basic block structure according to an embodiment of the present invention;
fig. 7 is a schematic structural view of a second basic block structure according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a classification network according to an embodiment of the invention;
fig. 9 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, the concept of the attention mechanism involved in the present invention will be briefly described:
deep learning has been largely successful in solving the problems in the computer vision field of picture classification, image segmentation, object detection, and the like. In recent years, many excellent deep learning models have emerged.
In recent years, the attention model (Attention Model) has been widely used in many different types of deep learning tasks, such as natural language processing, image recognition and speech recognition, and it is one of the deep learning technologies most worthy of attention and deep understanding. Deep learning models based on the attention mechanism (Attention Mechanism) have therefore received extensive attention in recent years and constitute an important research direction.
In human vision, the attention mechanism appears as the human visual attention mechanism, a brain signal processing mechanism specific to human vision. Human vision rapidly scans the global image to obtain the target area that requires focus, i.e. the focus of attention, and then devotes more attention resources to this area to acquire more detailed information about the target while suppressing other useless information. This is a means of quickly screening out high-value information from a large amount of information with limited attention resources, a survival mechanism formed by humans in long-term evolution, and it greatly improves the efficiency and accuracy of visual information processing.
For deep learning, attention mechanism is designed to focus attention on important points, and other unimportant factors are ignored. Wherein the judgment of the importance degree depends on different network structures or application scenes.
With the continuous development of deep learning technology, new deep learning models emerge endlessly, and to further improve accuracy researchers tend to design networks that are deeper or wider. Undeniably, as the network gets deeper or wider, the learning ability of the model keeps growing, but the computation and parameter count of the model also increase rapidly, which hinders deployment in practical applications. Meanwhile, as the number of layers increases, a great amount of noise (i.e. many useless features) is inevitably introduced, and such excessive features usually not only fail to strengthen the network model but also confuse the classifier, thereby reducing the recognition ability of the network.
Therefore, only a limited number of discriminative features should be selected so that good discrimination is achieved and the model performs at its best. The attention mechanism exhibits great advantages in feature selection capability and is therefore widely adopted.
Based on this, the embodiment of the invention provides a picture processing method, which aims to design a more excellent structure of an attention model, so as to enhance the expression capability of channels with improved performance in a network, and inhibit the expression capability of channels with little influence on a final result, thereby further improving the picture processing effect.
According to an embodiment of the present invention, a picture processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one herein.
In this embodiment, a picture processing method is provided, which may be used in a computer device, and fig. 1 is a flowchart of a picture processing method according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
Step S101, n branch characteristics extracted from a target picture under n branches are acquired, wherein n is a positive integer.
The target picture is the picture whose features need to be identified so as to produce the output features. The embodiment of the invention does not limit the specific content of the target picture or the specific application scenario of the finally selected output features. For example, the output features may be applied to a pedestrian re-identification task: in that task, the expression of pedestrian features and the screening of discriminative features directly determine whether the target pedestrian can be correctly recognized, making feature selection an important link of the pedestrian re-identification task.
In this embodiment, feature extraction is performed on the target picture under n branches, so as to obtain n branch features under n branches, where n is a positive integer.
Each branch is an independent computation module containing multiple convolution layers; the convolution kernel sizes and the numbers of convolution layers of the branches are not identical, so the branch features extracted under different branches have different receptive fields.
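As a minimal illustration of why different kernel sizes give different receptive fields, the following numpy sketch runs one picture through two hypothetical branches, each reduced to a single averaging convolution of a different kernel size (the real branches in the invention contain multiple learned convolution layers; the kernels and sizes here are assumptions for illustration only):

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel 2D convolution with zero padding ('same' output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

# Two hypothetical branches: a 3x3 and a 5x5 averaging kernel. The larger
# kernel aggregates a wider neighbourhood (a larger receptive field), yet
# both branches emit features of the same spatial size as the input.
picture = np.arange(36, dtype=float).reshape(6, 6)
branch1 = conv2d_same(picture, np.full((3, 3), 1 / 9))
branch2 = conv2d_same(picture, np.full((5, 5), 1 / 25))
```

Both outputs share the input's 6x6 shape, but their values differ near the borders, reflecting the different neighbourhoods each branch sees.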
Here n may be further limited to a positive integer greater than 1, that is, branch features are extracted under multiple branches and the attention-based picture processing method provided by the embodiment of the invention is applied to a multi-branch scene. It is understood that the method can also be applied to a single-branch scene; the performance gain of the technical scheme is simply more obvious in a multi-branch scene. The following embodiments mainly describe the multi-branch scenario.
Step S102, calculating branch characteristic channel weight value vectors corresponding to the branches, wherein each branch characteristic channel weight value vector is used for representing the channel importance degree corresponding to the channels of the corresponding branch characteristics.
Wherein a channel is one dimension for describing image features, and a branching feature under each branching may be provided with a plurality of different channels.
In this embodiment, n branch feature channel weight value vectors corresponding to n branches are calculated, and each branch feature channel weight value vector is used to represent a channel importance level corresponding to each channel corresponding to a branch feature.
Illustratively, there are 2 branches: branch 1 and branch 2, and calculating 2 branch characteristic channel weight value vectors corresponding to the 2 branches respectively: branch characteristic channel weight value vector 1, branch characteristic channel weight value vector 2. The branch characteristic channel weight value vector 1 is used for representing the channel importance degree corresponding to each channel of the branch characteristic under the branch 1; the branch characteristic channel weight value vector 2 is used for representing the channel importance degree corresponding to each channel of the branch characteristic under the branch 2.
Step S103, using the branch characteristic channel weight value vector to carry out channel-level characteristic weighting on the corresponding branch characteristics to obtain n channel-weighted branch characteristics.
In this embodiment, after n branch feature channel weight value vectors are calculated, the feature weighting of the channel level is performed on the corresponding branch feature by using the branch feature channel weight value vectors, so as to obtain n channel weighted branch features.
Illustratively, there are 2 branches: branches 1 and 2, and carrying out channel-level feature weighting on the branch features 1 under the branches 1 by using the branch feature channel weight value vector 1 to obtain channel-weighted branch features 1; and carrying out channel-level feature weighting on the branch features 2 under the branch 2 by using the branch feature channel weight value vector 2 to obtain the channel-weighted branch features 2.
And step S104, fusing the weighted branch characteristics of the n channels to obtain the output characteristics of the target picture.
In this embodiment, all the channel weighted branch features are fused, so as to obtain the output features of the final target picture.
One way to fuse the n channel weighted branch features may be to add the n channel weighted branch features.
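Steps S101 to S104 can be sketched end to end in numpy. The channel weight vectors here are hard-coded placeholders, since the invention computes them from channel statistics as described in the later embodiments; shapes and values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n = 2 branch features of shape (C, H, W) = (4, 8, 8),
# and one channel weight vector of length C per branch.
n, C, H, W = 2, 4, 8, 8
branch_features = [rng.standard_normal((C, H, W)) for _ in range(n)]
channel_weights = [np.array([0.9, 0.1, 0.5, 0.7]),
                   np.array([0.2, 0.8, 0.6, 0.4])]

# Step S103: channel-level weighting -- broadcast each length-C vector over
# the (C, H, W) feature so every channel is scaled by its importance.
weighted = [f * w[:, None, None]
            for f, w in zip(branch_features, channel_weights)]

# Step S104: fuse the n channel-weighted branch features by addition.
output_feature = np.sum(weighted, axis=0)
```

The broadcasting `w[:, None, None]` is what makes the weighting "channel-level": one scalar per channel, shared across all spatial positions.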
In summary, in the picture processing method provided by this embodiment, when n branch features under n branches are extracted from a target picture, a branch feature channel weight vector is calculated for each branch to characterize the channel importance of every channel of the corresponding branch feature. Channel-level weighting with these vectors yields n channel-weighted branch features, which are added to obtain the output features of the target picture. The channels of the different branch features are thereby weighted based on an attention mechanism, enhancing the expressive capacity of channels that improve performance while suppressing the channels that have little influence on performance, and finally improving the performance of the output features obtained from the channel-weighted branch features.
In this embodiment, a picture processing method is provided, which may be used in a computer device, and fig. 2 is a flowchart of a picture processing method according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
step S201, obtaining branch characteristics extracted from a target picture under n branches, wherein n is a positive integer.
Please refer to step S101 in the embodiment shown in fig. 1 in detail, which is not described herein.
Step S202, carrying out feature compression on the n branch features to obtain compressed features.
The compression feature is a feature obtained by compressing n branch features. It will be appreciated that in subsequent processing steps, the compression feature is less computationally intensive than the n individual branch features.
Step S203, based on the compression characteristics, calculating a compression characteristic channel weight value vector, wherein the compression characteristic channel weight value vector is used for representing the channel importance degree corresponding to each channel of the compression characteristics.
In this embodiment, after the compression features corresponding to the n branch features are obtained, compression feature channel weight value vectors that characterize the channel importance degrees corresponding to the respective channels of the compression features are calculated.
Illustratively, the compression feature has a total of 3 channels: channel 1, channel 2, channel 3, then a compressed feature channel weight vector is calculated that can characterize the channel importance of channel 1, the channel importance of channel 2, the channel importance of channel 3.
In an alternative embodiment, the process of calculating the compression characteristics includes: adding the n branch features to obtain fusion features corresponding to the n branch features; or connecting the n branch features based on the channel dimension to obtain fusion features corresponding to the n branch features.
In this embodiment, n branch features may be compressed in a fusion manner, specifically, n branch features may be directly added, or n branch features may be connected based on channel dimensions, so as to ensure that n branch features are effectively compressed.
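A minimal sketch of the two fusion options, assuming n = 2 branch features of shape (C, H, W); note that addition preserves the channel count while channel-dimension concatenation multiplies it by n:

```python
import numpy as np

def compress_by_addition(features):
    """Fuse n same-shape (C, H, W) branch features by element-wise addition."""
    return np.sum(features, axis=0)

def compress_by_concat(features):
    """Fuse n (C, H, W) branch features by concatenating along the channel axis."""
    return np.concatenate(features, axis=0)

feats = [np.ones((4, 8, 8)), 2 * np.ones((4, 8, 8))]
added = compress_by_addition(feats)   # still 4 channels
concat = compress_by_concat(feats)    # 8 channels
```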
In an alternative embodiment, in a case where the number of channels of the compression feature is m, and m is a positive integer, calculating the compression feature channel weight value vector based on the compression feature includes: splitting the compression characteristics at channel level to obtain m channel characteristic diagrams; calculating statistic information of each channel feature map under various statistic values to obtain m channel statistic vectors; and calculating to obtain a compression characteristic channel weight value vector based on the m channel statistic vectors.
Wherein, the statistic information under the multi-statistic is: and from the angle of various statistics, the channel characteristic diagram is counted, and the obtained statistics information contains corresponding information of various statistics.
In this embodiment, the compressed feature is split along the channel dimension, and the statistic information of each channel feature map under multiple statistics is then calculated, so that the resulting channel statistic vectors reflect the characteristics of the channel feature maps. The compressed feature channel weight vector is then calculated from these channel statistic vectors, so that the weight vector is computed accurately by estimating the channel importance from multiple statistics within the branches.
In an alternative embodiment, the calculating the compressed feature channel weight vector based on the m channel statistic vectors includes: inputting the m channel statistic vectors into at least one full-connection layer respectively, and outputting to obtain channel importance degrees corresponding to the m channels respectively; and arranging the channel importance degrees corresponding to the m channels respectively according to the channels to obtain a compression characteristic channel weight value vector.
In this embodiment, the channel statistics under the multiple statistics are learned by at least one full connection layer, so as to obtain the channel importance degrees corresponding to the channels respectively, and the channel importance degrees of all the channels are compressed characteristic channel weight value vectors after being arranged according to the channels, so that the channel importance degrees can be accurately determined according to the multiple statistics under the design of the multiple statistics.
In an alternative embodiment, the statistics include at least one of: mean, variance, coefficient of variation, skewness, peak, maximum, minimum, median, quartile.
In the present embodiment, the statistics of different angles comprehensively reflect the data distribution characteristics of each channel feature map, and the statistics information of multiple statistics can reflect the channel importance more finely than the manner of determining the channel importance according to a single statistic.
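The split-statistics-weighting chain above can be sketched as follows. The single fully connected layer, its untrained random parameters, and the sigmoid used to map scores into (0, 1) are assumptions; the statistics follow the embodiment's list (mean, variance, coefficient of variation, skewness, kurtosis as the "peak", maximum, minimum, median):

```python
import numpy as np

def channel_statistics(channel_map):
    """Statistic vector of one (H, W) channel feature map under several statistics."""
    x = channel_map.ravel()
    mean, std = x.mean(), x.std()
    return np.array([
        mean,
        x.var(),
        std / (abs(mean) + 1e-8),                       # coefficient of variation
        ((x - mean) ** 3).mean() / (std ** 3 + 1e-8),   # skewness
        ((x - mean) ** 4).mean() / (std ** 4 + 1e-8),   # kurtosis ("peak")
        x.max(), x.min(), np.median(x),
    ])

def channel_weight_vector(compressed, fc_weight, fc_bias):
    """Map each channel's statistic vector through one FC layer + sigmoid."""
    stat_vectors = np.stack([channel_statistics(ch) for ch in compressed])  # (m, 8)
    logits = stat_vectors @ fc_weight + fc_bias                             # (m,)
    return 1.0 / (1.0 + np.exp(-logits))                                    # weights in (0, 1)

rng = np.random.default_rng(0)
compressed = rng.standard_normal((4, 8, 8))        # m = 4 channels
w, b = rng.standard_normal(8) * 0.1, 0.0           # hypothetical untrained FC parameters
weights = channel_weight_vector(compressed, w, b)  # one importance weight per channel
```

In the invention the FC parameters would be learned end to end; the point of the sketch is only the data flow from channel maps to one weight per channel.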
Step S204, calculating branch characteristic channel weight value vectors corresponding to the branches based on the compressed characteristic channel weight value vectors.
In this embodiment, after the compression feature channel weight value vector is calculated, based on this information, the channel importance degrees corresponding to the channels of the branch feature are calculated, and n branch feature channel weight value vectors are obtained.
Step S205, using the branch feature channel weight vector, weighting the feature of the channel level for the corresponding branch feature to obtain n channel weighted branch features.
Please refer to step S103 in the embodiment shown in fig. 1 in detail, which is not described herein.
And S206, fusing the weighted branch characteristics of the n channels to obtain the output characteristics of the target picture.
Please refer to step S104 in the embodiment shown in fig. 1 in detail, which is not described herein.
In summary, in the attention-based picture processing method provided by this embodiment, the compressed feature channel weight vector characterizing the channel importance of every channel of the compressed feature is calculated first, and the branch feature channel weight vectors corresponding to the branches are then calculated based on it, so that the channel importance is estimated accurately from a fine-grained comparison of the compressed feature.
In this embodiment, a picture processing method is provided, which may be used in a computer device, fig. 3 is a flowchart of a picture processing method according to an embodiment of the present invention, and as shown in fig. 3, the foregoing flowchart step S204 may alternatively be implemented as the following steps:
step S301, based on the compressed feature channel weight vector, channel-level feature weighting is performed on the compressed features, and weighted compressed features are obtained.
In this embodiment, according to the channel correspondence between the compressed feature channel weight vector and the compressed feature, the feature weighting of the channel level is performed, so as to obtain the weighted compressed feature.
In an alternative embodiment, the weighted compression characteristics further include the following weighting process: and carrying out feature weighting on the local spatial position on the compressed features subjected to the feature weighting of the channel level to obtain weighted compressed features.
In this embodiment, after the first weighting of the channel level is performed on the compressed feature, the second weighting of the local spatial position is performed, and the subsequent calculation is performed by using the compressed feature after the two weights, so that the performance of the compressed feature is further improved by increasing the feature weighting of the local spatial position of the second weight.
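A sketch of the double weighting, assuming a hypothetical per-position spatial weight map (in the embodiment the spatial weights are derived from the local neighbor relations described below):

```python
import numpy as np

rng = np.random.default_rng(0)
compressed = rng.standard_normal((4, 8, 8))      # compressed feature, (C, H, W)
channel_w = np.array([0.9, 0.1, 0.5, 0.7])       # first weight: one value per channel
spatial_w = rng.uniform(0.5, 1.0, size=(8, 8))   # second weight: one value per spatial position

# First weighting at the channel level, then a second weighting over
# local spatial positions; both act by broadcasting over (C, H, W).
weighted = compressed * channel_w[:, None, None] * spatial_w[None, :, :]
```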
In an alternative embodiment, the feature weighting of the local spatial locations is performed on the compressed feature that completes the feature weighting of the channel level, resulting in a weighted compressed feature, including:
(1) For any spatial position pixel in the compressed features for which the feature weighting of the channel level is completed, calculating a local neighbor relation vector corresponding to the spatial position pixel, wherein the local neighbor relation vector is used for representing the correlation between the spatial position pixel and a neighborhood spatial position pixel, and the neighborhood spatial position pixel is a spatial position pixel around the spatial position pixel.
In this embodiment, after the feature weighting of the first heavy channel level is completed, the correlation between each spatial position pixel in the compressed feature and the surrounding spatial position pixels is calculated, and the local neighbor relation vector corresponding to each spatial position pixel is obtained.
In an alternative embodiment, for any one spatial location pixel in the compressed feature that completes the feature weighting at the channel level, a local neighbor relation vector corresponding to the spatial location pixel is calculated, including: for a target space position pixel, respectively calculating the correlation between the target space position pixel and k neighborhood space position pixels to obtain k local neighborhood relation scalar quantities of the target space position pixel, wherein k is a positive integer; and splicing the k local neighbor relation scalar quantities of the target spatial position pixels to obtain the local neighbor relation vector corresponding to the target spatial position pixels.
Wherein the target spatial location pixel is any one of the spatial location pixels in the compressed feature.
In this embodiment, the local neighbor relation vector of the target spatial position pixel is calculated as follows: first, k spatial position pixels around the target spatial position pixel are determined as its k neighborhood spatial position pixels; the correlation between the target spatial position pixel and each neighborhood spatial position pixel is calculated respectively, yielding k local neighbor relation scalars; the k local neighbor relation scalars are then concatenated by column to obtain the local neighbor relation vector of the target spatial position pixel, so that the local neighbor relation vector corresponding to each spatial position pixel is generated accurately.
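The step above can be sketched as follows. This is a minimal numpy illustration with assumed shapes; the embodiment leaves the correlation computation to a learned network, so a plain dot product of the two channel vectors stands in for it here.

```python
import numpy as np

def local_neighbor_relation_vector(feat, i, j, offsets):
    """Local neighbor relation vector for the pixel at spatial position (i, j).

    feat: H x W x C feature map (channel-level weighting already applied).
    offsets: k (di, dj) offsets defining the neighborhood shape.
    Each correlation is computed here as a dot product of the two channel
    vectors, a stand-in for the small correlation network of the embodiment.
    """
    H, W, _ = feat.shape
    center = feat[i, j]
    scalars = []
    for di, dj in offsets:
        ni = int(np.clip(i + di, 0, H - 1))  # clamp neighbors at the border
        nj = int(np.clip(j + dj, 0, W - 1))
        scalars.append(float(center @ feat[ni, nj]))  # one local neighbor relation scalar
    # concatenate the k scalars by column into a k x 1 vector
    return np.array(scalars).reshape(-1, 1)

# 3x3 neighborhood around the target pixel, i.e. k = 8
offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
vec = local_neighbor_relation_vector(np.random.rand(5, 5, 16), 2, 2, offsets)
```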
In an alternative embodiment, before calculating the local neighbor relation vector corresponding to the spatial position pixel, the method further includes: selecting, through a local neighborhood selection network, a target local neighborhood relation from a plurality of local neighborhood relations for the spatial position pixel, wherein a local neighborhood relation is used for defining the relation between a spatial position pixel and its neighborhood spatial position pixels; and taking the spatial position pixels in the compressed feature that satisfy the selected target local neighborhood relation with the spatial position pixel as the neighborhood spatial position pixels of that spatial position pixel.
The local neighborhood selection network is a neural network with a local neighborhood relation matching function.
In this embodiment, at least one local neighborhood relation is predefined, a local neighborhood selection network is first used to match a target local neighborhood relation for a spatial location pixel, and then a neighborhood spatial location pixel corresponding to the spatial location pixel is determined through the target local neighborhood relation, so that a reasonable neighborhood spatial location pixel is selected for each spatial location pixel, and the selection of the neighborhood spatial location pixel is adaptively variable.
(2) And normalizing the local neighbor relation vector corresponding to each spatial position pixel to obtain the normalized local neighbor relation vector corresponding to each spatial position pixel.
In this embodiment, for each spatial location pixel in the compression feature, the correlation between the spatial location pixel and the spatial location pixel in different neighborhoods indicated in the local neighbor relation vector is normalized, and finally, the normalized local neighbor relation vector corresponding to each spatial location pixel is obtained.
Wherein the normalization process can be implemented by a Softmax function.
(3) And carrying out feature weighting on the corresponding spatial position pixels by using the normalized local neighbor relation vector to obtain weighted compression features.
In this embodiment, for each spatial location pixel in the compressed feature, feature weighting is performed using its corresponding normalized local neighbor relation vector, and after feature weighting for all spatial location pixels is completed, a weighted compressed feature is obtained.
In an alternative embodiment, feature weighting is performed on corresponding spatial location pixels by using normalized local neighbor relation vectors, so as to obtain weighted compression features, including: extracting vectors of neighborhood space position pixels of the target space position pixels in channel dimensions aiming at the target space position pixels to form local area features of the target space position pixels; carrying out feature weighting on the local neighbor relation vector corresponding to the target spatial position pixel and the local area feature of the target spatial position pixel to obtain the local area weighted feature of the target spatial position pixel; and splicing the local weighting characteristics of all the spatial position pixels to obtain weighted compression characteristics.
Wherein the target spatial location pixel is any one of the spatial location pixels in the compressed feature.
In the present embodiment, feature weighting of the target spatial position pixel using the normalized local neighbor relation vector proceeds as follows: for the k neighborhood spatial position pixels around the target spatial position pixel, their vectors are extracted along the channel dimension to obtain the extracted local area feature; matrix multiplication of the normalized local neighbor relation vector corresponding to the target spatial position pixel with this local area feature then realizes accurate weighted fusion.
Step S302, downsampling the weighted compression characteristics in the dimensions of height and width to obtain downsampled compression characteristics.
In this embodiment, the dimension of the weighted compressed feature is height×width×channel, and the height×width downsampling is performed to obtain a downsampled compressed feature with 1×channel dimension, so as to facilitate subsequent processing in the channel dimension.
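A minimal sketch of this downsampling step, assuming average pooling (the embodiment does not fix the pooling operator):

```python
import numpy as np

# weighted compressed feature with dimension height x width x channel
weighted = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)

# downsample over the height and width dimensions, leaving a 1 x channel
# vector for the subsequent processing in the channel dimension
downsampled = weighted.mean(axis=(0, 1))
```

With the toy 2×2×3 input above, each channel's four values are averaged into a single entry of the 1×channel result.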
In step S303, the downsampled compressed features are split into n groups of split features, each group of split features corresponding to a branch.
In this embodiment, the downsampled compressed features are split into n groups on average, and the split features in each group may be used to train on the channel importance of the corresponding branch.
And S304, respectively carrying out feature reduction on the n groups of split features to obtain n branch feature channel weight value vectors corresponding to the n branches respectively.
In this embodiment, because the split features are obtained by splitting, their scale cannot be used to characterize the channel importance degree corresponding to each channel of the branch features; therefore, feature reduction is performed on the n groups of split features to restore them to a scale matching the branch feature channel weight value vector, yielding the branch feature channel weight value vector of the corresponding branch.
In an optional implementation manner, feature reduction is performed on n groups of split features to obtain n branch feature channel weight value vectors corresponding to n branches respectively, where the feature reduction includes: respectively inputting n groups of split features into at least one full-connection layer, and outputting to obtain n prepared branch feature channel weight value vectors; and carrying out comparison weighting processing among branches on the n prepared branch characteristic channel weight value vectors to obtain n branch characteristic channel weight value vectors.
In this embodiment, the split features are learned through at least one fully connected layer, and the learned information serves as the preliminary branch feature channel weight value vectors; inter-branch contrast weighting is then performed on the n preliminary branch feature channel weight value vectors, and the result serves as the final branch feature channel weight value vectors. This adds fine-grained feature comparison between branch features, so that the channel importance degree is estimated accurately and comprehensively.
In an alternative embodiment, in the case that the number of channels of the compression feature is m, where m is a positive integer, comparing and weighting processing is performed between branches on n prepared branch feature channel weight value vectors to obtain n branch feature channel weight value vectors, where the processing includes: respectively extracting channel weight values of n prepared branch characteristic channel weight value vectors on the same channel to obtain m recombined channel weight value vectors corresponding to m channels respectively; respectively normalizing each recombined channel weight value vector to obtain m normalized recombined channel weight value vectors; and performing feature replacement on the n prepared branch feature channel weight value vectors by using the normalized m recombined channel weight value vectors to obtain n branch feature channel weight value vectors.
In this embodiment, the elements corresponding to the same channel are extracted from all the preliminary branch feature channel weight value vectors; after extraction over the m channels, m reorganized channel weight value vectors are obtained. The m reorganized channel weight value vectors are normalized, and the normalized values are then restored to their original positions in the n preliminary branch feature channel weight value vectors, so that inter-branch contrast feature weighting is realized through transverse comparison of the preliminary branch feature channel weight value vectors across branches.
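The extract–normalize–restore procedure above can be sketched as follows. This numpy sketch assumes the n preliminary weight value vectors are stacked as the rows of an n×m matrix, which makes the per-channel reorganization a column-wise softmax.

```python
import numpy as np

def inter_branch_contrast(prepared):
    """Inter-branch contrast weighting of preliminary branch feature channel
    weight value vectors.

    prepared: n x m matrix whose row i is the preliminary weight value vector
    of branch i (m channels). For each channel, the n weights taken across
    the branches form a reorganized channel weight value vector; each such
    vector is softmax-normalized and the values are put back in their
    original positions, yielding the n branch feature channel weight value
    vectors.
    """
    # softmax over axis 0 normalizes each reorganized (per-channel) vector
    e = np.exp(prepared - prepared.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

# n = 3 branches, m = 2 channels
prepared = np.array([[2.0, 0.0],
                     [0.0, 2.0],
                     [1.0, 1.0]])
final = inter_branch_contrast(prepared)
```

After the operation, the weights of each channel sum to 1 across the n branches, which is the transverse comparison described above.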
In summary, in the attention-based picture processing method provided in this embodiment, channel-level feature weighting is first performed on the compressed feature based on the compressed feature channel weight value vector to obtain the weighted compressed feature; the weighted compressed feature is then downsampled in the height and width dimensions to obtain the downsampled compressed feature, which is split into n groups of split features; feature reduction of the n groups yields the n branch feature channel weight value vectors corresponding to the n branches, so that the channel importance degree of the channels in each branch is obtained through compression and reduction of the features.
It will be appreciated that the above method embodiments may be implemented alone or in combination, and the invention is not limited in this regard.
In this embodiment, an attention model is further provided, and the attention model is used to implement the foregoing embodiments and preferred implementations, and is not described in detail.
The present embodiment provides an attention model, as shown in fig. 4, including:
an input module 401, configured to obtain n branch features extracted from the target picture under n branches, where n is a positive integer;
a channel weight calculation module 402, configured to calculate a branch feature channel weight vector corresponding to each branch, where each branch feature channel weight vector is used to characterize a channel importance level corresponding to each channel corresponding to a branch feature;
the feature weighting module 403 is configured to perform feature weighting on the channel level for the corresponding branch feature by using the branch feature channel weight vector, so as to obtain n channel weighted branch features;
and the output module 404 is configured to fuse the n channel weighted branch features to obtain an output feature of the target picture.
In some alternative embodiments, the channel weight calculation module 402 includes:
The feature compression unit is used for carrying out feature compression on the n branch features to obtain compression features;
the compression characteristic channel weight value calculation unit is used for calculating a compression characteristic channel weight value vector based on the compression characteristic, and the compression characteristic channel weight value vector is used for representing the channel importance degree corresponding to each channel of the compression characteristic;
and the branch characteristic channel weight value calculating unit is used for calculating branch characteristic channel weight value vectors corresponding to all branches based on the compressed characteristic channel weight value vectors.
In some optional embodiments, the branch feature channel weight value calculation unit includes:
the weighting calculation subunit is used for carrying out channel-level feature weighting on the compression features based on the compression feature channel weight value vector to obtain weighted compression features;
a downsampling subunit, configured to downsample the weighted compression feature in the dimensions of height and width, to obtain a downsampled compression feature;
a feature splitting subunit, configured to split the downsampled compressed feature into n groups of split features, each group of split features corresponding to a branch;
and the characteristic reduction subunit is used for respectively carrying out characteristic reduction on the n groups of split characteristics to obtain n branch characteristic channel weight value vectors corresponding to the n branches respectively.
In some alternative embodiments, the feature reduction subunit is configured to:
respectively inputting n groups of split features into at least one full-connection layer, and outputting to obtain n prepared branch feature channel weight value vectors;
and carrying out comparison weighting processing among branches on the n prepared branch characteristic channel weight value vectors to obtain n branch characteristic channel weight value vectors.
In some alternative embodiments, where the number of channels of the compression feature is m, m being a positive integer, the feature reduction subunit is configured to:
respectively extracting channel weight values of n prepared branch characteristic channel weight value vectors on the same channel to obtain m recombined channel weight value vectors corresponding to m channels respectively;
respectively normalizing each recombined channel weight value vector to obtain m normalized recombined channel weight value vectors;
and performing feature replacement on the n prepared branch feature channel weight value vectors by using the normalized m recombined channel weight value vectors to obtain n branch feature channel weight value vectors.
In some alternative embodiments, the weighting calculation subunit is configured to:
and carrying out feature weighting on the local spatial position on the compressed features subjected to the feature weighting of the channel level to obtain weighted compressed features.
In some alternative embodiments, the weighting calculation subunit is configured to:
for any spatial position pixel in the compressed features for completing the feature weighting of the channel level, calculating a local neighbor relation vector corresponding to the spatial position pixel, wherein the local neighbor relation vector is used for representing the correlation between the spatial position pixel and a neighborhood spatial position pixel, and the neighborhood spatial position pixel is a spatial position pixel around the spatial position pixel;
normalizing the local neighbor relation vector corresponding to each spatial position pixel to obtain a normalized local neighbor relation vector corresponding to each spatial position pixel;
and carrying out feature weighting on the corresponding spatial position pixels by using the normalized local neighbor relation vector to obtain weighted compression features.
In some alternative embodiments, the weighting calculation subunit is configured to:
selecting a target local neighborhood relation from a plurality of local neighborhood relations for the spatial position pixels through a local neighborhood selection network, wherein the local neighborhood relation is used for defining the relation between the spatial position pixels and the neighborhood spatial position pixels;
and taking the compression characteristic meeting the selected target local neighborhood relation with the spatial position pixel as a neighborhood spatial position pixel of the spatial position pixel.
In some alternative embodiments, the weighting calculation subunit is configured to:
for a target space position pixel, respectively calculating the correlation between the target space position pixel and k neighborhood space position pixels to obtain k local neighborhood relation scalar quantities of the target space position pixel, wherein k is a positive integer;
and splicing the k local neighbor relation scalar quantities of the target spatial position pixels to obtain the local neighbor relation vector corresponding to the target spatial position pixels.
In some alternative embodiments, the weighting calculation subunit is configured to:
extracting vectors of neighborhood space position pixels of the target space position pixels in channel dimensions aiming at the target space position pixels to form local area features of the target space position pixels;
carrying out feature weighting on the local neighbor relation vector corresponding to the target spatial position pixel and the local area feature of the target spatial position pixel to obtain the local area weighted feature of the target spatial position pixel;
and splicing the local weighting characteristics of all the spatial position pixels to obtain weighted compression characteristics.
In some optional embodiments, in a case where the number of channels of the compression feature is m, where m is a positive integer, the compression feature channel weight value calculation unit includes:
The channel splitting subunit is used for splitting the compression characteristics at the channel level to obtain m channel characteristic diagrams;
the multi-statistic statistics subunit is used for calculating statistic information of each channel feature map under various statistic to obtain m channel statistic vectors;
and the fusion characteristic channel weight value calculating subunit is used for calculating and obtaining a compression characteristic channel weight value vector based on the m channel statistic vectors.
In some alternative embodiments, the compression characteristic channel weight value calculation subunit is configured to:
inputting the m channel statistic vectors into at least one full-connection layer respectively, and outputting to obtain channel importance degrees corresponding to the m channels respectively;
and arranging the channel importance degrees corresponding to the m channels respectively according to the channels to obtain a compression characteristic channel weight value vector.
In some alternative embodiments, the statistics include at least one of:
mean, variance, coefficient of variation, skewness, kurtosis, maximum, minimum, median, and quartile.
In some optional embodiments, the channel weight calculation module 402 further includes a feature fusion unit, configured to:
adding the n branch features to obtain fusion features corresponding to the n branch features;
or,
and connecting the n branch features based on the channel dimension to obtain fusion features corresponding to the n branch features.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
In connection with the picture processing method described in the above embodiments, the procedure performed in the attention model can be divided into four stages: 1. feature compression; 2. feature splitting; 3. feature screening; 4. feature weighting.
Next, with reference to fig. 5, the above embodiment is exemplarily described.
(1) Extraction stage of multi-branch feature
The invention is not limited to the specific implementation form of the feature extraction module used.
It will be appreciated that each branch typically plays a different role during the feature extraction stage; for example, each branch provides features of different receptive fields, so that richer features are available during the fusion stage. While increasing the number of branches (typically, widening the network) provides rich features, it undoubtedly introduces a lot of noise; in most cases the redundant features not only fail to improve the performance of the network, but often impair the classification capability of the classifier. Therefore, a mechanism needs to be designed to remove the redundant features and retain the features with the best discrimination; this effect is achieved in the attention model in the following stages.
In fig. 5, it is assumed that each branch input feature map is C×H×W, and there are 4 branches in total.
(2) Feature compression stage
At this stage, the input multi-branch feature maps are fused, and two kinds of fusion operations can be adopted: a) all feature maps are added; b) all feature maps are connected together in the channel dimension (concat). The fused feature is denoted by F, and its dimension is C×H×W.
When the 4 branch feature maps have the same number of channels, they can be added directly; when the channel numbers differ, they can be connected together in the channel dimension.
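The two fusion operations can be sketched as follows (numpy, with assumed shapes of C=16, H=W=8):

```python
import numpy as np

# 4 branch feature maps of dimension C x H x W
branches = [np.random.rand(16, 8, 8) for _ in range(4)]

# a) same channel counts: element-wise addition
fused_add = sum(branches)

# b) concatenation along the channel dimension (used when channel counts differ;
#    shown here on equal-channel maps for illustration)
fused_cat = np.concatenate(branches, axis=0)
```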
(3) Feature splitting stage
The fused feature contains a large amount of information. To screen out the most effective feature map channels and suppress the channels that contribute little to the final output result, an effective channel selection mechanism, namely an attention mechanism, needs to be designed.
The dimension of the fused feature F is C×H×W, where C is the number of fused channels, H the fused height, and W the fused width. An ordinary attention mechanism traverses each of the C channels and takes the mean of the corresponding H×W feature map to represent that channel's importance; the means of all C channels form a vector, several fully connected layers are computed on this vector to learn the importance degree of each channel, and the weighting is finally performed.
The invention provides a multi-parameter probability statistics attention mechanism, which can truly realize the feature selection of multi-branch channel levels.
The first step is to split the fused feature F along the channel dimension C. The second step is to compute multi-parameter probability statistics for each channel: the traditional attention mechanism uses the mean to gauge the importance of a feature map, but statistics such as the variance, coefficient of variation, and skewness reflect the characteristics of the channel feature map more accurately. Using the mean alone as the channel importance is too coarse; therefore, the present invention uses a variety of statistics to calculate the importance of each channel feature map.
The following describes various statistics:
1. Mean: used to describe the average value taken by the data.

$$\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$$

where $x_i$ is the i-th pixel value, $M$ is the number of pixel values, and $\bar{x}$ is the mean.
2. Variance: used to reflect the fluctuation and stability of the data.

$$\sigma^2 = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})^2$$

where $x_i$ is the i-th pixel value, $M$ is the number of pixel values, $\bar{x}$ is the mean, and $\sigma^2$ is the variance.
3. Coefficient of variation: the ratio of the standard deviation to the mean, a dimensionless quantity used to characterize the relative dispersion of the data.

$$c_v = \frac{\sigma}{\bar{x}}$$

where $\bar{x}$ is the mean, $\sigma$ is the standard deviation, and $c_v$ is the coefficient of variation.
4. Skewness: used to characterize the symmetry of the data.

$$g_1 = \frac{1}{M}\sum_{i=1}^{M}\left(\frac{x_i - \bar{x}}{\sigma}\right)^3$$

where $x_i$ is the i-th pixel value, $M$ is the number of pixel values, $\bar{x}$ is the mean, $\sigma$ is the standard deviation, and $g_1$ is the skewness.
5. Kurtosis: used to describe the steepness of the sample data distribution relative to the normal distribution.

$$g_2 = \frac{1}{M}\sum_{i=1}^{M}\left(\frac{x_i - \bar{x}}{\sigma}\right)^4$$

where $x_i$ is the i-th pixel value, $M$ is the number of pixel values, $\bar{x}$ is the mean, $\sigma$ is the standard deviation, and $g_2$ is the kurtosis.
6. Maximum value (Maximum): for representing the maximum pixel value of the feature map.
7. Minimum (Minimum): for representing the minimum pixel value of the feature map.
8. Median (Median): is the median of the feature map pixel values and is used to reflect the central position of the feature map value distribution.
9. Quartiles (Quartiles): the pixel values of the feature images are arranged from small to large, the arranged values are divided into four equal parts, and the first, second and third quartiles are respectively the values at 25%, 50% and 75% positions, so that the dispersion degree of the numerical distribution of the feature images can be reflected.
These statistics can be obtained by calculating the values of each channel of the feature map and can be used to describe the numerical distribution of the feature map.
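The statistics described above can be computed per channel as follows (numpy sketch; the ordering of the statistics inside P is an illustrative choice, not fixed by the embodiment):

```python
import numpy as np

def channel_statistics(channel):
    """Multi-statistic vector P for one H x W channel feature map."""
    x = channel.ravel().astype(float)
    mean = x.mean()
    var = x.var()
    std = x.std()
    cv = std / mean if mean != 0 else 0.0                       # coefficient of variation
    skew = np.mean(((x - mean) / std) ** 3) if std > 0 else 0.0
    kurt = np.mean(((x - mean) / std) ** 4) if std > 0 else 0.0
    q1, med, q3 = np.percentile(x, [25, 50, 75])                # quartiles and median
    return np.array([mean, var, cv, skew, kurt,
                     x.max(), x.min(), med, q1, q3])

# toy 4x4 channel feature map with pixel values 0..15
P = channel_statistics(np.arange(16, dtype=float).reshape(4, 4))
```

For the symmetric toy channel above, the mean and median are both 7.5 and the skewness is 0.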
For each channel, the multi-statistic information is calculated and assembled into a new channel statistic vector P; these indices comprehensively reflect the detailed distribution of the channel data and hence comprehensively reflect the importance degree of the channel.
The channel statistic vector P of the multi-statistic features is learned through a fully connected layer, which outputs a single number; this number represents a nonlinear combination of the multiple statistics and reflects the importance degree of the channel. An ordinary attention mechanism directly computes the mean of each channel to form a vector. The multi-statistic features proposed in the invention reflect the channel importance information more finely, but since several statistics all bear on the channel importance and their relative contributions are not known a priori, a fully connected learning mechanism is used to dynamically learn a nonlinear combination of the statistics and thereby learn the importance degree of the channel.
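A minimal numpy sketch of this learning step, with random weights standing in for trained fully connected parameters and an assumed 10-dimensional statistic vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_importance(P, W1, b1, W2, b2):
    """Map the channel statistic vector P to one number through a small fully
    connected stack (FC -> ReLU -> FC); the output represents a nonlinear
    combination of the statistics, i.e. the learned channel importance.
    The weights here are random stand-ins for trained parameters."""
    h = np.maximum(W1 @ P + b1, 0.0)   # hidden layer with rectification
    return float(W2 @ h + b2)          # scalar importance of this channel

P = rng.random(10)                                  # e.g. 10 statistics of one channel
W1, b1 = rng.standard_normal((4, 10)), np.zeros(4)  # FC layer 1
W2, b2 = rng.standard_normal(4), 0.0                # FC layer 2
score = channel_importance(P, W1, b1, W2, b2)
```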
All channels are traversed, and through the above operation the learned weight value of each channel is finally obtained. These weight values are arranged by channel to generate the fusion feature channel weight value vector V, whose dimension is C×1.
The fused feature is then weighted with the fusion feature channel weight value vector V: according to the channel correspondence between V and the fused feature, a one-to-one multiplication is performed per channel to realize the weighting.
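The channel-wise weighting can be sketched as follows (numpy, with assumed shapes of C=8, H=W=5):

```python
import numpy as np

F = np.random.rand(8, 5, 5)    # fused feature, C x H x W
V = np.random.rand(8, 1)       # fusion feature channel weight value vector, C x 1

# one-to-one multiplication per channel:
# each H x W map is scaled by its channel's weight (V broadcasts over H and W)
weighted = F * V[:, :, None]
```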
With the multi-probability channel weighting within the branch completed above, local spatial weighting may further be performed below.
Firstly, acquiring the fusion characteristic weighted by the multi-probability channel in the last step. Processing continues below for this fusion feature.
1) Position generation: a local neighborhood selection network is established, and local neighborhood relations of k shapes are predefined. For the fusion feature, a local neighborhood selection network can be designed; local region selection on the fusion feature is realized through 2 global downsamplings and 2 fully connected layers.
2) Calculating the local neighbor relation vector: for the fusion feature at each position i, calculate its correlation with the surrounding k positions. Assuming the feature map has size H×W×C, the k positions in the local area around position i may be expressed as P_i = {j_1, j_2, ..., j_k}. For each position j, the feature vectors at positions i and j are mapped by two 1x1 convolutional layers into d-dimensional vectors f_i and f_j. The two vectors are then concatenated into a 2d-dimensional vector h_{ij} = [f_i, f_j], which is input into a small neural network to produce a local neighbor relation scalar representing the correlation between positions i and j.
Generating the local neighbor relation vector: for each position i, the correlations w_{ij} between i and the k surrounding positions are concatenated by column, yielding a k x 1 column vector as the local neighbor relation vector.
3) Normalizing the local neighbor relation vector w_{ij}: for each position i, the correlations with the k surrounding positions are normalized using the Softmax function, again yielding a k x 1 column vector.
4) Feature weighting: the local neighbor relation vector w_{ij} is used to weight the input feature map at position i, producing a C-dimensional local area weighted feature, as follows:
Traverse each position i of the fused feature map; the k positions in the local area around position i may be expressed as P_i = {j_1, j_2, ..., j_k}. The vectors at these k positions are extracted along the channel dimension (the fused feature map has dimension H×W×C); after extraction, the extracted local area feature T_i, of dimension k×C, is obtained. Matrix multiplication of the spatial local neighbor relation vector w_{ij} of the i-th position (dimension k×1) with the local area feature T_i (dimension k×C) realizes the weighted fusion, namely

$$y_i = w_{ij}^{\mathrm{T}} T_i$$

which yields the local area weighted feature of position i with dimension 1×C. All H×W positions are traversed to complete the local area weighting of all spatial positions.
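The weighted fusion at one position can be illustrated with assumed small dimensions (k = 3 neighbors, C = 2 channels):

```python
import numpy as np

# normalized local neighbor relation vector w_ij of position i (k x 1, sums to 1)
w_ij = np.array([[0.5], [0.3], [0.2]])

# extracted local area feature T_i: channel vectors of the k neighbors (k x C)
T_i = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])

# weighted fusion by matrix multiplication: (1 x k) @ (k x C) -> 1 x C
local_weighted = w_ij.T @ T_i
```

Each output channel is the correlation-weighted average of that channel over the k neighbors, e.g. 0.5·1 + 0.3·3 + 0.2·5 = 2.4 for the first channel.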
Outputting the feature map: the local area weighted features of all positions are concatenated into an H×W×C tensor as the output feature map, which is the fusion feature with both multi-probability channel weighting and local spatial weighting within the branch completed.
The fusion feature weighted by multi-probability channel weighting and local spatial weighting within the branch is downsampled in the height and width dimensions, and the result is divided into 4 parts along the channel dimension, each part corresponding to 1 group. The dimension of each group is C/4, and the split feature of each group is next used to train the channel importance of each branch.
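The downsample-and-split step can be sketched as below. The patent only states that the height and width dimensions are downsampled; global average pooling is used here purely as an illustrative choice of downsampling operator.

```python
import numpy as np

def downsample_and_split(x, n_groups=4):
    """Downsample the (H, W, C) fused feature over its spatial dimensions
    (global average pooling is an assumed stand-in for the downsampling
    operator), then split the resulting C-vector into n_groups equal parts,
    each of C / n_groups channels."""
    H, W, C = x.shape
    assert C % n_groups == 0, "channel count must divide evenly into groups"
    pooled = x.mean(axis=(0, 1))          # (C,) after spatial downsampling
    return np.split(pooled, n_groups)     # n_groups vectors of length C / n_groups

x = np.arange(2 * 2 * 8, dtype=float).reshape(2, 2, 8)
groups = downsample_and_split(x)          # 4 groups of C/4 = 2 channels each
```

Each returned group is the split feature that feeds one branch of the feature screening stage.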
(4) Feature screening stage
First, multiple branches are established according to the split feature of each group, each branch independent of the others, for calculating the importance level of the channels of the input feature map, i.e. the attention mechanism. Exemplarily, each branch comprises 2 fully-connected layers, with the specific structure: fully-connected layer -> rectification -> fully-connected layer -> rectification. This structure learns from the input split feature to obtain the importance of the corresponding input feature map channels; through this operation, a preliminary branch feature channel weight value vector I_i is obtained.
For all I_i, elements at corresponding positions are extracted to reconstruct new recombined channel weight value vectors V_i. Extracting the element at a corresponding position means: given that the dimension of each I_i is C x 1, and that there are I_1 to I_N for N feature maps corresponding to N branches, traverse I_1 to I_N and take the element at the same position from each (e.g., traverse all vectors I_i and take the value of the first element) to form a new vector V_i. The dimension of V_i is N x 1; in this example V_i has dimension 4 x 1.
Each recombined vector V_i is then normalized by a softmax function. Finally, the normalized values correspondingly replace the entries of the original feature vectors I_i, restoring each normalized value to its original position, to obtain the branch feature channel weight value vectors W_1 to W_N.
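The recombine-softmax-restore procedure above can be sketched compactly: stacking I_1 to I_N into an N x C matrix makes each column a recombined vector V, so normalizing down the columns and reading the rows back out yields W_1 to W_N in one step.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def recombine_and_normalize(I):
    """I: list of N preliminary branch channel weight vectors I_1..I_N, each
    of length C. For every channel position, the N values across branches form
    a recombined vector V (length N); V is softmax-normalized and its entries
    are restored to their original positions, giving W_1..W_N."""
    M = np.stack(I)                          # (N, C): row i is I_i
    W = np.apply_along_axis(softmax, 0, M)   # softmax across branches per channel
    return [w for w in W]                    # row i is W_i

# four branches (N = 4), three channels (C = 3), identical dummy inputs
I = [np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(3)]
W = recombine_and_normalize(I)
# with identical inputs every branch receives weight 1/N = 0.25 on every channel
```

Because each V_i is softmax-normalized, the weights of the N branches sum to 1 on every channel, which is what makes this an inter-branch comparison.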
(5) Feature weighting stage
The input features are weighted using the trained and screened channel importance features W_1 to W_N. The weighting method is as follows: in the first step, W_1 to W_N are respectively multiplied, pixel-wise and channel by channel, with the corresponding input feature maps b_1 to b_N, obtaining the channel-weighted features Wb_1 to Wb_N; in the second step, the weighted features of all branches are added and the final result is output. This stage can be expressed by the following formula:
X_out = sum_{i=1}^{N} W_i ⊙ b_i

wherein ⊙ represents the Hadamard product; N is the number of branches; W_i is the channel weight value corresponding to the i-th branch; b_i is the branch feature corresponding to the i-th branch; and X_out is the output feature.
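The feature weighting stage reduces to a few lines: broadcasting each W_i over the spatial dimensions of b_i implements the channel-wise Hadamard product, and the branches are then summed.

```python
import numpy as np

def weighted_fusion(W, b):
    """Final feature weighting stage: X_out = sum_i W_i ⊙ b_i. Each channel
    weight vector W_i (length C) scales the channels of its branch feature
    b_i (H, W, C); the weighted branches are summed into the output feature."""
    out = np.zeros_like(b[0], dtype=float)
    for W_i, b_i in zip(W, b):
        out += b_i * W_i   # broadcasting applies W_i channel-wise (Hadamard product)
    return out

# two toy branches with C = 2 channels on a 1x1 spatial grid
b = [np.ones((1, 1, 2)), 2.0 * np.ones((1, 1, 2))]
W = [np.array([0.5, 1.0]), np.array([0.25, 0.0])]
X_out = weighted_fusion(W, b)   # 1*[0.5, 1.0] + 2*[0.25, 0.0] = [1.0, 1.0]
```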
It will be appreciated that the process described above can also be generalized as: splitting a target picture into a plurality of branches; fusing the multi-branch features; performing multi-parameter probability statistics and in-branch attention learning on the fusion feature to obtain a channel-level weighted fusion feature; performing spatial local position weighting on the channel-level weighted fusion feature through a local neighborhood selection network; downsampling the doubly weighted fusion feature; performing feature splitting and feature restoration on the downsampled fusion feature; performing inter-branch attention recombination, comparison and weighting on the restored channel weight value vectors to obtain the processed channel weight value vectors corresponding to the branches; and weighting and fusing the features of each branch using the channel weight value vector corresponding to that branch, to obtain the output features of the target picture.
Based on the above attention model, a basic network layer can be constructed, and two typical network layers are proposed in the embodiment of the present invention: a first basic block structure and a second basic block structure.
(1) First basic block structure
The first basic block structure includes the attention model of the above embodiments and an addition module. The attention model is used for outputting the output features of the target picture given, as input, the branch features of the target picture extracted under n branches, where n is a positive integer. The addition module is used for adding the output features of the target picture to the target picture itself, obtaining the output result of the first basic block structure for the target picture.
Exemplarily, as shown in fig. 6, the target picture is convolved with kernels of different sizes to extract a plurality of branch features; the plurality of branch features are input into the attention model to obtain the output features of the target picture; and the output features of the target picture are then added to the target picture to obtain the final output result.
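The residual wiring of the first basic block can be sketched as below. The convolution kernels and the attention model are stand-ins here: a naive average filter plays the role of a convolution kernel of each size, and a plain branch average plays the role of the (trained) attention model, since only the block's data flow is being illustrated.

```python
import numpy as np

def mean_filter(x, size):
    """Naive 'same' average filter of odd window size, standing in for a
    learned convolution kernel of that size."""
    p = size // 2
    pad = np.pad(x, ((p, p), (p, p), (0, 0)), mode="edge")
    H, W, C = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = pad[i:i + size, j:j + size].mean(axis=(0, 1))
    return out

def first_basic_block(x, attention, sizes=(1, 3, 5)):
    """First basic block: branch features from kernels of different sizes,
    the attention model over the branches, then a residual add with the input."""
    branches = [mean_filter(x, s) for s in sizes]
    return attention(branches) + x       # addition module: output features + input picture

# stand-in attention model: a plain average of the branch features
avg_attention = lambda bs: sum(bs) / len(bs)
x = np.ones((3, 3, 2))
y = first_basic_block(x, avg_attention)
# for a constant input every branch equals x, so y = x + x = 2 everywhere
```

The second basic block structure differs only in that the residual path carries a batch-normalized copy of the input rather than the input itself.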
(2) Second basic block structure
The second basic block structure includes the attention model of the above embodiments and an addition module. The attention model is used for outputting the output features of the target picture given, as input, the branch features of the target picture extracted under n branches, where n is a positive integer. The addition module is used for adding the output features of the target picture to the result of the target picture after batch normalization, obtaining the output result of the second basic block structure for the target picture.
Exemplarily, as shown in fig. 7, the target picture is convolved with kernels of different sizes to extract a plurality of branch features; the plurality of branch features are input into the attention model to obtain the output features of the target picture; and the output features of the target picture are then added to the result of the target picture after batch normalization to obtain the final output result.
It can be understood that, as shown in figs. 6 and 7, the attention model designed by the present invention is plug-and-play: it corresponds to a multi-branch structure and is directly inserted at the output position of the multiple branches, finally obtaining a weighted feature map.
By way of example, the basic block structure provided in the embodiment of the present invention may be used for image classification; its specific use is shown in fig. 8, and the specific structure of the network may include: convolution -> batch normalization -> rectification -> maximum pooling -> second basic block structure -> first basic block structure -> average pooling -> convolution -> second basic block structure -> first basic block structure -> global average pooling -> fully-connected layer -> normalization -> classification probability.
It will be appreciated that embodiments of the present invention are not limited in their particular application to basic block structures, and that fig. 8 is merely an exemplary illustration.
The embodiment of the invention also provides computer equipment with the attention model shown in the figure 4.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 9, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 9.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area and a storage data area; the storage program area may store an operating system and at least one application program required for functions, and the storage data area may store data created according to the use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code recorded on a storage medium, or as computer code originally stored in a remote storage medium or a non-transitory machine-readable storage medium and downloaded through a network to be stored in a local storage medium, so that the method described herein may be processed by such software on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (20)

1. A picture processing method, the method comprising:
acquiring n branch characteristics extracted from a target picture under n branches, wherein n is a positive integer;
calculating branch characteristic channel weight value vectors corresponding to all branches, wherein each branch characteristic channel weight value vector is used for representing the channel importance degree corresponding to all channels corresponding to the branch characteristics;
carrying out channel-level feature weighting on the corresponding branch features by using the branch feature channel weight value vector to obtain n channel-weighted branch features;
and fusing the n channel weighted branch characteristics to obtain the output characteristics of the target picture.
2. The method of claim 1, wherein the calculating a branch characteristic channel weight value vector for each branch comprises:
performing feature compression on the n branch features to obtain compressed features;
calculating a compression characteristic channel weight value vector based on the compression characteristic, wherein the compression characteristic channel weight value vector is used for representing the channel importance degree corresponding to each channel of the compression characteristic;
and calculating branch characteristic channel weight value vectors corresponding to all branches based on the compression characteristic channel weight value vectors.
3. The method of claim 2, wherein calculating a branch feature channel weight vector corresponding to each branch based on the compressed feature channel weight vector comprises:
based on the compression characteristic channel weight vector, carrying out channel-level characteristic weighting on the compression characteristic to obtain a weighted compression characteristic;
downsampling the weighted compression characteristics in the dimensions of height and width to obtain downsampled compression characteristics;
splitting the downsampled compressed features into n sets of split features, each set of split features corresponding to a branch;
and respectively carrying out feature reduction on the n groups of split features to obtain n branch feature channel weight value vectors corresponding to the n branches respectively.
4. The method of claim 3, wherein the performing feature reduction on the n groups of split features to obtain n branch feature channel weight value vectors corresponding to the n branches respectively includes:
respectively inputting the n groups of split features into at least one fully-connected layer, and outputting n prepared branch feature channel weight value vectors;
and carrying out comparison weighting processing among branches on the n prepared branch characteristic channel weight value vectors to obtain the n branch characteristic channel weight value vectors.
5. The method according to claim 4, wherein, in the case where the number of channels of the compression feature is m, and m is a positive integer, the performing the inter-branch comparison weighting on the n preliminary branch feature channel weight value vectors to obtain the n branch feature channel weight value vectors includes:
respectively extracting channel weight values of the n preparation branch characteristic channel weight value vectors on the same channel to obtain m recombined channel weight value vectors respectively corresponding to m channels;
respectively normalizing each recombined channel weight value vector to obtain m normalized recombined channel weight value vectors;
and performing feature replacement on the n prepared branch feature channel weight value vectors by using the normalized m recombined channel weight value vectors to obtain the n branch feature channel weight value vectors.
6. The method of claim 3, wherein obtaining the weighted compression features further comprises the following weighting process:
performing feature weighting of local spatial positions on the compression features that have completed channel-level feature weighting, to obtain the weighted compression features.
7. The method of claim 6, wherein the feature weighting of the local spatial locations for the feature weighted compressed features at the channel level to obtain the weighted compressed features comprises:
calculating a local neighbor relation vector corresponding to the spatial position pixel aiming at any spatial position pixel in the compressed characteristic of the characteristic weighting of the finishing channel level, wherein the local neighbor relation vector is used for representing the correlation between the spatial position pixel and a neighborhood spatial position pixel, and the neighborhood spatial position pixel is the spatial position pixel around the spatial position pixel;
normalizing the local neighbor relation vector corresponding to each spatial position pixel to obtain a normalized local neighbor relation vector corresponding to each spatial position pixel;
and carrying out feature weighting on the corresponding spatial position pixels by using the normalized local neighbor relation vector to obtain the weighted compression features.
8. The method of claim 7, wherein prior to computing a local neighbor relation vector for any one of the spatial location pixels in the compressed features weighted for the feature at the completion channel level, the method further comprises:
Selecting a target local neighborhood relation from a plurality of local neighborhood relations for the spatial position pixels through a local neighborhood selection network, wherein the local neighborhood relation is used for defining the relation between the spatial position pixels and the neighborhood spatial position pixels;
and taking the compression characteristic meeting the selected target local neighborhood relation with the spatial position pixel as a neighborhood spatial position pixel of the spatial position pixel.
9. The method of claim 7, wherein for any one spatial location pixel in the feature weighted compressed feature of the completed channel level, calculating a local neighbor relation vector corresponding to the spatial location pixel comprises:
for a target space position pixel, respectively calculating the correlation between the target space position pixel and k neighborhood space position pixels to obtain k local neighbor relation scalar quantities of the target space position pixel, wherein k is a positive integer;
and splicing the k local neighbor relation scalar quantities of the target spatial position pixel to obtain a local neighbor relation vector corresponding to the target spatial position pixel.
10. The method of claim 7, wherein the feature weighting the corresponding spatial location pixels using the normalized local neighbor relation vector to obtain the weighted compression feature comprises:
Extracting vectors of neighborhood space position pixels of the target space position pixels in channel dimensions aiming at the target space position pixels to form local area features of the target space position pixels;
carrying out feature weighting on the local neighbor relation vector corresponding to the target spatial position pixel and the local area feature of the target spatial position pixel to obtain the local area weighted feature of the target spatial position pixel;
and splicing the local weighting characteristics of all the spatial position pixels to obtain the weighted compression characteristics.
11. The method according to claim 2, wherein, in the case where the number of channels of the compression feature is m, where m is a positive integer, the calculating a compression feature channel weight vector based on the compression feature includes:
splitting the compression characteristics at channel level to obtain m channel characteristic diagrams;
calculating statistic information of each channel feature map under various statistic values to obtain m channel statistic vectors;
and calculating the compression characteristic channel weight value vector based on the m channel statistic vectors.
12. The method of claim 11, wherein the computing the compressed feature channel weight vector based on the m channel statistic vectors comprises:
Inputting the m channel statistic vectors into at least one full-connection layer respectively, and outputting to obtain channel importance degrees corresponding to the m channels respectively;
and arranging the channel importance degrees corresponding to the m channels respectively according to the channels to obtain the compression characteristic channel weight value vector.
13. The method of claim 11, wherein the statistics comprise at least one of:
mean, variance, coefficient of variation, skewness, peak, maximum, minimum, median, and quartile.
14. The method of claim 2, wherein the process of computing the compression characteristic comprises:
and adding the n branch features to obtain fusion features corresponding to the n branch features.
15. The method of claim 2, wherein the process of computing the compression characteristic comprises:
and connecting the n branch features based on the channel dimension to obtain fusion features corresponding to the n branch features.
16. An attention model, characterized in that the attention model comprises:
the input module is used for acquiring n branch characteristics extracted from the target picture under n branches, wherein n is a positive integer;
The channel weight value calculation module is used for calculating a branch characteristic channel weight value vector corresponding to each branch, and each branch characteristic channel weight value vector is used for representing the channel importance degree corresponding to each channel of the corresponding branch characteristic;
the feature weighting module is used for carrying out feature weighting on channel levels on the corresponding branch features by using the branch feature channel weight value vector to obtain n channel weighted branch features;
and the output module is used for fusing the n channel weighted branch characteristics to obtain the output characteristics of the target picture.
17. A first basic block structure, wherein the first basic block structure comprises: the attention model, summing module of claim 16;
the attention model is used for outputting the output characteristics of the target picture under the condition that the branch characteristics of the target picture extracted under n branches are input, wherein n is a positive integer;
the adding module is configured to add the output feature of the target picture and the target picture to obtain an output result of the first basic block structure for the target picture.
18. A second basic block structure, wherein the second basic block structure comprises: the attention model, summing module of claim 16;
The attention model is used for outputting the output characteristics of the target picture under the condition that the branch characteristics of the target picture extracted under n branches are input, wherein n is a positive integer;
the adding module is configured to add the output characteristics of the target picture and the results of the target picture after batch normalization, so as to obtain an output result of the second basic block structure for the target picture.
19. A computer device, comprising:
a memory and a processor in communication with each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the picture processing method of any of claims 1 to 15.
20. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the picture processing method of any one of claims 1 to 15.
CN202310668217.9A 2023-06-07 2023-06-07 Picture processing method, system, equipment and medium Active CN116403064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310668217.9A CN116403064B (en) 2023-06-07 2023-06-07 Picture processing method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN116403064A true CN116403064A (en) 2023-07-07
CN116403064B CN116403064B (en) 2023-08-25

Family

ID=87016520


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893840A (en) * 2024-03-15 2024-04-16 深圳市宗匠科技有限公司 Acne severity grading method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274999A (en) * 2020-02-17 2020-06-12 北京迈格威科技有限公司 Data processing method, image processing method, device and electronic equipment
CN113469072A (en) * 2021-07-06 2021-10-01 西安电子科技大学 Remote sensing image change detection method and system based on GSoP and twin fusion network
CN114708172A (en) * 2022-02-22 2022-07-05 北京旷视科技有限公司 Image fusion method, computer program product, storage medium, and electronic device
CN115690522A (en) * 2022-12-29 2023-02-03 湖北工业大学 Target detection method based on multi-pooling fusion channel attention and application thereof
CN116012581A (en) * 2022-12-19 2023-04-25 上海师范大学 Image segmentation method based on dual attention fusion
CN116167920A (en) * 2023-03-24 2023-05-26 浙江师范大学 Image compression and reconstruction method based on super-resolution and priori knowledge




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant