CN111428807A - Image processing method and computer-readable storage medium - Google Patents

Image processing method and computer-readable storage medium

Info

Publication number
CN111428807A
Authority
CN
China
Prior art keywords
image
feature map
feature
classification result
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010261102.4A
Other languages
Chinese (zh)
Inventor
纪元法
黄铭洁
孙希延
陈小毛
蓝如师
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202010261102.4A
Publication of CN111428807A
Withdrawn legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/70

Abstract

The embodiment of the application discloses an image processing method and a computer-readable storage medium. An image to be processed can be obtained and subjected to feature extraction to obtain a feature map; a coarse classification result corresponding to the image to be processed is obtained according to the feature map; an attention feature map with a plurality of specific attentions is acquired according to the feature map; a data enhancement operation is performed on the attention feature map to obtain a data enhancement feature map; a fine classification result corresponding to the image to be processed is acquired according to the data enhancement feature map; and a classification result corresponding to the image to be processed is determined based on the coarse classification result and the fine classification result. In this scheme, the fine classification result is obtained from the data enhancement feature map produced by performing the data enhancement operation on the attention feature map, and the final classification result of the image is determined by combining the coarse classification result and the fine classification result, so that the accuracy of image classification is improved.

Description

Image processing method and computer-readable storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an image processing method and a computer-readable storage medium.
Background
As images serving as information carriers become more and more abundant, images can be classified according to actual requirements to determine the categories to which they belong. For example, images with large differences can be recognized, such as classifying distinct categories like people, cars, or dogs; or sub-categories within a large category can be identified, such as identifying different birds or different vehicles. When an image is finely sub-classified, the focus falls on tiny yet important local features in the image, which increases the difficulty of fine-grained image classification. Existing image classification methods yield low precision on fine-grained images due to problems such as small inter-class differences between sub-classes, large intra-class differences, dependence on a large amount of manual labeling information, loss of key features caused by overfitting, lack of data set samples, and interference of background noise with weakly supervised image classification.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device and a computer readable storage medium, which can improve the accuracy of image classification.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map;
obtaining a coarse classification result corresponding to the image to be processed according to the feature map;
acquiring an attention feature map with a plurality of specific attentions according to the feature map;
performing data enhancement operation on the attention feature map to obtain a data enhancement feature map;
acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map;
and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
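The six claimed steps above can be sketched end-to-end as follows. This is a toy NumPy sketch under loudly stated assumptions: the "feature extraction" is a plain spatial mean rather than ResNet, the "enhancement" is a fixed attention weighting, and averaging the coarse and fine predictions is only one possible combination rule — the claim does not fix any of these.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def classify(image, w_coarse, w_fine, attention):
    # S101: "feature extraction" -> one N-dim descriptor (stand-in for ResNet).
    feat = image.mean(axis=(0, 1))
    # S102: coarse classification from the global feature.
    p_coarse = softmax(w_coarse @ feat)
    # S103/S104: attention-weighted feature, a crude stand-in for the
    # attention-guided cropping/dropping data enhancement.
    feat_att = (image * attention[..., None]).mean(axis=(0, 1))
    # S105: fine classification from the enhanced feature.
    p_fine = softmax(w_fine @ feat_att)
    # S106: combine coarse and fine results (assumed: simple average).
    return 0.5 * (p_coarse + p_fine)

H, W, N, C = 6, 6, 3, 4
img = np.linspace(0.0, 1.0, H * W * N).reshape(H, W, N)
att = np.full((H, W), 0.5)
probs = classify(img, np.eye(C, N), np.eye(C, N), att)
```

Averaging two probability vectors keeps the output a valid distribution, which is why the combination step needs no renormalization in this sketch.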
In a second aspect, an embodiment of the present application further provides an image processing apparatus, including a memory and a processor, where the memory stores a computer program, and the processor executes any one of the image processing methods provided by the embodiments of the present application when calling the computer program in the memory.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is loaded by a processor to execute any one of the image processing methods provided in the embodiment of the present application.
According to the method and the device, the image to be processed can be acquired and subjected to feature extraction to obtain a feature map; then a coarse classification result corresponding to the image to be processed is obtained according to the feature map, an attention feature map with a plurality of specific attentions is acquired according to the feature map, a data enhancement operation is performed on the attention feature map to obtain a data enhancement feature map, and a fine classification result corresponding to the image to be processed is acquired according to the data enhancement feature map; at this time, the classification result corresponding to the image to be processed can be determined based on the coarse classification result and the fine classification result. In this scheme, the fine classification result is obtained from the data enhancement feature map produced by performing the data enhancement operation on the attention feature map, and the final classification result of the image is determined by combining the coarse classification result and the fine classification result, so that the accuracy of image classification is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
fig. 2 is another schematic flow chart of an image processing method provided in an embodiment of the present application;
fig. 3 is another schematic flow chart of an image processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of feature matrix generation by image fusion and stitching based on BAP according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an attention area cropping and enlarging operation performed on an image according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an attention area dropping operation performed on an image according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Embodiments of the present application provide an image processing method and a computer-readable storage medium. The image processing method can be applied to image processing equipment, and the image processing equipment can comprise a server, a terminal and the like, wherein the terminal can comprise a mobile phone, a computer, a camera and the like.
Referring to fig. 1, fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application. The image processing method may include steps S101 to S106, and the like, and specifically may be as follows:
s101, obtaining an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map.
The type and acquisition mode of the image to be processed can be flexibly set according to actual needs. For example, the image to be processed may contain objects such as people, tables, flowers, trees, birds, dogs, or vehicles. The image to be processed may be obtained from an image storage database preset on a server, or from the terminal's local storage, or acquired through a camera, with the image captured by the camera used as the image to be processed; and so on.
After the image to be processed is obtained, feature extraction may be performed on it through a Residual Network (ResNet), a Histogram of Oriented Gradients (HOG), a Convolutional Neural Network (CNN), or the like, to obtain a feature map. The feature map may correspond to global features in the image; for example, features of an object in the image to be processed may be extracted to obtain a feature map containing the object's features. The object can be flexibly set according to actual needs, e.g., a person, table, flower, tree, bird, dog, or vehicle.
It should be noted that, in order to improve the accuracy of feature extraction, the image to be processed may be preprocessed, where the preprocessing may include filtering, denoising, or scaling, so as to reduce interference or enhance the definition of the image, and then the preprocessed image is subjected to feature extraction to obtain a feature map.
In some embodiments, the extracting the feature of the image to be processed to obtain the feature map includes: and performing feature extraction on the image to be processed through a preset residual error network to obtain a feature map.
To improve the accuracy of feature extraction, features of the image to be processed can be extracted through a preset residual network, where the preset residual network is a trained residual network. Specifically, training sample images of various types can be obtained, and the residual network is trained with these training sample images to obtain a trained residual network that can accurately extract the features of any input image, producing a feature map of size H×W×N. The specific values of H, W and N can be flexibly set according to actual needs and are not limited herein.
And S102, acquiring a coarse classification result corresponding to the image to be processed according to the feature map.
After the feature map is obtained, operations such as Bilinear Attention Pooling (BAP) and an attention normalization constraint may be further performed to obtain the coarse classification result corresponding to the image to be processed. The coarse classification result may be the probability of identifying the sub-category to which the target object belongs from the data set to be processed (i.e., the image to be processed), such as the probability of identifying the model of a vehicle from a vehicle data set.
In some embodiments, obtaining a coarse classification result corresponding to the image to be processed according to the feature map includes: performing convolution operation on the feature map to generate a first attention map; performing fusion operation on the feature map and the first attention map to obtain a first partial feature map; and acquiring a coarse classification result corresponding to the image to be processed according to the first part of feature mapping image.
Specifically, as shown in fig. 2 and 3, a feature map F ∈ R^(H×W×N) is generated after the image is input into the ResNet network. The feature map may then be subjected to a convolution operation; for example, the feature map F may be subjected to a 1×1 convolution to generate the first attention map A1, where A1 ∈ R^(H×W×M). The first attention map A1 may refer to a feature map containing attention features; the feature map F and the first attention map A1 both have spatial size H×W, the feature map F has N channels, and the first attention map A1 has M channels. Then, a fusion operation is performed on the feature map F and the first attention map A1, where the specific fusion manner is not limited. For example, as shown in fig. 4, the feature map F and the first attention map A1 may be taken as the input of BAP and an element-wise dot product operation performed to obtain the first partial feature maps F1k. There may be a plurality of first partial feature maps F1k; a first partial feature map refers to an image obtained by fusing the attention features with the global features (i.e., the feature map) to obtain local features at more levels, and the first partial feature maps F1k can improve the characterization capability of the image. The structure of BAP can be as shown in FIG. 4, where the attention map A in FIG. 4 is to be understood as the first attention map A1, the partial feature map FK as the first partial feature map F1k, and the feature matrix S as the first feature matrix S1.
At this time, a coarse classification result corresponding to the image to be processed can be obtained according to the first partial feature maps F1k. By performing the convolution operation, fusion operation and the like on the feature map and obtaining the coarse classification result P1 based on the first partial feature maps F1k, the reliability of coarse classification result acquisition can be improved.
In some embodiments, the fusing the feature map and the first attention map to obtain the first partial feature map includes: and performing dot multiplication operation on the feature map and the plurality of channel feature maps of the first attention map according to elements to obtain a plurality of first partial feature maps.
The first attention map A1 represents certain parts of a particular object, i.e. A1 = {a11, a12, …, a1k, …, a1M}. The feature map F and the plurality of channel feature maps a1k of the first attention map A1 are subjected to element-wise dot multiplication to obtain a plurality of first partial feature maps F1k; that is, the feature map F is multiplied element by element with the attention map a1k of each channel of the first attention map A1, obtaining M first partial feature maps F1k, where a1k can reflect the k-th part of the target object in the image, thereby improving the fusion effect.
In some embodiments, obtaining a coarse classification result corresponding to the image to be processed according to the first partial feature map includes: carrying out global average pooling operation on the first part of feature maps to generate a first part of feature map tensor after dimension reduction; carrying out attention normalization constraint on a preset part in the first part of the feature mapping tensor map after dimension reduction to obtain a normalized feature mapping map tensor; vector splicing is carried out on the normalized feature mapping tensor to generate a first feature matrix; and classifying the image to be processed according to the first feature matrix to obtain a coarse classification result.
In order to improve the accuracy of coarse classification result acquisition, a global average pooling operation may be performed on the first partial feature maps F1k, so that each first partial feature map F1k is finally reduced to a one-dimensional tensor, obtaining the dimension-reduced first partial feature map tensors f1k, i.e. generating the dimension-reduced one-dimensional tensor corresponding to each first partial feature map F1k. Then, an attention normalization constraint is applied to a preset part in the dimension-reduced first partial feature map tensors f1k to penalize differences between different features of the same object; for example, the feature representing the k-th part is subjected to the attention normalization constraint to obtain normalized feature map tensors, so that each part feature in the normalized feature map tensors is close to the feature center of that part. The normalized feature map tensors may include multiple parts. At this time, all normalized feature map tensors may be vector-spliced to generate the first feature matrix S1, which may be an M×N feature matrix containing all partial features. The image to be processed is then classified according to the first feature matrix S1 to obtain the coarse classification result; for example, the first feature matrix S1 may be input into a Support Vector Machine (SVM) classifier or a softmax classifier for classification, obtaining the coarse classification result P1 corresponding to the image to be processed.
Specifically, a feature map F is generated after inputting the image into the ResNet network, and the feature map F is subjected to a 1×1 convolution operation to generate the first attention map A1. Then, the feature map F and the first attention map A1 are subjected to a dot product operation, with the specific formula as follows:

F1k = a1k ⊙ F, k = 1, 2, …, M (1)

where ⊙ denotes multiplication by corresponding elements, F represents the feature map, a1k represents the attention map of the k-th channel of the first attention map A1, and F1k is a first partial feature map.
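The element-wise fusion above can be sketched in NumPy: each channel a1k of the attention map (H×W×M) is broadcast over the N channels of the feature map F (H×W×N), yielding M partial feature maps. The shapes and values below are illustrative assumptions, not from the patent.

```python
import numpy as np

H, W, N, M = 4, 4, 6, 3
# Toy feature map F (H x W x N) and attention map A1 (H x W x M).
F = np.arange(H * W * N, dtype=float).reshape(H, W, N)
A1 = np.linspace(0.0, 1.0, H * W * M).reshape(H, W, M)

# F1k = a1k (element-wise) F for each attention channel k.
partial_maps = [A1[..., k][..., None] * F for k in range(M)]
```

Broadcasting the single-channel attention map across all N feature channels is what makes this an element-wise "dot product" rather than a matrix product.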
Then, local features with discriminative power are further extracted through a local feature extraction function g(·), with the specific formula:

f1k = g(F1k) (2)

where g(·) is the global average pooling function and f1k ∈ R^(1×N) is the k-th partial feature tensor.
From equation (2), each first partial feature map F1k is finally reduced in dimension, so that each group of first partial feature maps F1k becomes a set of one-dimensional tensors, namely the dimension-reduced first partial feature map tensors f1k. The dimension-reduced first partial feature map tensors f1k, i.e. the features representing the k-th part, are then subjected to the attention normalization constraint to obtain the normalized feature map tensors, and the normalized feature map tensors are vector-spliced to form the first feature matrix S1 ∈ R^(M×N), which is input into the softmax classifier for classification. In summary, S1 can be represented by the following formula:

S1 = Γ(A1, F) = [f11; f12; …; f1M] (3)
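The pooling of equation (2) and the splicing into S1 can be sketched as follows. The constant-valued attention channels are an illustrative assumption chosen so the pooled values are easy to verify by hand.

```python
import numpy as np

H, W, N, M = 4, 4, 6, 3
# All-ones feature map, and attention channel k filled with 0.1 * (k + 1).
F = np.ones((H, W, N))
A1 = np.stack([np.full((H, W), 0.1 * (k + 1)) for k in range(M)], axis=-1)

# g(.) = global average pooling of each fused partial feature map -> f1k (N,)
f = [(A1[..., k][..., None] * F).mean(axis=(0, 1)) for k in range(M)]
# Splice the M pooled tensors into the M x N feature matrix S1.
S1 = np.stack(f)
```

Because each attention channel is constant, row k of S1 simply equals that constant, confirming the pooling step reduced each H×W map to one value per feature channel.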
As described above, the feature map F and the first attention map A1 generate the first feature matrix S1 after the BAP operation. In order to make the features of the same part of the same object as similar as possible and to penalize differences between different features of the same object, a class-center loss can be used to supervise the learning process of attention; for example, an attention-center loss function is used to apply the attention normalization constraint to the dimension-reduced first partial feature map tensors f1k, where the attention-center loss function is defined as follows:

LA = Σ(k=1..M) ‖f1k − ck‖² (4)
where LA represents the attention-center loss, f1k represents the feature of the k-th part, and ck represents the feature center of the k-th part, which is initialized to 0 and then updated according to a moving-average formula, specifically:

ck ← ck + β(f1k − ck) (5)

where β controls the learning rate of the part's global feature center ck. The attention regularization loss function ensures that each part feature in the normalized feature map tensors is close to the feature center of that part, i.e., each normalized feature map tensor represents a unique object part. Finally, the feature matrix S1 formed by splicing all the normalized feature map tensors is input into the softmax classifier for classification to obtain the coarse classification result P1.
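The attention-center loss and the moving-average update of formula (5) can be sketched directly. The β value and feature values are illustrative assumptions; the point is that one update step moves each center toward its part feature and strictly decreases the loss.

```python
import numpy as np

def attention_center_loss(f, c):
    # LA = sum over parts k of || f1k - ck ||^2
    return float(((f - c) ** 2).sum())

def update_centers(f, c, beta=0.05):
    # ck <- ck + beta * (f1k - ck), the moving-average update of formula (5)
    return c + beta * (f - c)

M, N = 3, 4
f = np.ones((M, N))    # current part features f1k
c = np.zeros((M, N))   # centers initialized to 0, as in the text
c_next = update_centers(f, c)
```

Repeating the update drives ck toward the running mean of the f1k seen during training, which is what keeps each part feature near its center.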
And S103, acquiring an attention feature map with a plurality of specific attentions according to the feature map.
It should be noted that, the execution sequence between step S102 and step S103 may be that step S102 is executed first, and then step S103 is executed; or, step S103 is executed first, and then step S102 is executed; alternatively, step S102 and step S103 are executed simultaneously, and the specific execution order between step S102 and step S103 is not limited herein in this embodiment.
For example, as shown in fig. 2 and 3, after the feature map F is obtained, an attention feature map Aq having a plurality of specific attentions may be obtained based on the feature map F; the attention feature map Aq may be a feature map including a plurality of attention regions. In some embodiments, obtaining an attention feature map having a plurality of specific attentions from the feature map comprises: performing squeeze and excitation operations on the feature map multiple times to generate an attention feature map with multiple specific attentions.
Specifically, the attention feature map may be generated using a multi-excitation pattern. After the feature map F = [F1, F2, …, Fk, …, FN] ∈ R^(W×H×N) is extracted, a One-Squeeze Multi-Excitation (OSME) operation is performed on F to generate a plurality of attention-specific feature maps Aq, making the useful information of the discriminative parts as prominent as possible.
The feature map F, whose spatial dimensions are W×H, is first aggregated using a global average pooling squeeze operation to generate a channel-level descriptor Z = [z1, z2, …, zk, …, zN], with the specific formula:

zk = (1/(W×H)) Σ(w=1..W) Σ(h=1..H) Fk(w, h) (6)

where Fk(w, h) represents the element value at spatial position (w, h).
Then, an independent gating mechanism is applied to Z for each excitation module, with the excitation modules numbered q = 1, 2, …, Q, with the specific formula:

m^q = σ(W2^q δ(W1^q z)) (7)

where σ represents the Sigmoid function, δ represents the ReLU function, and W1^q, W2^q are weight coefficients.
At this time, the specific attention feature map Aq is generated by re-weighting the channels of the original feature map F, as follows:

Aq = [m1^q F1, m2^q F2, …, mN^q FN] (8)
the OSME is an attention method for component positioning under weak supervision, and has the function of generating a plurality of feature maps with specific attention, and the weights of the feature maps can be learned through a network according to a loss function, so that the effective feature map has large weight, and the invalid or small-effect feature map has small weight. In other words, the network structure is an attention mechanism in the channel dimension, and different from the adoption of a multi-layer Excitation structure, a plurality of attention structures are generated, namely, a plurality of attention areas are extracted to be transmitted to the later stage for analysis. Specifically, the importance degree of each feature channel is automatically acquired through a learning mode, and then useful features are promoted according to the importance degree and the features which are not useful for the current task are suppressed.
And S104, performing data enhancement operation on the attention feature map to obtain a data enhancement feature map.
Data enhancement is an operation that increases the amount of training data, used to prevent overfitting and improve the performance of deep learning networks. Data enhancement methods can include augmentation by random methods, such as random image cropping: during image processing the target image can be randomly cropped, so that the required target is cropped out with a certain probability. To prevent noise such as background from affecting the enhancement and unneeded targets from being cropped out, the embodiment of the application can, during network training, generate through weakly supervised learning an attention map representing the salient features of the target, and then enhance the data under the guidance of the attention map, where the guided enhancement includes attention cropping, attention dropping and the like.
After the attention feature map Aq containing a plurality of specific attentions is generated, in order to obtain more fine-grained discriminative information, a data enhancement mode can be adopted to extract local fine-grained features and thereby improve the classification features. At this time, a data enhancement operation can be performed on the attention feature map Aq to obtain a data enhancement feature map. The data enhancement operation can improve the saliency of key features and reduce the influence of unnecessary features, further improving performance.
In some embodiments, performing a data enhancement operation on the attention feature map to obtain a data enhancement feature map includes: selecting a preset channel of the attention feature map and performing a normalization operation to obtain a candidate data enhancement feature map; performing an attention area cropping and enlarging operation on the candidate data enhancement feature map to obtain an image with the attention area cropped and enlarged; performing an attention area dropping operation on the candidate data enhancement feature map to obtain an image with the attention area dropped; and up-sampling and enlarging the image with the attention area cropped and enlarged and the image with the attention area dropped to the size of the data set picture (namely the size of the image to be processed), then inputting them into the residual network to extract features and generate the data enhancement feature map.
A preset channel of the attention feature map can be randomly selected to guide the data enhancement process, and a normalization operation is performed to obtain the candidate data enhancement feature map A*k. Using the candidate data enhancement feature map A*k to guide data enhancement, a cropping mask is added to A*k, and the part carrying discriminative features is cropped and then enlarged to the size of the images in the data set; that is, through attention area cropping and enlarging, the k-th part is enlarged to the same size as the original image (namely the image to be processed), thereby realizing the attention area cropping and enlarging operation on the candidate data enhancement feature map A*k and obtaining the image with the attention area cropped and enlarged.
A dropping mask of the attention area is further added to the candidate data enhancement feature map A*k, deleting the part that was cropped and enlarged; that is, through attention area dropping, the k-th part of the original picture is erased, realizing the attention area dropping operation on the candidate data enhancement feature map A*k and obtaining the image with the attention area dropped. In this way the network can be encouraged to extract other discriminative parts based on the data enhancement operation, which can improve the robustness of image classification and the accuracy of localization.
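The attention area dropping operation described above can be sketched as follows: positions where the normalized attention map exceeds a drop threshold are zeroed out in the image, forcing the network to rely on other parts. The threshold and arrays are illustrative assumptions.

```python
import numpy as np

def attention_drop(image, Ak_norm, theta_d=0.5):
    # Drop mask is 0 where attention is high, 1 elsewhere.
    drop_mask = (Ak_norm <= theta_d).astype(float)
    # Broadcast the mask over the image's channels to erase the k-th part.
    return image * drop_mask[..., None]

img = np.ones((4, 4, 3))
Ak_norm = np.array([[0., 0., 0., 0.],
                    [0., 1., 1., 0.],
                    [0., 1., 1., 0.],
                    [0., 0., 0., 0.]])
dropped = attention_drop(img, Ak_norm)
```

The drop mask is the complement of the cropping mask over the same attended region, which is why the text says the cropped-and-enlarged part is what gets deleted.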
Specifically, because it is inefficient to randomly select a portion of an image for enhancement, and background noise may be introduced and cause interference particularly when the selected image patch is small, the attention feature map A_q can be used to better filter out background noise. The embodiment of the application adopts a randomly selected channel of the attention feature map A_q, namely the attention map A_k, to guide the data enhancement process and normalizes it, and lets the k-th candidate data enhancement feature map be A*_k. The specific formula of the normalization process is as follows:

A*_k = (A_k − min(A_k)) / (max(A_k) − min(A_k))
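As a rough illustration, the min-max normalization of a randomly selected attention channel can be sketched in NumPy as follows; the toy shapes and the function name are illustrative, not taken from the patent:

```python
import numpy as np

def normalize_attention(a_k: np.ndarray) -> np.ndarray:
    """Min-max normalize one H x W attention channel A_k into [0, 1],
    yielding the candidate data-enhancement feature map A*_k."""
    a_min, a_max = a_k.min(), a_k.max()
    # Guard against a constant map, where max == min.
    if a_max - a_min < 1e-12:
        return np.zeros_like(a_k)
    return (a_k - a_min) / (a_max - a_min)

# Randomly select one of the M channels of the attention feature map A_q.
rng = np.random.default_rng(0)
A_q = rng.random((32, 32, 8))          # H x W x M (toy sizes)
k = int(rng.integers(A_q.shape[-1]))   # preset/random channel index
A_star_k = normalize_attention(A_q[..., k])
```

After this step the selected channel lies in [0, 1], so a single fixed threshold can be applied to it regardless of the original activation scale.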
After the candidate data enhancement feature map A*_k is obtained, the region of interest can be selected and enlarged to extract more detailed local features. The idea of the cropping mask is to select a threshold θ_c: a pixel value A*_k(i, j) greater than θ_c is set to 1, and one less than θ_c is set to 0; the region set to 1 is thus the partial region that needs to be focused on. The specific formula for obtaining the cropping mask C_k is as follows:

C_k(i, j) = 1, if A*_k(i, j) > θ_c; otherwise C_k(i, j) = 0
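The cropping mask described above reduces to a simple thresholding; a minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def crop_mask(a_star_k: np.ndarray, theta_c: float) -> np.ndarray:
    """Binary crop mask C_k: 1 where A*_k(i, j) > theta_c, else 0."""
    return (a_star_k > theta_c).astype(np.uint8)

A_star = np.array([[0.1, 0.8],
                   [0.6, 0.3]])
C_k = crop_mask(A_star, theta_c=0.5)   # marks the two high-attention pixels
```

The 1-valued region of C_k delimits the discriminative part that will next be cropped out and enlarged.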
After the candidate data enhancement feature map A*_k undergoes the mask-cropping operation, a local region is obtained by cropping, and the local region is up-sampled and enlarged to the size of the original image, namely, the local region is enlarged and a more detailed part is extracted; this part is input into the residual network ResNet as an enhanced data set to extract finer features again. The specific process is shown in FIG. 5: after feature extraction of the image through the ResNet network, an attention feature map is obtained; the attention feature map is then subjected to the cropping and enlarging operations to obtain a locally enlarged image, and feature extraction is performed on the locally enlarged image through the ResNet network to obtain a local feature map, where the local feature map is the feature map extracted from the image after the attention region is cropped and enlarged.
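The crop-then-upsample step might look like the following NumPy sketch, using a nearest-neighbour resize in place of whatever up-sampling the actual implementation uses (function and variable names are hypothetical):

```python
import numpy as np

def crop_and_resize(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the bounding box of the mask's 1-valued region, then
    resize the patch back to the original H x W (nearest neighbour)."""
    H, W = mask.shape
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    patch = image[y0:y1, x0:x1]
    # Nearest-neighbour up-sampling back to the original size.
    row_idx = np.arange(H) * patch.shape[0] // H
    col_idx = np.arange(W) * patch.shape[1] // W
    return patch[np.ix_(row_idx, col_idx)]

img = np.arange(64).reshape(8, 8)      # toy grayscale "original image"
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 3:6] = 1                     # attention region to crop
zoomed = crop_and_resize(img, mask)    # enlarged local part, same size as img
```

The enlarged patch is then treated as a new training image and fed through ResNet again, as described above.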
Each attention map A_k represents a feature of the same k-th part, but different attention maps A_k may focus on similar parts, and attention regularization loss supervision is used to address this. To alleviate the problem of multiple attention maps A_k focusing on the same part of an object, the embodiment of the application performs an attention-dropping operation on the candidate data enhancement feature map A*_k to encourage the model to extract features from a plurality of discriminative parts. The dropping mask D_k is obtained in the opposite way to the cropping mask: a threshold θ_d is selected, a pixel value A*_k(i, j) greater than θ_d is set to 0, and one less than θ_d is set to 1. The specific formula is as follows:

D_k(i, j) = 0, if A*_k(i, j) > θ_d; otherwise D_k(i, j) = 1
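A minimal sketch of the dropping mask and its application to an image, assuming NumPy arrays and illustrative names:

```python
import numpy as np

def drop_mask(a_star_k: np.ndarray, theta_d: float) -> np.ndarray:
    """Binary drop mask D_k: 0 where A*_k(i, j) > theta_d, else 1
    (the complement of the crop mask's selection rule)."""
    return (a_star_k <= theta_d).astype(np.uint8)

A_star = np.array([[0.1, 0.8],
                   [0.6, 0.3]])
D_k = drop_mask(A_star, theta_d=0.5)

# Zero out the attended region of an H x W x C image; the rest is kept,
# pushing the network toward other discriminative parts.
img = np.ones((2, 2, 3))
dropped = img * D_k[..., None]
```
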
Therefore, this operation deletes the region selected by attention cropping from the original image, and the rest of the image is input into the residual network ResNet as a data set to extract other features. The specific process is shown in FIG. 6: after feature extraction of the image through the ResNet network, an attention feature map is obtained; a drop-mask operation, namely the attention-region dropping operation, is then performed on the attention feature map to obtain a locally deleted image, and feature extraction is performed on the locally deleted image through the ResNet network to obtain the feature map extracted from the image with the attention region dropped.
And S105, acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map.
The fine classification result may be a probability of identifying a subclass to which the target object belongs from the image to be processed. After the data enhancement feature map is obtained, BAP operation can be performed on the data enhancement feature map, and a fine classification result corresponding to the image to be processed is obtained.
In some embodiments, obtaining a fine classification result corresponding to the image to be processed according to the data enhancement feature map includes: acquiring a feature mapping chart corresponding to the data enhancement feature chart; performing convolution operation on the feature map to generate a second attention map; performing fusion operation on the feature mapping chart and the second attention mapping chart to obtain a second partial feature mapping chart; and acquiring a fine classification result corresponding to the image to be processed according to the second part of feature mapping chart.
Specifically, as shown in FIG. 2 and FIG. 3, the feature map obtained by extracting features from the image with the attention region cropped and enlarged and the image with the attention region dropped after the data enhancement operation is the data enhancement feature map. At this time, the data enhancement feature map may be input into the residual network ResNet as a data set to extract deeper features, so as to obtain the feature map T. The feature map T may then be convolved; for example, a 1 × 1 convolution operation may be performed on the feature map T to generate a second attention map A_2 having a plurality of parts, where the second attention map A_2 may refer to a feature map containing the attention features after the data enhancement operation, and A_2 ∈ R^(H×W×M). The feature map T and the second attention map A_2 serve as the input of BAP, so as to perform operations such as element-wise dot multiplication, pooling, and vector splicing.
Next, the feature map T and the second attention map A_2 are subjected to the fusion operation so as to fuse the key features as much as possible; the specific fusion manner is not limited. For example, as shown in FIG. 4, the feature map T and the second attention map A_2 may be taken as the input of BAP and subjected to a dot-product operation to obtain second partial feature maps F_2k. There may be a plurality of second partial feature maps, and a second partial feature map may refer to an image obtained by fusing the data-enhanced attention feature with its global feature (namely, the data enhancement feature map), giving a more hierarchical local feature. Wherein A_2 = {a_21, a_22, ..., a_2K, ..., a_2M}; the feature map T may be dot-multiplied element-wise with each of the plurality of channel feature maps a_2k of the second attention map A_2 to obtain a plurality of second partial feature maps F_2k, so that the second attention map A_2 can be fused with the feature map T to extract second partial feature maps F_2k with finer-grained features.
Wherein, the structure of BAP can be as shown in FIG. 4; for understanding, the attention map A in FIG. 4 is taken as the second attention map A_2, the partial feature map F_k as the second partial feature map F_2k, and the feature matrix S as the second feature matrix S_2. At this time, the fine classification result corresponding to the image to be processed can be obtained according to the second partial feature maps F_2k, namely, the feature map T undergoes the convolution operation, the fusion operation, and the like, and the fine classification result P_2 is obtained based on the second partial feature maps F_2k, which can improve the accuracy and reliability of fine classification result acquisition.
In some embodiments, obtaining a fine classification result corresponding to the image to be processed according to the second partial feature map includes: carrying out global average pooling operation on the second part of feature mapping graph to obtain a second part of feature mapping graph tensor after dimension reduction; performing vector splicing on the dimensionality-reduced tensor of the second part of the feature mapping graph to generate a second feature matrix; and classifying the image to be processed according to the second feature matrix to obtain a fine classification result.
In order to improve the accuracy of fine classification result acquisition, a global average pooling operation may be performed on the second partial feature maps F_2k so that each second partial feature map F_2k is finally reduced to a one-dimensional tensor, obtaining the reduced-dimension second partial feature map tensors f_2k, namely, generating a reduced-dimension one-dimensional tensor corresponding to each second partial feature map F_2k; there may be a plurality of such tensors. At this time, all the reduced-dimension second partial feature map tensors f_2k may be vector-spliced to generate the second feature matrix S_2, which may be an N × M feature matrix, and the image to be processed is then classified according to the second feature matrix S_2; for example, the second feature matrix S_2 may be input into an SVM classifier or a softmax classifier for classification, so as to obtain the fine classification result P_2 corresponding to the image to be processed.
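The BAP steps just described (element-wise dot product per attention channel, global average pooling, then stacking into the feature matrix) can be sketched as follows; shapes and names are illustrative only:

```python
import numpy as np

def bap(T: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """Bilinear attention pooling sketch.

    T  : H x W x N feature map.
    A2 : H x W x M second attention map.
    Returns S2, an M x N feature matrix whose k-th row is the global
    average pool of the part feature map F_2k = a_2k (element-wise) T.
    """
    H, W, N = T.shape
    M = A2.shape[-1]
    S2 = np.empty((M, N))
    for k in range(M):
        F_2k = A2[..., k:k + 1] * T        # element-wise dot product
        S2[k] = F_2k.mean(axis=(0, 1))     # global average pooling -> f_2k
    return S2

rng = np.random.default_rng(0)
T = rng.random((4, 4, 6))      # toy H x W x N
A2 = rng.random((4, 4, 3))     # toy H x W x M
S2 = bap(T, A2)                # M x N second feature matrix
```

The loop is written for clarity; the same result is a single contraction over the spatial axes, which is how a batched implementation would express it.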
Specifically, after the feature map T and the second attention map A_2 are obtained, the feature map T and the second attention map A_2 are first subjected to a dot-product operation, with the following specific formula:

F_2k = a_2k ⊙ T, k = 1, 2, ..., M (12)

where ⊙ denotes element-wise multiplication, T denotes the feature map, a_2k denotes the attention map of the k-th channel of the second attention map A_2, and F_2k is the second partial feature map.
Then, the local features with the recognition power are further extracted through a local feature extraction function g (-) by the following specific formula:
f2k=g(F2k) (13)
where g (-) is the global average pooling function, f2k∈R1×NIs the kth partial eigenmap tensor.
From equation (2), each second partial feature map F2kFinally reducing the dimension to one number, and then the second part of feature mapping chart F of each group2kReducing the dimensionality into a group of one-dimensional tensors, namely the tensor f of the feature mapping image of the second part after the dimensionality reduction2kThen, the dimensionality-reduced second partial feature map tensor f2kVector splicing is carried out, and a second feature matrix S can be formed2∈RM×NSecond feature matrix S2And inputting the softmax classifier for classification. In summary, S2Can be represented by the following formula:
Figure BDA0002439313480000123
from the above, the feature map F and the secondAttention map A2After BAP operation, a second feature matrix S is generated2. Finally, the second feature matrix S2Inputting the classifier softmax for classification to obtain a fine classification result P2
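A toy sketch of feeding the flattened second feature matrix S_2 through a softmax head; the classifier weights here are random placeholders, not the trained parameters of the described network:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
S2 = rng.random((3, 6))                  # M x N second feature matrix (toy)
W_cls = rng.random((3 * 6, 5))           # hypothetical weights, C = 5 classes
P2 = softmax(S2.reshape(-1) @ W_cls)     # fine classification probabilities
```
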
In this embodiment, the attention-mechanism-based OSME is adopted in combination with the data enhancement operation, so that more subtly distinguishable images are added to the data set, and BAP is used to fuse features of more levels together, so that the network can focus on the important feature parts of the image without additional annotation information, greatly improving recognition accuracy, improving the accuracy of fine classification result acquisition, and effectively improving classification precision. The method specifically includes a ResNet feature extraction network, BAP, an attention normalization constraint, a data enhancement operation, and the like. ResNet is used as the feature extraction network structure of the algorithm, and an attention map is generated from the extracted feature map; BAP takes the feature map and the attention map as input and performs operations such as dot multiplication, pooling, and vector splicing to obtain features of different levels and enhance the local features; the attention normalization constraint supervises the learning process of attention with a class-center loss function, so that features of the same part on the same object are as similar as possible; the data enhancement operation is guided by the attention mechanism, cropping and enlarging the attention region and dropping the attention region, so that the model focuses more on the fine-grained features of the image, the interference of background noise is reduced, and the recognition precision is further improved.
The attention mechanism aims to enable the network to pay more attention to the relevant parts of the input, and differs from the traditional attention mechanism SENet. The data enhancement operation overcomes the limitation that weakly supervised fine-grained image classification training data are scarce and that improving precision otherwise requires professional knowledge and a large amount of annotation time: cropping and enlarging the attention region discards background noise irrelevant to image classification and recognition so as to enhance the appearance of local features, and dropping the attention region encourages the model to extract features from a plurality of discriminative parts, further improving precision. The BAP algorithm fuses the attention features and the global features to obtain features of more levels, enhances the image in a targeted manner, strengthens the saliency of the local features that are discriminative for the fine-grained classification task, and improves classification precision. The attention normalization constraint penalizes the difference between features of the same part of the same object, so that features of the same part on the same species are as similar as possible, which can significantly improve the classification precision of fine-grained images.
And S106, determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
In some embodiments, determining the classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result includes: and adding the coarse classification result and the fine classification result to obtain a classification result corresponding to the image to be processed.
After the coarse classification result and the fine classification result are obtained, in order to improve the convenience of determining the classification result, the coarse classification result P1 and the fine classification result P2 may be added to obtain the classification result P = P1 + P2 corresponding to the image to be processed. The classification result may be the classification result corresponding to an object in the image to be processed, and the sub-category with the highest probability among the sub-categories to which the object may belong is taken as the category of the object.
In some embodiments, determining the classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result includes: setting a first weight value corresponding to the coarse classification result and a second weight value corresponding to the fine classification result; and determining a classification result corresponding to the image to be processed according to the rough classification result, the first weight value, the fine classification result and the second weight value.
To improve the flexibility and reliability of classification result determination, a first weight value C1 corresponding to the coarse classification result P1 and a second weight value C2 corresponding to the fine classification result P2 may be set. The coarse classification result P1 is multiplied by the first weight value C1 to obtain a first value P1*C1, and the fine classification result P2 is multiplied by the second weight value C2 to obtain a second value P2*C2. At this time, the sum of the first value P1*C1 and the second value P2*C2 may be taken as the classification result P = P1*C1 + P2*C2 corresponding to the image to be processed.
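Both combination strategies (plain addition and weighted fusion) can be sketched in a few lines; the weight values below are placeholders, not values from the patent:

```python
import numpy as np

def fuse_results(p1, p2, c1: float = 1.0, c2: float = 1.0) -> np.ndarray:
    """P = C1 * P1 + C2 * P2; with C1 = C2 = 1 this reduces to
    the simple addition P = P1 + P2."""
    return c1 * np.asarray(p1) + c2 * np.asarray(p2)

# Toy coarse/fine class-score vectors over 3 sub-categories.
P = fuse_results([0.1, 0.7, 0.2], [0.2, 0.6, 0.2], c1=0.4, c2=0.6)
predicted = int(np.argmax(P))   # sub-category with the highest fused score
```
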
The method and the device can acquire the image to be processed, and perform feature extraction on the image to be processed to obtain a feature map; then, obtaining a rough classification result corresponding to the image to be processed according to the feature map, obtaining an attention feature map with a plurality of specific attentions according to the feature map, performing data enhancement operation on the attention feature map to obtain a data enhancement feature map, and obtaining a fine classification result corresponding to the image to be processed according to the data enhancement feature map; at this time, the classification result corresponding to the image to be processed can be determined based on the coarse classification result and the fine classification result. According to the scheme, a fine classification result is obtained by a data enhancement feature map obtained by performing data enhancement operation on the attention feature map, and a final classification result of the image is determined by combining a rough classification result and the fine classification result, so that the accuracy of image classification is improved.
In order to better implement the image processing method provided by the embodiment of the present application, the embodiment of the present application further provides an apparatus based on the image processing method. The terms are the same as those in the image processing method, and details of implementation can be referred to the description in the method embodiment.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure, wherein the image processing apparatus 300 may include an extraction module 301, a first obtaining module 302, a second obtaining module 303, a data enhancement module 304, a third obtaining module 305, a determination module 306, and the like.
The extraction module 301 is configured to acquire an image to be processed, and perform feature extraction on the image to be processed to obtain a feature map.
The first obtaining module 302 is configured to obtain a coarse classification result corresponding to the image to be processed according to the feature map.
A second obtaining module 303, configured to obtain an attention feature map with a plurality of specific attentions according to the feature map.
And the data enhancement module 304 is configured to perform data enhancement operation on the attention feature map to obtain a data enhancement feature map.
And a third obtaining module 305, configured to obtain a fine classification result corresponding to the image to be processed according to the data enhancement feature map.
And the determining module 306 is configured to determine a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
Optionally, the first obtaining module 302 includes:
the first convolution submodule is used for performing convolution operation on the feature map to generate a first attention mapping map;
the first fusion submodule is used for carrying out fusion operation on the feature map and the first attention map to obtain a first part of feature map;
and the first obtaining submodule is used for obtaining a coarse classification result corresponding to the image to be processed according to the first partial feature mapping image.
Optionally, the first obtaining sub-module is specifically configured to: carrying out global average pooling operation on the first part of feature maps to generate a first part of feature map tensor after dimension reduction; performing attention normalization constraint on the features of the preset part in the first part of the feature mapping tensor after dimension reduction to obtain a normalized feature mapping tensor; vector splicing is carried out on the normalized feature mapping tensor to generate a first feature matrix; and classifying the image to be processed according to the first feature matrix to obtain a coarse classification result.
Optionally, the first fusion submodule is specifically configured to: and performing dot multiplication operation on the feature map and the plurality of channel feature maps of the first attention map according to elements to obtain a plurality of first partial feature maps.
Optionally, the data enhancement module 304 is specifically configured to: selecting a preset channel of the attention feature map to carry out normalization operation to obtain a candidate data enhancement feature map; performing attention area cutting and amplifying operation on the candidate data enhancement feature map to obtain an image obtained after the attention area cutting and amplifying; performing attention area descending operation on the candidate data enhancement feature map to obtain an image with the descending attention area; and inputting the image with the enlarged attention area clipping and the image with the reduced attention area into a residual error network to extract features, and generating a data enhancement feature map.
Optionally, the third obtaining module 305 includes:
the second obtaining submodule is used for obtaining a feature mapping chart corresponding to the data enhancement feature chart;
the second convolution submodule is used for carrying out convolution operation on the feature mapping chart to generate a second attention mapping chart;
the second fusion submodule is used for carrying out fusion operation on the feature mapping chart and the second attention mapping chart to obtain a second part of feature mapping chart;
and the third obtaining submodule is used for obtaining a fine classification result corresponding to the image to be processed according to the second partial feature mapping image.
Optionally, the third obtaining sub-module is specifically configured to: carrying out global average pooling operation on the second part of feature mapping graph to obtain a second part of feature mapping graph tensor after dimension reduction; performing vector splicing on the dimensionality-reduced tensor of the second part of the feature mapping graph to generate a second feature matrix; and classifying the image to be processed according to the second feature matrix to obtain a fine classification result.
Optionally, the extracting module 301 is specifically configured to: and performing feature extraction on the image to be processed through a preset residual error network to obtain a feature map.
Optionally, the second obtaining module 303 is specifically configured to: perform squeeze-and-excitation operations on the feature map for multiple times to generate an attention feature map with multiple specific attentions.
Optionally, the determining module 306 is specifically configured to: adding the coarse classification result and the fine classification result to obtain a classification result corresponding to the image to be processed; or setting a first weight value corresponding to the rough classification result and a second weight value corresponding to the fine classification result, and determining the classification result corresponding to the image to be processed according to the rough classification result, the first weight value, the fine classification result and the second weight value.
In the embodiment of the application, the extraction module 301 may acquire an image to be processed, and perform feature extraction on the image to be processed to obtain a feature map; then, a first obtaining module 302 obtains a rough classification result corresponding to the image to be processed according to the feature map, a second obtaining module 303 obtains an attention feature map with a plurality of specific attentions according to the feature map, a data enhancement module 304 performs data enhancement operation on the attention feature map to obtain a data enhancement feature map, and a third obtaining module 305 obtains a fine classification result corresponding to the image to be processed according to the data enhancement feature map; at this time, the determination module 306 may determine a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result. According to the scheme, a fine classification result is obtained by a data enhancement feature map obtained by performing data enhancement operation on the attention feature map, and a final classification result of the image is determined by combining a rough classification result and the fine classification result, so that the accuracy of image classification is improved.
Referring to fig. 8, fig. 8 is a schematic block diagram of a structure of an image processing apparatus according to an embodiment of the present application.
As shown in fig. 8, the image processing apparatus 400 may include a processor 402, a memory 403, and a communication interface 404 connected by a system bus 401, wherein the memory 403 may include a nonvolatile computer-readable storage medium and an internal memory.
The non-transitory computer readable storage medium may store a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform any of the image processing methods.
The processor 402 is used to provide computational and control capabilities to support the operation of the overall image processing apparatus.
The memory 403 provides an environment for the execution of a computer program in a non-transitory computer readable storage medium, which when executed by the processor 402, causes the processor 402 to perform any of the image processing methods.
The communication interface 404 is used for communication. Those skilled in the art will appreciate that the structure shown in fig. 8 is a block diagram of only a part of the structure related to the present application, and does not constitute a limitation to the image processing apparatus 400 to which the present application is applied, and a specific image processing apparatus 400 may include more or less components than those shown in the figure, or combine some components, or have a different arrangement of components.
It should be understood that the bus 401 is, for example, an I2C (Inter-Integrated Circuit) bus. The memory 403 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, or a removable hard disk. The processor 402 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Wherein, in some embodiments, the processor 402 is configured to run a computer program stored in the memory 403 to perform the following steps:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map; obtaining a coarse classification result corresponding to the image to be processed according to the characteristic diagram; acquiring an attention feature map with a plurality of specific attentions according to the feature map; performing data enhancement operation on the attention feature map to obtain a data enhancement feature map; acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map; and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
Optionally, when obtaining the coarse classification result corresponding to the image to be processed according to the feature map, the processor 402 further performs: performing convolution operation on the feature map to generate a first attention map; performing fusion operation on the feature map and the first attention map to obtain a first partial feature map; and acquiring a coarse classification result corresponding to the image to be processed according to the first part of feature mapping image.
Optionally, when obtaining a coarse classification result corresponding to the image to be processed according to the first partial feature map, the processor 402 further performs: carrying out global average pooling operation on the first part of feature maps to generate a first part of feature map tensor after dimension reduction; performing attention normalization constraint on the features of the preset part in the first part of the feature mapping tensor after dimension reduction to obtain a normalized feature mapping tensor; vector splicing is carried out on the normalized feature mapping tensor to generate a first feature matrix; and classifying the image to be processed according to the first feature matrix to obtain a coarse classification result.
Optionally, when the feature map and the first attention map are fused to obtain a first partial feature map, the processor 402 further performs: and performing dot multiplication operation on the feature map and the plurality of channel feature maps of the first attention map according to elements to obtain a plurality of first partial feature maps.
Optionally, when performing a data enhancement operation on the attention feature map to obtain a data enhanced feature map, the processor 402 further performs: selecting a preset channel of the attention feature map to carry out normalization operation to obtain a candidate data enhancement feature map; performing attention area cutting and amplifying operation on the candidate data enhancement feature map to obtain an image obtained after the attention area cutting and amplifying; performing attention area descending operation on the candidate data enhancement feature map to obtain an image with the descending attention area; and inputting the image with the enlarged attention area clipping and the image with the reduced attention area into a residual error network to extract features, and generating a data enhancement feature map.
Optionally, when obtaining the fine classification result corresponding to the image to be processed according to the data enhancement feature map, the processor 402 further performs: acquiring a feature mapping chart corresponding to the data enhancement feature chart; performing convolution operation on the feature map to generate a second attention map; performing fusion operation on the feature mapping chart and the second attention mapping chart to obtain a second partial feature mapping chart; and acquiring a fine classification result corresponding to the image to be processed according to the second part of feature mapping chart.
Optionally, when obtaining a fine classification result corresponding to the image to be processed according to the second partial feature map, the processor 402 further performs: carrying out global average pooling operation on the second part of feature mapping graph to obtain a second part of feature mapping graph tensor after dimension reduction; performing vector splicing on the dimensionality-reduced tensor of the second part of the feature mapping graph to generate a second feature matrix; and classifying the image to be processed according to the second feature matrix to obtain a fine classification result.
Optionally, when performing feature extraction on the image to be processed to obtain a feature map, the processor 402 further performs: and performing feature extraction on the image to be processed through a preset residual error network to obtain a feature map.
Optionally, when obtaining an attention feature map having a plurality of specific attentions from the feature map, the processor 402 further performs: performing squeeze-and-excitation operations on the feature map for multiple times to generate an attention feature map with multiple specific attentions.
Optionally, when determining the classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result, the processor 402 further performs: adding the coarse classification result and the fine classification result to obtain the classification result corresponding to the image to be processed; or setting a first weight value corresponding to the coarse classification result and a second weight value corresponding to the fine classification result, and determining the classification result corresponding to the image to be processed according to the coarse classification result, the first weight value, the fine classification result, and the second weight value.
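The weighted variant of the result combination can be sketched as a convex combination of the two logit vectors; with equal weights it coincides, up to a constant scale, with the simple-addition variant. The weight values below are illustrative defaults, not values specified by the patent.

```python
import numpy as np

def combine_results(coarse_logits, fine_logits, w1=0.5, w2=0.5):
    """Weighted combination of coarse and fine classification results.

    Returns the combined score vector and the index of the predicted class.
    """
    combined = w1 * np.asarray(coarse_logits) + w2 * np.asarray(fine_logits)
    return combined, int(np.argmax(combined))
```

For example, with equal weights a coarse result of [0.2, 0.8] and a fine result of [0.6, 0.4] combine to [0.4, 0.6], selecting class 1.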
The descriptions of the above embodiments each have their own emphasis; for parts not described in detail in a given embodiment, reference may be made to the detailed description of the image processing method above, which is not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a processor, implement any of the image processing methods provided by the embodiments of the present application. For example, when loaded by the processor, the computer program may perform the following steps:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map; obtaining a coarse classification result corresponding to the image to be processed according to the feature map; obtaining an attention feature map having a plurality of specific attentions according to the feature map; performing a data enhancement operation on the attention feature map to obtain a data enhancement feature map; obtaining a fine classification result corresponding to the image to be processed according to the data enhancement feature map; and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
The above operations may be implemented with reference to the foregoing embodiments and are not described in detail again here.
The computer-readable storage medium may be an internal storage unit of the image processing apparatus of the foregoing embodiments, such as a hard disk or memory of the image processing apparatus. The computer-readable storage medium may also be an external storage device of the image processing apparatus, such as a plug-in hard disk equipped on the image processing apparatus, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash card.
Since the computer program stored in the computer-readable storage medium can execute any of the image processing methods provided in the embodiments of the present application, it can achieve the beneficial effects achievable by any of those methods, which are detailed in the foregoing embodiments and are not repeated here.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map;
obtaining a coarse classification result corresponding to the image to be processed according to the feature map;
acquiring an attention feature map with a plurality of specific attentions according to the feature map;
performing data enhancement operation on the attention feature map to obtain a data enhancement feature map;
acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map;
and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
2. The image processing method according to claim 1, wherein the obtaining of the coarse classification result corresponding to the image to be processed according to the feature map comprises:
performing convolution operation on the feature map to generate a first attention map;
performing fusion operation on the feature map and the first attention map to obtain a first partial feature map;
and acquiring a coarse classification result corresponding to the image to be processed according to the first partial feature mapping image.
3. The image processing method according to claim 2, wherein the obtaining of the coarse classification result corresponding to the image to be processed according to the first partial feature map comprises:
performing a global average pooling operation on the first partial feature map to generate a dimension-reduced first partial feature map tensor;
performing an attention normalization constraint on the dimension-reduced first partial feature map tensor to obtain a normalized feature map tensor;
splicing the normalized feature map tensor into a vector to generate a first feature matrix;
and classifying the image to be processed according to the first feature matrix to obtain the coarse classification result.
4. The image processing method according to claim 2, wherein the fusing of the feature map and the first attention map to obtain a first partial feature map comprises:
performing an element-wise dot product of the feature map with each channel feature map of the first attention map to obtain a plurality of first partial feature maps.
5. The image processing method according to claim 1, wherein the performing of a data enhancement operation on the attention feature map to obtain a data enhancement feature map comprises:
selecting a preset channel of the attention feature map and performing a normalization operation on it to obtain a candidate data enhancement feature map;
performing an attention-region cropping and enlarging operation on the candidate data enhancement feature map to obtain an image in which the attention region is cropped and enlarged;
performing an attention-region dropping operation on the candidate data enhancement feature map to obtain an image in which the attention region is dropped;
and inputting the cropped-and-enlarged image and the dropped image into a residual network for feature extraction to generate the data enhancement feature map.
6. The image processing method according to claim 1, wherein the obtaining of the fine classification result corresponding to the image to be processed according to the data enhancement feature map comprises:
obtaining a feature mapping corresponding to the data enhancement feature map;
performing a convolution operation on the feature mapping to generate a second attention map;
fusing the feature mapping with the second attention map to obtain a second partial feature map;
and obtaining the fine classification result corresponding to the image to be processed according to the second partial feature map.
7. The image processing method according to claim 6, wherein the obtaining of the fine classification result corresponding to the image to be processed according to the second partial feature map comprises:
performing a global average pooling operation on the second partial feature map to obtain a dimension-reduced second partial feature map tensor;
splicing the dimension-reduced second partial feature map tensor into a vector to generate a second feature matrix;
and classifying the image to be processed according to the second feature matrix to obtain the fine classification result.
8. The image processing method according to any one of claims 1 to 7, wherein the performing of feature extraction on the image to be processed to obtain a feature map comprises:
performing feature extraction on the image to be processed through a preset residual network to obtain the feature map;
and the obtaining of an attention feature map having a plurality of specific attentions according to the feature map comprises:
performing squeeze-and-excitation operations on the feature map multiple times to generate an attention feature map having a plurality of specific attentions.
9. The image processing method according to any one of claims 1 to 7, wherein the determining of a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result comprises:
adding the coarse classification result and the fine classification result to obtain the classification result corresponding to the image to be processed; or,
setting a first weight value corresponding to the coarse classification result and a second weight value corresponding to the fine classification result;
and determining the classification result corresponding to the image to be processed according to the coarse classification result, the first weight value, the fine classification result, and the second weight value.
10. A computer-readable storage medium for storing a computer program which is loaded by a processor to perform the image processing method of any one of claims 1 to 9.
CN202010261102.4A 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium Withdrawn CN111428807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010261102.4A CN111428807A (en) 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010261102.4A CN111428807A (en) 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111428807A true CN111428807A (en) 2020-07-17

Family

ID=71555732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010261102.4A Withdrawn CN111428807A (en) 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111428807A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101438B (en) * 2020-09-08 2024-04-16 南方科技大学 Left-right eye classification method, device, server and storage medium
CN112149720A (en) * 2020-09-09 2020-12-29 南京信息工程大学 Fine-grained vehicle type identification method
CN112465709A (en) * 2020-10-26 2021-03-09 华为技术有限公司 Image enhancement method, device, storage medium and equipment
CN112465709B (en) * 2020-10-26 2024-04-12 华为技术有限公司 Image enhancement method, device, storage medium and equipment
CN112651451A (en) * 2020-12-30 2021-04-13 北京百度网讯科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN112651451B (en) * 2020-12-30 2023-08-11 北京百度网讯科技有限公司 Image recognition method, device, electronic equipment and storage medium
CN112819057A (en) * 2021-01-25 2021-05-18 长春迈克赛德医疗科技有限公司 Automatic identification method of urinary sediment image
CN112818832A (en) * 2021-01-28 2021-05-18 中国科学技术大学 Weak supervision object positioning device and method based on component perception
CN113139542A (en) * 2021-04-28 2021-07-20 北京百度网讯科技有限公司 Target detection method, device, equipment and computer readable storage medium
CN113139542B (en) * 2021-04-28 2023-08-11 北京百度网讯科技有限公司 Object detection method, device, equipment and computer readable storage medium
CN113393388A (en) * 2021-05-26 2021-09-14 联合汽车电子有限公司 Image enhancement method, device adopting same, storage medium and vehicle
CN113361636A (en) * 2021-06-30 2021-09-07 山东建筑大学 Image classification method, system, medium and electronic device
CN113361636B (en) * 2021-06-30 2022-09-20 山东建筑大学 Image classification method, system, medium and electronic device
CN113793393B (en) * 2021-09-28 2023-05-09 中国人民解放军国防科技大学 Unmanned vehicle multi-resolution video generation method and device based on attention mechanism
CN113793393A (en) * 2021-09-28 2021-12-14 中国人民解放军国防科技大学 Attention mechanism-based unmanned vehicle multi-resolution video generation method and device
CN113723407A (en) * 2021-11-01 2021-11-30 深圳思谋信息科技有限公司 Image classification and identification method and device, computer equipment and storage medium
CN115423805A (en) * 2022-11-03 2022-12-02 河南洋荣服饰有限公司 Automatic trousers piece clamping device
CN116738296A (en) * 2023-08-14 2023-09-12 大有期货有限公司 Comprehensive intelligent monitoring system for machine room conditions
CN116738296B (en) * 2023-08-14 2024-04-02 大有期货有限公司 Comprehensive intelligent monitoring system for machine room conditions

Similar Documents

Publication Publication Date Title
CN111428807A (en) Image processing method and computer-readable storage medium
CN111080628B (en) Image tampering detection method, apparatus, computer device and storage medium
CN105144239B (en) Image processing apparatus, image processing method
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
Ko et al. Object-of-interest image segmentation based on human attention and semantic region clustering
Kao et al. Visual aesthetic quality assessment with a regression model
CN107203775B (en) Image classification method, device and equipment
CN111814810A (en) Image recognition method and device, electronic equipment and storage medium
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN110717534A (en) Target classification and positioning method based on network supervision
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN110503103B (en) Character segmentation method in text line based on full convolution neural network
CN107392183B (en) Face classification recognition method and device and readable storage medium
WO2020077940A1 (en) Method and device for automatic identification of labels of image
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN113936195B (en) Sensitive image recognition model training method and device and electronic equipment
CN111860309A (en) Face recognition method and system
CN109784171A (en) Car damage identification method for screening images, device, readable storage medium storing program for executing and server
CN114037640A (en) Image generation method and device
CN112489063A (en) Image segmentation method, and training method and device of image segmentation model
CN115578616A (en) Training method, segmentation method and device of multi-scale object instance segmentation model
CN116168017A (en) Deep learning-based PCB element detection method, system and storage medium
CN116206334A (en) Wild animal identification method and device
CN114639101A (en) Emulsion droplet identification system, method, computer equipment and storage medium
Bergler et al. FIN-PRINT a fully-automated multi-stage deep-learning-based framework for the individual recognition of killer whales

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200717