CN111428807A - Image processing method and computer-readable storage medium - Google Patents

Image processing method and computer-readable storage medium

Info

Publication number
CN111428807A
Authority
CN
China
Prior art keywords
image
feature map
feature
classification result
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010261102.4A
Other languages
Chinese (zh)
Inventor
纪元法
黄铭洁
孙希延
陈小毛
蓝如师
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202010261102.4A
Publication of CN111428807A
Withdrawn legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/70

Abstract

The embodiment of the application discloses an image processing method and a computer-readable storage medium. An image to be processed can be obtained and subjected to feature extraction to obtain a feature map; a coarse classification result corresponding to the image to be processed is obtained according to the feature map; an attention feature map with a plurality of specific attentions is acquired according to the feature map; a data enhancement operation is performed on the attention feature map to obtain a data enhancement feature map; a fine classification result corresponding to the image to be processed is acquired according to the data enhancement feature map; and a classification result corresponding to the image to be processed is determined based on the coarse classification result and the fine classification result. In this scheme, the fine classification result is obtained from the data enhancement feature map produced by performing the data enhancement operation on the attention feature map, and the final classification result of the image is determined by combining the coarse classification result and the fine classification result, so that the accuracy of image classification is improved.

Description

Image processing method and computer-readable storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an image processing method and a computer-readable storage medium.
Background
As images serving as information carriers become more and more abundant, images can be classified according to actual requirements to determine the categories to which they belong. For example, images with large differences can be recognized, such as classifying distinct categories like people, cars, or dogs; or sub-categories within a large category can be identified, such as identifying different birds or different vehicles. When an image is finely sub-classified, the focus falls on tiny yet important local features in the image, which increases the difficulty of fine-grained image classification. Existing image classification methods yield low precision on fine-grained images due to problems such as small inter-class differences between sub-classes, large intra-class differences, dependence on a large amount of manual labeling information, loss of key features caused by overfitting, lack of data set samples, and interference of background noise with weakly supervised image classification.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device and a computer readable storage medium, which can improve the accuracy of image classification.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map;
obtaining a coarse classification result corresponding to the image to be processed according to the feature map;
acquiring an attention feature map with a plurality of specific attentions according to the feature map;
performing data enhancement operation on the attention feature map to obtain a data enhancement feature map;
acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map;
and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
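The six claimed steps above can be sketched end-to-end as follows. This is a toy NumPy sketch under loudly stated assumptions: the "feature extraction" is a plain spatial mean rather than ResNet, the "enhancement" is a fixed attention weighting, and averaging the coarse and fine predictions is only one possible combination rule — the claim does not fix any of these.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def classify(image, w_coarse, w_fine, attention):
    # S101: "feature extraction" -> one N-dim descriptor (stand-in for ResNet).
    feat = image.mean(axis=(0, 1))
    # S102: coarse classification from the global feature.
    p_coarse = softmax(w_coarse @ feat)
    # S103/S104: attention-weighted feature, a crude stand-in for the
    # attention-guided cropping/dropping data enhancement.
    feat_att = (image * attention[..., None]).mean(axis=(0, 1))
    # S105: fine classification from the enhanced feature.
    p_fine = softmax(w_fine @ feat_att)
    # S106: combine coarse and fine results (assumed: simple average).
    return 0.5 * (p_coarse + p_fine)

H, W, N, C = 6, 6, 3, 4
img = np.linspace(0.0, 1.0, H * W * N).reshape(H, W, N)
att = np.full((H, W), 0.5)
probs = classify(img, np.eye(C, N), np.eye(C, N), att)
```

Averaging two probability vectors keeps the output a valid distribution, which is why the combination step needs no renormalization in this sketch.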
In a second aspect, an embodiment of the present application further provides an image processing apparatus, including a memory and a processor, where the memory stores a computer program, and the processor executes any one of the image processing methods provided by the embodiments of the present application when calling the computer program in the memory.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is loaded by a processor to execute any one of the image processing methods provided in the embodiment of the present application.
According to the method and the device, the image to be processed can be acquired and subjected to feature extraction to obtain a feature map; then a coarse classification result corresponding to the image to be processed is obtained according to the feature map, an attention feature map with a plurality of specific attentions is acquired according to the feature map, a data enhancement operation is performed on the attention feature map to obtain a data enhancement feature map, and a fine classification result corresponding to the image to be processed is acquired according to the data enhancement feature map; at this time, the classification result corresponding to the image to be processed can be determined based on the coarse classification result and the fine classification result. In this scheme, the fine classification result is obtained from the data enhancement feature map produced by performing the data enhancement operation on the attention feature map, and the final classification result of the image is determined by combining the coarse classification result and the fine classification result, so that the accuracy of image classification is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
fig. 2 is another schematic flow chart of an image processing method provided in an embodiment of the present application;
fig. 3 is another schematic flow chart of an image processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of feature matrix generation by image fusion and stitching based on BAP according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an attention area cropping and enlarging operation performed on an image according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an attention area dropping operation performed on an image according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Embodiments of the present application provide an image processing method and a computer-readable storage medium. The image processing method can be applied to image processing equipment, and the image processing equipment can comprise a server, a terminal and the like, wherein the terminal can comprise a mobile phone, a computer, a camera and the like.
Referring to fig. 1, fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application. The image processing method may include steps S101 to S106, and the like, and specifically may be as follows:
s101, obtaining an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map.
The type and acquisition mode of the image to be processed can be flexibly set according to actual needs. For example, the image to be processed may contain objects such as people, tables, flowers, trees, birds, dogs, or vehicles. The image to be processed may be obtained from an image storage database preset on a server, or from the terminal's local storage, or acquired through a camera, with the image captured by the camera used as the image to be processed; and so on.
After the image to be processed is obtained, feature extraction may be performed on it through a Residual Network (ResNet), a Histogram of Oriented Gradients (HOG), a Convolutional Neural Network (CNN), or the like, to obtain a feature map. The feature map may correspond to global features in the image; for example, features of an object in the image to be processed may be extracted to obtain a feature map containing the object's features. The object can be flexibly set according to actual needs, e.g., a person, table, flower, tree, bird, dog, or vehicle.
It should be noted that, in order to improve the accuracy of feature extraction, the image to be processed may be preprocessed, where the preprocessing may include filtering, denoising, or scaling, so as to reduce interference or enhance the definition of the image, and then the preprocessed image is subjected to feature extraction to obtain a feature map.
In some embodiments, the extracting the feature of the image to be processed to obtain the feature map includes: and performing feature extraction on the image to be processed through a preset residual error network to obtain a feature map.
To improve the accuracy of feature extraction, features of the image to be processed can be extracted through a preset residual network, where the preset residual network is a trained residual network. Specifically, training sample images of various types can be obtained, and the residual network is trained with these training sample images to obtain a trained residual network that can accurately extract the features of any input image, producing a feature map of size H×W×N. The specific values of H, W and N can be flexibly set according to actual needs and are not limited herein.
And S102, acquiring a coarse classification result corresponding to the image to be processed according to the feature map.
After the feature map is obtained, operations such as Bilinear Attention Pooling (BAP) and an attention normalization constraint may be further performed to obtain the coarse classification result corresponding to the image to be processed. The coarse classification result may be the probability of identifying the sub-category to which the target object belongs from the data set to be processed (i.e., the image to be processed), such as the probability of identifying the model of a vehicle from a vehicle data set.
In some embodiments, obtaining a coarse classification result corresponding to the image to be processed according to the feature map includes: performing convolution operation on the feature map to generate a first attention map; performing fusion operation on the feature map and the first attention map to obtain a first partial feature map; and acquiring a coarse classification result corresponding to the image to be processed according to the first part of feature mapping image.
Specifically, as shown in fig. 2 and 3, a feature map F ∈ R^(H×W×N) is generated after the image is input into the ResNet network. The feature map may then be subjected to a convolution operation; for example, the feature map F may be subjected to a 1×1 convolution to generate the first attention map A1, where A1 ∈ R^(H×W×M). The first attention map A1 may refer to a feature map containing attention features; the feature map F and the first attention map A1 both have spatial size H×W, the feature map F has N channels, and the first attention map A1 has M channels. Then, a fusion operation is performed on the feature map F and the first attention map A1, where the specific fusion manner is not limited. For example, as shown in fig. 4, the feature map F and the first attention map A1 may be taken as the input of BAP and an element-wise dot product operation performed to obtain the first partial feature maps F1k. There may be a plurality of first partial feature maps F1k; a first partial feature map refers to an image obtained by fusing the attention features with the global features (i.e., the feature map) to obtain local features at more levels, and the first partial feature maps F1k can improve the characterization capability of the image. The structure of BAP can be as shown in FIG. 4, where the attention map A in FIG. 4 is to be understood as the first attention map A1, the partial feature map FK as the first partial feature map F1k, and the feature matrix S as the first feature matrix S1.
At this time, a coarse classification result corresponding to the image to be processed can be obtained according to the first partial feature maps F1k. By performing the convolution operation, fusion operation and the like on the feature map and obtaining the coarse classification result P1 based on the first partial feature maps F1k, the reliability of coarse classification result acquisition can be improved.
In some embodiments, the fusing the feature map and the first attention map to obtain the first partial feature map includes: and performing dot multiplication operation on the feature map and the plurality of channel feature maps of the first attention map according to elements to obtain a plurality of first partial feature maps.
The first attention map A1 represents certain parts of a particular object, i.e. A1 = {a11, a12, …, a1k, …, a1M}. The feature map F and the plurality of channel feature maps a1k of the first attention map A1 are subjected to element-wise dot multiplication to obtain a plurality of first partial feature maps F1k; that is, the feature map F is multiplied element by element with the attention map a1k of each channel of the first attention map A1, obtaining M first partial feature maps F1k, where a1k can reflect the k-th part of the target object in the image, thereby improving the fusion effect.
In some embodiments, obtaining a coarse classification result corresponding to the image to be processed according to the first partial feature map includes: carrying out global average pooling operation on the first part of feature maps to generate a first part of feature map tensor after dimension reduction; carrying out attention normalization constraint on a preset part in the first part of the feature mapping tensor map after dimension reduction to obtain a normalized feature mapping map tensor; vector splicing is carried out on the normalized feature mapping tensor to generate a first feature matrix; and classifying the image to be processed according to the first feature matrix to obtain a coarse classification result.
In order to improve the accuracy of coarse classification result acquisition, a global average pooling operation may be performed on the first partial feature maps F1k, so that each first partial feature map F1k is finally reduced to a one-dimensional tensor, obtaining the dimension-reduced first partial feature map tensors f1k, i.e. generating the dimension-reduced one-dimensional tensor corresponding to each first partial feature map F1k. Then, an attention normalization constraint is applied to a preset part in the dimension-reduced first partial feature map tensors f1k to penalize differences between different features of the same object; for example, the feature representing the k-th part is subjected to the attention normalization constraint to obtain normalized feature map tensors, so that each part feature in the normalized feature map tensors is close to the feature center of that part. The normalized feature map tensors may include multiple parts. At this time, all normalized feature map tensors may be vector-spliced to generate the first feature matrix S1, which may be an M×N feature matrix containing all partial features. The image to be processed is then classified according to the first feature matrix S1 to obtain the coarse classification result; for example, the first feature matrix S1 may be input into a Support Vector Machine (SVM) classifier or a softmax classifier for classification, obtaining the coarse classification result P1 corresponding to the image to be processed.
Specifically, a feature map F is generated after inputting the image into the ResNet network, and the feature map F is subjected to a 1×1 convolution operation to generate the first attention map A1. Then, the feature map F and the first attention map A1 are subjected to a dot product operation, with the specific formula as follows:

F1k = a1k ⊙ F, k = 1, 2, …, M (1)

where ⊙ denotes multiplication by corresponding elements, F represents the feature map, a1k represents the attention map of the k-th channel of the first attention map A1, and F1k is a first partial feature map.
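The element-wise fusion above can be sketched in NumPy: each channel a1k of the attention map (H×W×M) is broadcast over the N channels of the feature map F (H×W×N), yielding M partial feature maps. The shapes and values below are illustrative assumptions, not from the patent.

```python
import numpy as np

H, W, N, M = 4, 4, 6, 3
# Toy feature map F (H x W x N) and attention map A1 (H x W x M).
F = np.arange(H * W * N, dtype=float).reshape(H, W, N)
A1 = np.linspace(0.0, 1.0, H * W * M).reshape(H, W, M)

# F1k = a1k (element-wise) F for each attention channel k.
partial_maps = [A1[..., k][..., None] * F for k in range(M)]
```

Broadcasting the single-channel attention map across all N feature channels is what makes this an element-wise "dot product" rather than a matrix product.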
Then, local features with discriminative power are further extracted through a local feature extraction function g(·), with the specific formula:

f1k = g(F1k) (2)

where g(·) is the global average pooling function and f1k ∈ R^(1×N) is the k-th partial feature tensor.
From equation (2), each first partial feature map F1k is finally reduced in dimension, so that each group of first partial feature maps F1k becomes a set of one-dimensional tensors, namely the dimension-reduced first partial feature map tensors f1k. The dimension-reduced first partial feature map tensors f1k, i.e. the features representing the k-th part, are then subjected to the attention normalization constraint to obtain the normalized feature map tensors, and the normalized feature map tensors are vector-spliced to form the first feature matrix S1 ∈ R^(M×N), which is input into the softmax classifier for classification. In summary, S1 can be represented by the following formula:

S1 = Γ(A1, F) = [f11; f12; …; f1M] (3)
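The pooling of equation (2) and the splicing into S1 can be sketched as follows. The constant-valued attention channels are an illustrative assumption chosen so the pooled values are easy to verify by hand.

```python
import numpy as np

H, W, N, M = 4, 4, 6, 3
# All-ones feature map, and attention channel k filled with 0.1 * (k + 1).
F = np.ones((H, W, N))
A1 = np.stack([np.full((H, W), 0.1 * (k + 1)) for k in range(M)], axis=-1)

# g(.) = global average pooling of each fused partial feature map -> f1k (N,)
f = [(A1[..., k][..., None] * F).mean(axis=(0, 1)) for k in range(M)]
# Splice the M pooled tensors into the M x N feature matrix S1.
S1 = np.stack(f)
```

Because each attention channel is constant, row k of S1 simply equals that constant, confirming the pooling step reduced each H×W map to one value per feature channel.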
As described above, the feature map F and the first attention map A1 generate the first feature matrix S1 after the BAP operation. In order to make the features of the same part of the same object as similar as possible and to penalize differences between different features of the same object, a class-center loss can be used to supervise the learning process of attention; for example, an attention-center loss function is used to apply the attention normalization constraint to the dimension-reduced first partial feature map tensors f1k, where the attention-center loss function is defined as follows:

LA = Σ(k=1..M) ‖f1k − ck‖² (4)
where LA represents the attention-center loss, f1k represents the feature of the k-th part, and ck represents the feature center of the k-th part, which is initialized to 0 and then updated according to a moving-average formula, specifically:

ck ← ck + β(f1k − ck) (5)

where β controls the learning rate of the part's global feature center ck. The attention regularization loss function ensures that each part feature in the normalized feature map tensors is close to the feature center of that part, i.e., each normalized feature map tensor represents a unique object part. Finally, the feature matrix S1 formed by splicing all the normalized feature map tensors is input into the softmax classifier for classification to obtain the coarse classification result P1.
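The attention-center loss and the moving-average update of formula (5) can be sketched directly. The β value and feature values are illustrative assumptions; the point is that one update step moves each center toward its part feature and strictly decreases the loss.

```python
import numpy as np

def attention_center_loss(f, c):
    # LA = sum over parts k of || f1k - ck ||^2
    return float(((f - c) ** 2).sum())

def update_centers(f, c, beta=0.05):
    # ck <- ck + beta * (f1k - ck), the moving-average update of formula (5)
    return c + beta * (f - c)

M, N = 3, 4
f = np.ones((M, N))    # current part features f1k
c = np.zeros((M, N))   # centers initialized to 0, as in the text
c_next = update_centers(f, c)
```

Repeating the update drives ck toward the running mean of the f1k seen during training, which is what keeps each part feature near its center.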
And S103, acquiring an attention feature map with a plurality of specific attentions according to the feature map.
It should be noted that, the execution sequence between step S102 and step S103 may be that step S102 is executed first, and then step S103 is executed; or, step S103 is executed first, and then step S102 is executed; alternatively, step S102 and step S103 are executed simultaneously, and the specific execution order between step S102 and step S103 is not limited herein in this embodiment.
For example, as shown in fig. 2 and 3, after the feature map F is obtained, an attention feature map Aq having a plurality of specific attentions may be obtained based on the feature map F; the attention feature map Aq may be a feature map including a plurality of attention regions. In some embodiments, obtaining an attention feature map having a plurality of specific attentions from the feature map comprises: performing squeeze and excitation operations on the feature map multiple times to generate an attention feature map with multiple specific attentions.
Specifically, the attention feature map may be generated using a multi-excitation pattern. After the feature map F = [F1, F2, …, Fk, …, FN] ∈ R^(W×H×N) is extracted, a One-Squeeze Multi-Excitation (OSME) operation is performed on F to generate a plurality of attention-specific feature maps Aq, making the useful information of the discriminative parts as prominent as possible.
The feature map F, whose spatial dimensions are W×H, is first aggregated using a global average pooling squeeze operation to generate a channel-level descriptor Z = [z1, z2, …, zk, …, zN], with the specific formula:

zk = (1/(W×H)) Σ(w=1..W) Σ(h=1..H) Fk(w, h) (6)

where Fk(w, h) represents the element value at spatial position (w, h).
Then, an independent gating mechanism is applied to Z for each excitation module, with the excitation modules numbered q = 1, 2, …, Q, with the specific formula:

m^q = σ(W2^q δ(W1^q z)) (7)

where σ represents the Sigmoid function, δ represents the ReLU function, and W1^q, W2^q are weight coefficients.
At this time, the specific attention feature map Aq is generated by re-weighting the channels of the original feature map F, as follows:

Aq = [m1^q F1, m2^q F2, …, mN^q FN] (8)
the OSME is an attention method for component positioning under weak supervision, and has the function of generating a plurality of feature maps with specific attention, and the weights of the feature maps can be learned through a network according to a loss function, so that the effective feature map has large weight, and the invalid or small-effect feature map has small weight. In other words, the network structure is an attention mechanism in the channel dimension, and different from the adoption of a multi-layer Excitation structure, a plurality of attention structures are generated, namely, a plurality of attention areas are extracted to be transmitted to the later stage for analysis. Specifically, the importance degree of each feature channel is automatically acquired through a learning mode, and then useful features are promoted according to the importance degree and the features which are not useful for the current task are suppressed.
And S104, performing data enhancement operation on the attention feature map to obtain a data enhancement feature map.
Data enhancement is an operation that increases the amount of training data, used to prevent overfitting and improve the performance of deep learning networks. Data enhancement methods can include augmentation by random methods, such as random image cropping: during image processing the target image can be randomly cropped, so that the required target is cropped out with a certain probability. To prevent noise such as background from affecting the enhancement and unneeded targets from being cropped out, the embodiment of the application can, during network training, generate through weakly supervised learning an attention map representing the salient features of the target, and then enhance the data under the guidance of the attention map, where the guided enhancement includes attention cropping, attention dropping and the like.
After the attention feature map Aq containing a plurality of specific attentions is generated, in order to obtain more fine-grained discriminative information, a data enhancement mode can be adopted to extract local fine-grained features and thereby improve the classification features. At this time, a data enhancement operation can be performed on the attention feature map Aq to obtain a data enhancement feature map. The data enhancement operation can improve the saliency of key features and reduce the influence of unnecessary features, further improving performance.
In some embodiments, performing a data enhancement operation on the attention feature map to obtain a data enhancement feature map includes: selecting a preset channel of the attention feature map and performing a normalization operation to obtain a candidate data enhancement feature map; performing an attention area cropping and enlarging operation on the candidate data enhancement feature map to obtain an image with the attention area cropped and enlarged; performing an attention area dropping operation on the candidate data enhancement feature map to obtain an image with the attention area dropped; and up-sampling and enlarging the image with the attention area cropped and enlarged and the image with the attention area dropped to the size of the data set picture (namely the size of the image to be processed), then inputting them into the residual network to extract features and generate the data enhancement feature map.
A preset channel of the attention feature map can be randomly selected to guide the data enhancement process, and a normalization operation is performed to obtain the candidate data enhancement feature map A*k. Using the candidate data enhancement feature map A*k to guide data enhancement, a cropping mask is added to A*k, and the part carrying discriminative features is cropped and then enlarged to the size of the images in the data set; that is, through attention area cropping and enlarging, the k-th part is enlarged to the same size as the original image (namely the image to be processed), thereby realizing the attention area cropping and enlarging operation on the candidate data enhancement feature map A*k and obtaining the image with the attention area cropped and enlarged.
A dropping mask of the attention area is further added to the candidate data enhancement feature map A*k, deleting the part that was cropped and enlarged; that is, through attention area dropping, the k-th part of the original picture is erased, realizing the attention area dropping operation on the candidate data enhancement feature map A*k and obtaining the image with the attention area dropped. In this way the network can be encouraged to extract other discriminative parts based on the data enhancement operation, which can improve the robustness of image classification and the accuracy of localization.
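The attention area dropping operation described above can be sketched as follows: positions where the normalized attention map exceeds a drop threshold are zeroed out in the image, forcing the network to rely on other parts. The threshold and arrays are illustrative assumptions.

```python
import numpy as np

def attention_drop(image, Ak_norm, theta_d=0.5):
    # Drop mask is 0 where attention is high, 1 elsewhere.
    drop_mask = (Ak_norm <= theta_d).astype(float)
    # Broadcast the mask over the image's channels to erase the k-th part.
    return image * drop_mask[..., None]

img = np.ones((4, 4, 3))
Ak_norm = np.array([[0., 0., 0., 0.],
                    [0., 1., 1., 0.],
                    [0., 1., 1., 0.],
                    [0., 0., 0., 0.]])
dropped = attention_drop(img, Ak_norm)
```

The drop mask is the complement of the cropping mask over the same attended region, which is why the text says the cropped-and-enlarged part is what gets deleted.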
Specifically, because it is inefficient to randomly select a portion of an image for enhancement, and background noise may be introduced and cause interference particularly when the selected image patch is small, the attention feature map A_q can be used to better filter out background noise. The embodiment of the application adopts a randomly selected channel of the attention feature map A_q, namely the attention map A_k, to guide the data enhancement process and normalizes it, and lets the k-th candidate data enhancement feature map be A*_k. The specific formula of the normalization process is as follows:

A*_k = (A_k − min(A_k)) / (max(A_k) − min(A_k))
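As a rough illustration, the min-max normalization of a randomly selected attention channel can be sketched in NumPy as follows; the toy shapes and the function name are illustrative, not taken from the patent:

```python
import numpy as np

def normalize_attention(a_k: np.ndarray) -> np.ndarray:
    """Min-max normalize one H x W attention channel A_k into [0, 1],
    yielding the candidate data-enhancement feature map A*_k."""
    a_min, a_max = a_k.min(), a_k.max()
    # Guard against a constant map, where max == min.
    if a_max - a_min < 1e-12:
        return np.zeros_like(a_k)
    return (a_k - a_min) / (a_max - a_min)

# Randomly select one of the M channels of the attention feature map A_q.
rng = np.random.default_rng(0)
A_q = rng.random((32, 32, 8))          # H x W x M (toy sizes)
k = int(rng.integers(A_q.shape[-1]))   # preset/random channel index
A_star_k = normalize_attention(A_q[..., k])
```

After this step the selected channel lies in [0, 1], so a single fixed threshold can be applied to it regardless of the original activation scale.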
After the candidate data enhancement feature map A*_k is obtained, the region of interest can be selected and enlarged to extract more detailed local features. The idea of the cropping mask is to select a threshold θ_c: a pixel value A*_k(i, j) greater than θ_c is set to 1, and one less than θ_c is set to 0; the region set to 1 is thus the partial region that needs to be focused on. The specific formula for obtaining the cropping mask C_k is as follows:

C_k(i, j) = 1, if A*_k(i, j) > θ_c; otherwise C_k(i, j) = 0
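The cropping mask described above reduces to a simple thresholding; a minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def crop_mask(a_star_k: np.ndarray, theta_c: float) -> np.ndarray:
    """Binary crop mask C_k: 1 where A*_k(i, j) > theta_c, else 0."""
    return (a_star_k > theta_c).astype(np.uint8)

A_star = np.array([[0.1, 0.8],
                   [0.6, 0.3]])
C_k = crop_mask(A_star, theta_c=0.5)   # marks the two high-attention pixels
```

The 1-valued region of C_k delimits the discriminative part that will next be cropped out and enlarged.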
After the candidate data enhancement feature map A*_k undergoes the mask-cropping operation, a local region is obtained by cropping, and the local region is up-sampled and enlarged to the size of the original image, namely, the local region is enlarged and a more detailed part is extracted; this part is input into the residual network ResNet as an enhanced data set to extract finer features again. The specific process is shown in FIG. 5: after feature extraction of the image through the ResNet network, an attention feature map is obtained; the attention feature map is then subjected to the cropping and enlarging operations to obtain a locally enlarged image, and feature extraction is performed on the locally enlarged image through the ResNet network to obtain a local feature map, where the local feature map is the feature map extracted from the image after the attention region is cropped and enlarged.
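The crop-then-upsample step might look like the following NumPy sketch, using a nearest-neighbour resize in place of whatever up-sampling the actual implementation uses (function and variable names are hypothetical):

```python
import numpy as np

def crop_and_resize(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the bounding box of the mask's 1-valued region, then
    resize the patch back to the original H x W (nearest neighbour)."""
    H, W = mask.shape
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    patch = image[y0:y1, x0:x1]
    # Nearest-neighbour up-sampling back to the original size.
    row_idx = np.arange(H) * patch.shape[0] // H
    col_idx = np.arange(W) * patch.shape[1] // W
    return patch[np.ix_(row_idx, col_idx)]

img = np.arange(64).reshape(8, 8)      # toy grayscale "original image"
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 3:6] = 1                     # attention region to crop
zoomed = crop_and_resize(img, mask)    # enlarged local part, same size as img
```

The enlarged patch is then treated as a new training image and fed through ResNet again, as described above.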
Each attention map A_k represents a feature of the same k-th part, but different attention maps A_k may focus on similar parts, and attention regularization loss supervision is used to address this. To alleviate the problem of multiple attention maps A_k focusing on the same part of an object, the embodiment of the application performs an attention-dropping operation on the candidate data enhancement feature map A*_k to encourage the model to extract features from a plurality of discriminative parts. The dropping mask D_k is obtained in the opposite way to the cropping mask: a threshold θ_d is selected, a pixel value A*_k(i, j) greater than θ_d is set to 0, and one less than θ_d is set to 1. The specific formula is as follows:

D_k(i, j) = 0, if A*_k(i, j) > θ_d; otherwise D_k(i, j) = 1
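A minimal sketch of the dropping mask and its application to an image, assuming NumPy arrays and illustrative names:

```python
import numpy as np

def drop_mask(a_star_k: np.ndarray, theta_d: float) -> np.ndarray:
    """Binary drop mask D_k: 0 where A*_k(i, j) > theta_d, else 1
    (the complement of the crop mask's selection rule)."""
    return (a_star_k <= theta_d).astype(np.uint8)

A_star = np.array([[0.1, 0.8],
                   [0.6, 0.3]])
D_k = drop_mask(A_star, theta_d=0.5)

# Zero out the attended region of an H x W x C image; the rest is kept,
# pushing the network toward other discriminative parts.
img = np.ones((2, 2, 3))
dropped = img * D_k[..., None]
```
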
Therefore, this operation deletes the region selected by attention cropping from the original image, and the rest of the image is input into the residual network ResNet as a data set to extract other features. The specific process is shown in FIG. 6: after feature extraction of the image through the ResNet network, an attention feature map is obtained; a drop-mask operation, namely the attention-region dropping operation, is then performed on the attention feature map to obtain a locally deleted image, and feature extraction is performed on the locally deleted image through the ResNet network to obtain the feature map extracted from the image with the attention region dropped.
And S105, acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map.
The fine classification result may be a probability of identifying a subclass to which the target object belongs from the image to be processed. After the data enhancement feature map is obtained, BAP operation can be performed on the data enhancement feature map, and a fine classification result corresponding to the image to be processed is obtained.
In some embodiments, obtaining a fine classification result corresponding to the image to be processed according to the data enhancement feature map includes: acquiring a feature mapping chart corresponding to the data enhancement feature chart; performing convolution operation on the feature map to generate a second attention map; performing fusion operation on the feature mapping chart and the second attention mapping chart to obtain a second partial feature mapping chart; and acquiring a fine classification result corresponding to the image to be processed according to the second part of feature mapping chart.
Specifically, as shown in FIG. 2 and FIG. 3, the feature map obtained by extracting features from the image with the attention region cropped and enlarged and the image with the attention region dropped after the data enhancement operation is the data enhancement feature map. At this time, the data enhancement feature map may be input into the residual network ResNet as a data set to extract deeper features, so as to obtain the feature map T. The feature map T may then be convolved; for example, a 1 × 1 convolution operation may be performed on the feature map T to generate a second attention map A_2 having a plurality of parts, where the second attention map A_2 may refer to a feature map containing the attention features after the data enhancement operation, and A_2 ∈ R^(H×W×M). The feature map T and the second attention map A_2 serve as the input of BAP, so as to perform operations such as element-wise dot multiplication, pooling, and vector splicing.
Next, the feature map T and the second attention map A_2 are subjected to the fusion operation so as to fuse the key features as much as possible; the specific fusion manner is not limited. For example, as shown in FIG. 4, the feature map T and the second attention map A_2 may be taken as the input of BAP and subjected to a dot-product operation to obtain second partial feature maps F_2k. There may be a plurality of second partial feature maps, and a second partial feature map may refer to an image obtained by fusing the data-enhanced attention feature with its global feature (namely, the data enhancement feature map), giving a more hierarchical local feature. Wherein A_2 = {a_21, a_22, ..., a_2K, ..., a_2M}; the feature map T may be dot-multiplied element-wise with each of the plurality of channel feature maps a_2k of the second attention map A_2 to obtain a plurality of second partial feature maps F_2k, so that the second attention map A_2 can be fused with the feature map T to extract second partial feature maps F_2k with finer-grained features.
Wherein, the structure of BAP can be as shown in FIG. 4; for understanding, the attention map A in FIG. 4 is taken as the second attention map A_2, the partial feature map F_k as the second partial feature map F_2k, and the feature matrix S as the second feature matrix S_2. At this time, the fine classification result corresponding to the image to be processed can be obtained according to the second partial feature maps F_2k, namely, the feature map T undergoes the convolution operation, the fusion operation, and the like, and the fine classification result P_2 is obtained based on the second partial feature maps F_2k, which can improve the accuracy and reliability of fine classification result acquisition.
In some embodiments, obtaining a fine classification result corresponding to the image to be processed according to the second partial feature map includes: carrying out global average pooling operation on the second part of feature mapping graph to obtain a second part of feature mapping graph tensor after dimension reduction; performing vector splicing on the dimensionality-reduced tensor of the second part of the feature mapping graph to generate a second feature matrix; and classifying the image to be processed according to the second feature matrix to obtain a fine classification result.
In order to improve the accuracy of fine classification result acquisition, a global average pooling operation may be performed on the second partial feature maps F_2k so that each second partial feature map F_2k is finally reduced to a one-dimensional tensor, obtaining the reduced-dimension second partial feature map tensors f_2k, namely, generating a reduced-dimension one-dimensional tensor corresponding to each second partial feature map F_2k; there may be a plurality of such tensors. At this time, all the reduced-dimension second partial feature map tensors f_2k may be vector-spliced to generate the second feature matrix S_2, which may be an N × M feature matrix, and the image to be processed is then classified according to the second feature matrix S_2; for example, the second feature matrix S_2 may be input into an SVM classifier or a softmax classifier for classification, so as to obtain the fine classification result P_2 corresponding to the image to be processed.
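The BAP steps just described (element-wise dot product per attention channel, global average pooling, then stacking into the feature matrix) can be sketched as follows; shapes and names are illustrative only:

```python
import numpy as np

def bap(T: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """Bilinear attention pooling sketch.

    T  : H x W x N feature map.
    A2 : H x W x M second attention map.
    Returns S2, an M x N feature matrix whose k-th row is the global
    average pool of the part feature map F_2k = a_2k (element-wise) T.
    """
    H, W, N = T.shape
    M = A2.shape[-1]
    S2 = np.empty((M, N))
    for k in range(M):
        F_2k = A2[..., k:k + 1] * T        # element-wise dot product
        S2[k] = F_2k.mean(axis=(0, 1))     # global average pooling -> f_2k
    return S2

rng = np.random.default_rng(0)
T = rng.random((4, 4, 6))      # toy H x W x N
A2 = rng.random((4, 4, 3))     # toy H x W x M
S2 = bap(T, A2)                # M x N second feature matrix
```

The loop is written for clarity; the same result is a single contraction over the spatial axes, which is how a batched implementation would express it.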
Specifically, after the feature map T and the second attention map A_2 are obtained, the feature map T and the second attention map A_2 are first subjected to a dot-product operation, with the following specific formula:

F_2k = a_2k ⊙ T, k = 1, 2, ..., M (12)

where ⊙ denotes element-wise multiplication, T denotes the feature map, a_2k denotes the attention map of the k-th channel of the second attention map A_2, and F_2k is the second partial feature map.
Then, the local features with the recognition power are further extracted through a local feature extraction function g (-) by the following specific formula:
f2k=g(F2k) (13)
where g (-) is the global average pooling function, f2k∈R1×NIs the kth partial eigenmap tensor.
From equation (2), each second partial feature map F2kFinally reducing the dimension to one number, and then the second part of feature mapping chart F of each group2kReducing the dimensionality into a group of one-dimensional tensors, namely the tensor f of the feature mapping image of the second part after the dimensionality reduction2kThen, the dimensionality-reduced second partial feature map tensor f2kVector splicing is carried out, and a second feature matrix S can be formed2∈RM×NSecond feature matrix S2And inputting the softmax classifier for classification. In summary, S2Can be represented by the following formula:
Figure BDA0002439313480000123
from the above, the feature map F and the secondAttention map A2After BAP operation, a second feature matrix S is generated2. Finally, the second feature matrix S2Inputting the classifier softmax for classification to obtain a fine classification result P2
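A toy sketch of feeding the flattened second feature matrix S_2 through a softmax head; the classifier weights here are random placeholders, not the trained parameters of the described network:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
S2 = rng.random((3, 6))                  # M x N second feature matrix (toy)
W_cls = rng.random((3 * 6, 5))           # hypothetical weights, C = 5 classes
P2 = softmax(S2.reshape(-1) @ W_cls)     # fine classification probabilities
```
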
In this embodiment, the attention-mechanism-based OSME is adopted in combination with the data enhancement operation, so that more subtly distinguishable images are added to the data set, and BAP is used to fuse features of more levels together, so that the network can focus on the important feature parts of the image without additional annotation information, greatly improving recognition accuracy, improving the accuracy of fine classification result acquisition, and effectively improving classification precision. The method specifically includes a ResNet feature extraction network, BAP, an attention normalization constraint, a data enhancement operation, and the like. ResNet is used as the feature extraction network structure of the algorithm, and an attention map is generated from the extracted feature map; BAP takes the feature map and the attention map as input and performs operations such as dot multiplication, pooling, and vector splicing to obtain features of different levels and enhance the local features; the attention normalization constraint supervises the learning process of attention with a class-center loss function, so that features of the same part on the same object are as similar as possible; the data enhancement operation is guided by the attention mechanism, cropping and enlarging the attention region and dropping the attention region, so that the model focuses more on the fine-grained features of the image, the interference of background noise is reduced, and the recognition precision is further improved.
The attention mechanism aims to enable the network to pay more attention to the relevant parts of the input, and differs from the traditional attention mechanism SENet. The data enhancement operation overcomes the limitation that weakly supervised fine-grained image classification training data are scarce and that improving precision otherwise requires professional knowledge and a large amount of annotation time: cropping and enlarging the attention region discards background noise irrelevant to image classification and recognition so as to enhance the appearance of local features, and dropping the attention region encourages the model to extract features from a plurality of discriminative parts, further improving precision. The BAP algorithm fuses the attention features and the global features to obtain features of more levels, enhances the image in a targeted manner, strengthens the saliency of the local features that are discriminative for the fine-grained classification task, and improves classification precision. The attention normalization constraint penalizes the difference between features of the same part of the same object, so that features of the same part on the same species are as similar as possible, which can significantly improve the classification precision of fine-grained images.
And S106, determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
In some embodiments, determining the classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result includes: and adding the coarse classification result and the fine classification result to obtain a classification result corresponding to the image to be processed.
After the coarse classification result and the fine classification result are obtained, in order to improve the convenience of determining the classification result, the coarse classification result P1 and the fine classification result P2 may be added to obtain the classification result P = P1 + P2 corresponding to the image to be processed. The classification result may be the classification result corresponding to an object in the image to be processed, and the sub-category with the highest probability among the sub-categories to which the object may belong is taken as the category of the object.
In some embodiments, determining the classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result includes: setting a first weight value corresponding to the coarse classification result and a second weight value corresponding to the fine classification result; and determining a classification result corresponding to the image to be processed according to the rough classification result, the first weight value, the fine classification result and the second weight value.
To improve the flexibility and reliability of classification result determination, a first weight value C1 corresponding to the coarse classification result P1 and a second weight value C2 corresponding to the fine classification result P2 may be set. The coarse classification result P1 is multiplied by the first weight value C1 to obtain a first value P1*C1, and the fine classification result P2 is multiplied by the second weight value C2 to obtain a second value P2*C2. At this time, the sum of the first value P1*C1 and the second value P2*C2 may be taken as the classification result P = P1*C1 + P2*C2 corresponding to the image to be processed.
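Both combination strategies (plain addition and weighted fusion) can be sketched in a few lines; the weight values below are placeholders, not values from the patent:

```python
import numpy as np

def fuse_results(p1, p2, c1: float = 1.0, c2: float = 1.0) -> np.ndarray:
    """P = C1 * P1 + C2 * P2; with C1 = C2 = 1 this reduces to
    the simple addition P = P1 + P2."""
    return c1 * np.asarray(p1) + c2 * np.asarray(p2)

# Toy coarse/fine class-score vectors over 3 sub-categories.
P = fuse_results([0.1, 0.7, 0.2], [0.2, 0.6, 0.2], c1=0.4, c2=0.6)
predicted = int(np.argmax(P))   # sub-category with the highest fused score
```
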
The method and the device can acquire the image to be processed, and perform feature extraction on the image to be processed to obtain a feature map; then, obtaining a rough classification result corresponding to the image to be processed according to the feature map, obtaining an attention feature map with a plurality of specific attentions according to the feature map, performing data enhancement operation on the attention feature map to obtain a data enhancement feature map, and obtaining a fine classification result corresponding to the image to be processed according to the data enhancement feature map; at this time, the classification result corresponding to the image to be processed can be determined based on the coarse classification result and the fine classification result. According to the scheme, a fine classification result is obtained by a data enhancement feature map obtained by performing data enhancement operation on the attention feature map, and a final classification result of the image is determined by combining a rough classification result and the fine classification result, so that the accuracy of image classification is improved.
In order to better implement the image processing method provided by the embodiment of the present application, the embodiment of the present application further provides an apparatus based on the image processing method. The terms are the same as those in the image processing method, and details of implementation can be referred to the description in the method embodiment.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure, wherein the image processing apparatus 300 may include an extraction module 301, a first obtaining module 302, a second obtaining module 303, a data enhancement module 304, a third obtaining module 305, a determination module 306, and the like.
The extraction module 301 is configured to acquire an image to be processed, and perform feature extraction on the image to be processed to obtain a feature map.
The first obtaining module 302 is configured to obtain a coarse classification result corresponding to the image to be processed according to the feature map.
A second obtaining module 303, configured to obtain an attention feature map with a plurality of specific attentions according to the feature map.
And the data enhancement module 304 is configured to perform data enhancement operation on the attention feature map to obtain a data enhancement feature map.
And a third obtaining module 305, configured to obtain a fine classification result corresponding to the image to be processed according to the data enhancement feature map.
And the determining module 306 is configured to determine a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
Optionally, the first obtaining module 302 includes:
the first convolution submodule is used for performing convolution operation on the feature map to generate a first attention mapping map;
the first fusion submodule is used for carrying out fusion operation on the feature map and the first attention map to obtain a first part of feature map;
and the first obtaining submodule is used for obtaining a coarse classification result corresponding to the image to be processed according to the first partial feature mapping image.
Optionally, the first obtaining sub-module is specifically configured to: carrying out global average pooling operation on the first part of feature maps to generate a first part of feature map tensor after dimension reduction; performing attention normalization constraint on the features of the preset part in the first part of the feature mapping tensor after dimension reduction to obtain a normalized feature mapping tensor; vector splicing is carried out on the normalized feature mapping tensor to generate a first feature matrix; and classifying the image to be processed according to the first feature matrix to obtain a coarse classification result.
Optionally, the first fusion submodule is specifically configured to: and performing dot multiplication operation on the feature map and the plurality of channel feature maps of the first attention map according to elements to obtain a plurality of first partial feature maps.
Optionally, the data enhancement module 304 is specifically configured to: selecting a preset channel of the attention feature map to carry out normalization operation to obtain a candidate data enhancement feature map; performing attention area cutting and amplifying operation on the candidate data enhancement feature map to obtain an image obtained after the attention area cutting and amplifying; performing attention area descending operation on the candidate data enhancement feature map to obtain an image with the descending attention area; and inputting the image with the enlarged attention area clipping and the image with the reduced attention area into a residual error network to extract features, and generating a data enhancement feature map.
Optionally, the third obtaining module 305 includes:
the second obtaining submodule is used for obtaining a feature mapping chart corresponding to the data enhancement feature chart;
the second convolution submodule is used for carrying out convolution operation on the feature mapping chart to generate a second attention mapping chart;
the second fusion submodule is used for carrying out fusion operation on the feature mapping chart and the second attention mapping chart to obtain a second part of feature mapping chart;
and the third obtaining submodule is used for obtaining a fine classification result corresponding to the image to be processed according to the second partial feature mapping image.
Optionally, the third obtaining sub-module is specifically configured to: carrying out global average pooling operation on the second part of feature mapping graph to obtain a second part of feature mapping graph tensor after dimension reduction; performing vector splicing on the dimensionality-reduced tensor of the second part of the feature mapping graph to generate a second feature matrix; and classifying the image to be processed according to the second feature matrix to obtain a fine classification result.
Optionally, the extracting module 301 is specifically configured to: and performing feature extraction on the image to be processed through a preset residual error network to obtain a feature map.
Optionally, the second obtaining module 303 is specifically configured to: perform squeeze-and-excitation operations on the feature map for multiple times to generate an attention feature map with multiple specific attentions.
Optionally, the determining module 306 is specifically configured to: adding the coarse classification result and the fine classification result to obtain a classification result corresponding to the image to be processed; or setting a first weight value corresponding to the rough classification result and a second weight value corresponding to the fine classification result, and determining the classification result corresponding to the image to be processed according to the rough classification result, the first weight value, the fine classification result and the second weight value.
In the embodiment of the application, the extraction module 301 may acquire an image to be processed, and perform feature extraction on the image to be processed to obtain a feature map; then, a first obtaining module 302 obtains a rough classification result corresponding to the image to be processed according to the feature map, a second obtaining module 303 obtains an attention feature map with a plurality of specific attentions according to the feature map, a data enhancement module 304 performs data enhancement operation on the attention feature map to obtain a data enhancement feature map, and a third obtaining module 305 obtains a fine classification result corresponding to the image to be processed according to the data enhancement feature map; at this time, the determination module 306 may determine a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result. According to the scheme, a fine classification result is obtained by a data enhancement feature map obtained by performing data enhancement operation on the attention feature map, and a final classification result of the image is determined by combining a rough classification result and the fine classification result, so that the accuracy of image classification is improved.
Referring to fig. 8, fig. 8 is a schematic block diagram of a structure of an image processing apparatus according to an embodiment of the present application.
As shown in fig. 8, the image processing apparatus 400 may include a processor 402, a memory 403, and a communication interface 404 connected by a system bus 401, wherein the memory 403 may include a nonvolatile computer-readable storage medium and an internal memory.
The non-transitory computer readable storage medium may store a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform any of the image processing methods.
The processor 402 is used to provide computational and control capabilities to support the operation of the overall image processing apparatus.
The memory 403 provides an environment for the execution of a computer program in a non-transitory computer readable storage medium, which when executed by the processor 402, causes the processor 402 to perform any of the image processing methods.
The communication interface 404 is used for communication. Those skilled in the art will appreciate that the structure shown in fig. 8 is a block diagram of only a part of the structure related to the present application, and does not constitute a limitation to the image processing apparatus 400 to which the present application is applied, and a specific image processing apparatus 400 may include more or less components than those shown in the figure, or combine some components, or have a different arrangement of components.
It should be understood that the bus 401 is, for example, an I2C (Inter-Integrated Circuit) bus. The memory 403 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, or a removable hard disk. The processor 402 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Wherein, in some embodiments, the processor 402 is configured to run a computer program stored in the memory 403 to perform the following steps:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map; obtaining a coarse classification result corresponding to the image to be processed according to the characteristic diagram; acquiring an attention feature map with a plurality of specific attentions according to the feature map; performing data enhancement operation on the attention feature map to obtain a data enhancement feature map; acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map; and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
Optionally, when obtaining the coarse classification result corresponding to the image to be processed according to the feature map, the processor 402 further performs: performing convolution operation on the feature map to generate a first attention map; performing fusion operation on the feature map and the first attention map to obtain a first partial feature map; and acquiring a coarse classification result corresponding to the image to be processed according to the first part of feature mapping image.
Optionally, when obtaining a coarse classification result corresponding to the image to be processed according to the first partial feature map, the processor 402 further performs: carrying out global average pooling operation on the first part of feature maps to generate a first part of feature map tensor after dimension reduction; performing attention normalization constraint on the features of the preset part in the first part of the feature mapping tensor after dimension reduction to obtain a normalized feature mapping tensor; vector splicing is carried out on the normalized feature mapping tensor to generate a first feature matrix; and classifying the image to be processed according to the first feature matrix to obtain a coarse classification result.
Optionally, when the feature map and the first attention map are fused to obtain a first partial feature map, the processor 402 further performs: and performing dot multiplication operation on the feature map and the plurality of channel feature maps of the first attention map according to elements to obtain a plurality of first partial feature maps.
Optionally, when performing a data enhancement operation on the attention feature map to obtain a data enhanced feature map, the processor 402 further performs: selecting a preset channel of the attention feature map to carry out normalization operation to obtain a candidate data enhancement feature map; performing attention area cutting and amplifying operation on the candidate data enhancement feature map to obtain an image obtained after the attention area cutting and amplifying; performing attention area descending operation on the candidate data enhancement feature map to obtain an image with the descending attention area; and inputting the image with the enlarged attention area clipping and the image with the reduced attention area into a residual error network to extract features, and generating a data enhancement feature map.
Optionally, when obtaining the fine classification result corresponding to the image to be processed according to the data enhancement feature map, the processor 402 further performs: acquiring a feature mapping chart corresponding to the data enhancement feature chart; performing convolution operation on the feature map to generate a second attention map; performing fusion operation on the feature mapping chart and the second attention mapping chart to obtain a second partial feature mapping chart; and acquiring a fine classification result corresponding to the image to be processed according to the second part of feature mapping chart.
Optionally, when obtaining a fine classification result corresponding to the image to be processed according to the second partial feature map, the processor 402 further performs: carrying out global average pooling operation on the second part of feature mapping graph to obtain a second part of feature mapping graph tensor after dimension reduction; performing vector splicing on the dimensionality-reduced tensor of the second part of the feature mapping graph to generate a second feature matrix; and classifying the image to be processed according to the second feature matrix to obtain a fine classification result.
Optionally, when performing feature extraction on the image to be processed to obtain a feature map, the processor 402 further performs: and performing feature extraction on the image to be processed through a preset residual error network to obtain a feature map.
Optionally, when obtaining an attention feature map having a plurality of specific attentions from the feature map, the processor 402 further performs: performing squeeze-and-excitation operations on the feature map for multiple times to generate an attention feature map with multiple specific attentions.
Optionally, when determining the classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result, the processor 402 further performs: adding the coarse classification result and the fine classification result to obtain the classification result corresponding to the image to be processed; or setting a first weight value corresponding to the coarse classification result and a second weight value corresponding to the fine classification result, and determining the classification result corresponding to the image to be processed according to the coarse classification result, the first weight value, the fine classification result, and the second weight value.
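The weighted variant of the result combination can be sketched as a convex combination of the two logit vectors; with equal weights it coincides, up to a constant scale, with the simple-addition variant. The weight values below are illustrative defaults, not values specified by the patent.

```python
import numpy as np

def combine_results(coarse_logits, fine_logits, w1=0.5, w2=0.5):
    """Weighted combination of coarse and fine classification results.

    Returns the combined score vector and the index of the predicted class.
    """
    combined = w1 * np.asarray(coarse_logits) + w2 * np.asarray(fine_logits)
    return combined, int(np.argmax(combined))
```

For example, with equal weights a coarse result of [0.2, 0.8] and a fine result of [0.6, 0.4] combine to [0.4, 0.6], selecting class 1.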
The descriptions of the above embodiments each have their own emphasis; for parts not described in detail in a given embodiment, reference may be made to the detailed description of the image processing method above, which is not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a processor, implement any of the image processing methods provided by the embodiments of the present application. For example, when loaded by the processor, the computer program may perform the following steps:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map; obtaining a coarse classification result corresponding to the image to be processed according to the feature map; obtaining an attention feature map having a plurality of specific attentions according to the feature map; performing a data enhancement operation on the attention feature map to obtain a data enhancement feature map; obtaining a fine classification result corresponding to the image to be processed according to the data enhancement feature map; and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
The above operations may be implemented with reference to the foregoing embodiments and are not described in detail again here.
The computer-readable storage medium may be an internal storage unit of the image processing apparatus of the foregoing embodiments, such as a hard disk or memory of the image processing apparatus. The computer-readable storage medium may also be an external storage device of the image processing apparatus, such as a plug-in hard disk equipped on the image processing apparatus, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash card.
Since the computer program stored in the computer-readable storage medium can execute any of the image processing methods provided in the embodiments of the present application, it can achieve the beneficial effects achievable by any of those methods, which are detailed in the foregoing embodiments and are not repeated here.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a feature map;
obtaining a coarse classification result corresponding to the image to be processed according to the feature map;
acquiring an attention feature map with a plurality of specific attentions according to the feature map;
performing data enhancement operation on the attention feature map to obtain a data enhancement feature map;
acquiring a fine classification result corresponding to the image to be processed according to the data enhancement feature map;
and determining a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result.
2. The image processing method according to claim 1, wherein the obtaining of the coarse classification result corresponding to the image to be processed according to the feature map comprises:
performing convolution operation on the feature map to generate a first attention map;
performing fusion operation on the feature map and the first attention map to obtain a first partial feature map;
and acquiring a coarse classification result corresponding to the image to be processed according to the first partial feature mapping image.
3. The image processing method according to claim 2, wherein the obtaining of the coarse classification result corresponding to the image to be processed according to the first partial feature map comprises:
performing a global average pooling operation on the first partial feature map to generate a dimension-reduced first partial feature map tensor;
performing an attention normalization constraint on the dimension-reduced first partial feature map tensor to obtain a normalized feature map tensor;
splicing the normalized feature map tensor into a vector to generate a first feature matrix;
and classifying the image to be processed according to the first feature matrix to obtain the coarse classification result.
4. The image processing method according to claim 2, wherein the fusing of the feature map and the first attention map to obtain a first partial feature map comprises:
performing an element-wise dot product of the feature map with each channel feature map of the first attention map to obtain a plurality of first partial feature maps.
5. The image processing method according to claim 1, wherein the performing of a data enhancement operation on the attention feature map to obtain a data enhancement feature map comprises:
selecting a preset channel of the attention feature map and performing a normalization operation on it to obtain a candidate data enhancement feature map;
performing an attention-region cropping and enlarging operation on the candidate data enhancement feature map to obtain an image in which the attention region is cropped and enlarged;
performing an attention-region dropping operation on the candidate data enhancement feature map to obtain an image in which the attention region is dropped;
and inputting the cropped-and-enlarged image and the dropped image into a residual network for feature extraction to generate the data enhancement feature map.
6. The image processing method according to claim 1, wherein the obtaining of the fine classification result corresponding to the image to be processed according to the data enhancement feature map comprises:
obtaining a feature mapping corresponding to the data enhancement feature map;
performing a convolution operation on the feature mapping to generate a second attention map;
fusing the feature mapping with the second attention map to obtain a second partial feature map;
and obtaining the fine classification result corresponding to the image to be processed according to the second partial feature map.
7. The image processing method according to claim 6, wherein the obtaining of the fine classification result corresponding to the image to be processed according to the second partial feature map comprises:
performing a global average pooling operation on the second partial feature map to obtain a dimension-reduced second partial feature map tensor;
splicing the dimension-reduced second partial feature map tensor into a vector to generate a second feature matrix;
and classifying the image to be processed according to the second feature matrix to obtain the fine classification result.
8. The image processing method according to any one of claims 1 to 7, wherein the performing of feature extraction on the image to be processed to obtain a feature map comprises:
performing feature extraction on the image to be processed through a preset residual network to obtain the feature map;
and the obtaining of an attention feature map having a plurality of specific attentions according to the feature map comprises:
performing squeeze-and-excitation operations on the feature map multiple times to generate an attention feature map having a plurality of specific attentions.
9. The image processing method according to any one of claims 1 to 7, wherein the determining of a classification result corresponding to the image to be processed based on the coarse classification result and the fine classification result comprises:
adding the coarse classification result and the fine classification result to obtain the classification result corresponding to the image to be processed; or,
setting a first weight value corresponding to the coarse classification result and a second weight value corresponding to the fine classification result;
and determining the classification result corresponding to the image to be processed according to the coarse classification result, the first weight value, the fine classification result, and the second weight value.
10. A computer-readable storage medium for storing a computer program which is loaded by a processor to perform the image processing method of any one of claims 1 to 9.
CN202010261102.4A 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium Withdrawn CN111428807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010261102.4A CN111428807A (en) 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010261102.4A CN111428807A (en) 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111428807A true CN111428807A (en) 2020-07-17

Family

ID=71555732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010261102.4A Withdrawn CN111428807A (en) 2020-04-03 2020-04-03 Image processing method and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111428807A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101438B (en) * 2020-09-08 2024-04-16 南方科技大学 Left-right eye classification method, device, server and storage medium
CN112149720A (en) * 2020-09-09 2020-12-29 南京信息工程大学 Fine-grained vehicle type identification method
CN112465709A (en) * 2020-10-26 2021-03-09 华为技术有限公司 Image enhancement method, device, storage medium and equipment
CN112465709B (en) * 2020-10-26 2024-04-12 华为技术有限公司 Image enhancement method, device, storage medium and equipment
CN112651451A (en) * 2020-12-30 2021-04-13 北京百度网讯科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN112651451B (en) * 2020-12-30 2023-08-11 北京百度网讯科技有限公司 Image recognition method, device, electronic equipment and storage medium
CN112819057A (en) * 2021-01-25 2021-05-18 长春迈克赛德医疗科技有限公司 Automatic identification method of urinary sediment image
CN112818832A (en) * 2021-01-28 2021-05-18 中国科学技术大学 Weak supervision object positioning device and method based on component perception
CN113139542A (en) * 2021-04-28 2021-07-20 北京百度网讯科技有限公司 Target detection method, device, equipment and computer readable storage medium
CN113139542B (en) * 2021-04-28 2023-08-11 北京百度网讯科技有限公司 Object detection method, device, equipment and computer readable storage medium
CN113393388A (en) * 2021-05-26 2021-09-14 联合汽车电子有限公司 Image enhancement method, device adopting same, storage medium and vehicle
CN113361636A (en) * 2021-06-30 2021-09-07 山东建筑大学 Image classification method, system, medium and electronic device
CN113361636B (en) * 2021-06-30 2022-09-20 山东建筑大学 Image classification method, system, medium and electronic device
CN113793393B (en) * 2021-09-28 2023-05-09 中国人民解放军国防科技大学 Unmanned vehicle multi-resolution video generation method and device based on attention mechanism
CN113793393A (en) * 2021-09-28 2021-12-14 中国人民解放军国防科技大学 Attention mechanism-based unmanned vehicle multi-resolution video generation method and device
CN113723407A (en) * 2021-11-01 2021-11-30 深圳思谋信息科技有限公司 Image classification and identification method and device, computer equipment and storage medium
CN115423805A (en) * 2022-11-03 2022-12-02 河南洋荣服饰有限公司 Automatic trousers piece clamping device
CN116738296A (en) * 2023-08-14 2023-09-12 大有期货有限公司 Comprehensive intelligent monitoring system for machine room conditions
CN116738296B (en) * 2023-08-14 2024-04-02 大有期货有限公司 Comprehensive intelligent monitoring system for machine room conditions

Similar Documents

Publication Publication Date Title
CN111428807A (en) Image processing method and computer-readable storage medium
CN111080628B (en) Image tampering detection method, apparatus, computer device and storage medium
CN105144239B (en) Image processing apparatus, image processing method
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
Ko et al. Object-of-interest image segmentation based on human attention and semantic region clustering
Kao et al. Visual aesthetic quality assessment with a regression model
CN107203775B (en) Image classification method, device and equipment
CN111814810A (en) Image recognition method and device, electronic equipment and storage medium
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN110717534A (en) Target classification and positioning method based on network supervision
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN110503103B (en) Character segmentation method in text line based on full convolution neural network
CN107392183B (en) Face classification recognition method and device and readable storage medium
WO2020077940A1 (en) Method and device for automatic identification of labels of image
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN113936195B (en) Sensitive image recognition model training method and device and electronic equipment
CN111860309A (en) Face recognition method and system
CN109784171A (en) Car damage identification method for screening images, device, readable storage medium storing program for executing and server
CN114037640A (en) Image generation method and device
CN112489063A (en) Image segmentation method, and training method and device of image segmentation model
CN115578616A (en) Training method, segmentation method and device of multi-scale object instance segmentation model
CN116168017A (en) Deep learning-based PCB element detection method, system and storage medium
CN116206334A (en) Wild animal identification method and device
CN114639101A (en) Emulsion droplet identification system, method, computer equipment and storage medium
Bergler et al. FIN-PRINT a fully-automated multi-stage deep-learning-based framework for the individual recognition of killer whales

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200717