CN112580614A - Hand-drawn sketch identification method based on attention mechanism - Google Patents


Info

Publication number
CN112580614A
CN112580614A (application CN202110210499.9A)
Authority
CN
China
Prior art keywords
attention
hand
drawn sketch
feature map
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110210499.9A
Other languages
Chinese (zh)
Other versions
CN112580614B (en)
Inventor
郑影
章依依
徐晓刚
王军
何鹏飞
曹卫强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202110210499.9A priority Critical patent/CN112580614B/en
Publication of CN112580614A publication Critical patent/CN112580614A/en
Application granted granted Critical
Publication of CN112580614B publication Critical patent/CN112580614B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/32 - Digital ink
    • G06V30/36 - Matching; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/32 - Digital ink
    • G06V30/333 - Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hand-drawn sketch recognition method based on an attention mechanism. The method comprises: inputting an original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer; inputting the feature map into a channel attention module to obtain a feature map optimized by channel attention; training a classification network to predict vertical flipping of hand-drawn sketches, and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map; computing a vertical-flip-spatial-attention-optimized feature map from the channel-attention-optimized feature map and the vertical-flip spatial attention map; and finally outputting the recognition result through a fully connected layer. The advantages of the invention are as follows: channel attention and vertical-flip spatial attention are used to optimize the features of the convolutional neural network, so that the network focuses on the most discriminative parts during learning, which effectively improves the recognition accuracy of hand-drawn sketches.

Description

Hand-drawn sketch identification method based on attention mechanism
Technical Field
The invention belongs to the field of computer vision, relates to the hand-drawn sketch classification task, and in particular relates to a hand-drawn sketch recognition method based on an attention mechanism.
Background
The hand-drawn sketch can be regarded as an abstract form on a two-dimensional plane: it conveys the information to be expressed while leaving ample room for the imagination. Sketches are a convenient way to depict objects or scenes, outline storylines, and design products or buildings, and are widely used in drawing and design work such as cartoon production, urban planning and design, architectural composition, industrial design, and clothing design. Hand-drawn sketch recognition technology can be applied in many areas of computer vision, such as image retrieval and generation and 3D shape retrieval and reconstruction, and has therefore received increasing attention in recent years.
A major difference between hand-drawn sketch recognition and the general object recognition task is that sketches lack salient color and texture information. Furthermore, the lines of a hand-drawn sketch exhibit large shape variation and a high degree of abstraction when depicting objects, which makes the recognition task extremely challenging. Early research on sketch recognition focused on designing hand-crafted features within traditional object recognition frameworks; although this achieved some success, substantial room for improvement in recognition performance remains. In recent years, deep learning methods have been widely applied to sketch recognition tasks. However, the high abstraction of freehand sketches makes them difficult to model effectively, which sharply reduces the recognition accuracy of deep network models such as CNNs on hand-drawn sketches.
Disclosure of Invention
The embodiments of the invention aim to provide a hand-drawn sketch recognition method based on an attention mechanism, so as to effectively improve the accuracy of existing methods in hand-drawn sketch recognition.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
The invention provides a hand-drawn sketch recognition method based on an attention mechanism, comprising the following steps:
inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer;
inputting the feature map into a channel attention module to obtain a feature map optimized by channel attention;
training a classification network to predict vertical flipping of hand-drawn sketches, and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map;
computing the vertical-flip-spatial-attention-optimized feature map from the channel-attention-optimized feature map and the vertical-flip spatial attention map;
inputting the optimized feature map into a global average pooling layer and a fully connected layer to obtain the final recognition result of the hand-drawn sketch.
Further, inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer comprises:
inputting the original hand-drawn sketch into a residual network to obtain the feature map $F \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer Conv5, where $W$, $H$ and $C$ are the width, height and channel dimensions of $F$, respectively.
Further, inputting the feature map into a channel attention module to obtain a feature map optimized by channel attention comprises:
performing average pooling and maximum pooling on the feature map $F$ to obtain feature vectors $F_{\mathrm{avg}}$ and $F_{\mathrm{max}}$, each of dimension $1 \times 1 \times C$;
inputting $F_{\mathrm{avg}}$ and $F_{\mathrm{max}}$ respectively into a convolutional layer with kernel size $1 \times 1$ that reduces the channel dimension to $C/r$, where $r$ is set to 16; activating the reduced feature vectors with the ReLU function; then inputting them into another convolutional layer with kernel size $1 \times 1$ that restores the channel dimension to $C$, obtaining new feature vectors $F'_{\mathrm{avg}}$ and $F'_{\mathrm{max}}$, computed as:

$F'_{\mathrm{avg}} = W_1(\mathrm{ReLU}(W_0(F_{\mathrm{avg}})))$

$F'_{\mathrm{max}} = W_1(\mathrm{ReLU}(W_0(F_{\mathrm{max}})))$

where $W_0$ and $W_1$ are the parameters of the two $1 \times 1$ convolutional layers;
performing a summation of $F'_{\mathrm{avg}}$ and $F'_{\mathrm{max}}$ and activating with the Sigmoid function to obtain the channel attention map $M_c$, computed as:

$M_c = \sigma(F'_{\mathrm{avg}} \oplus F'_{\mathrm{max}})$

where $\sigma$ denotes the Sigmoid activation function and $\oplus$ denotes the summation operation;
based on the feature map $F$ and the channel attention map $M_c$, computing the channel-attention-optimized feature map $F_c$ with the following formula:

$F_c = F \otimes M_c + F$

where $\otimes$ denotes the multiplication operation.
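The channel attention computation described above can be written as a minimal NumPy sketch. Since the pooled vectors have spatial size $1 \times 1$, the two $1 \times 1$ convolutions act as dense matrices here; all shapes, seeds and weight values are illustrative assumptions, not the patented network's actual parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """Channel attention sketch: avg/max pool -> shared bottleneck
    (reduce to C/r, ReLU, restore to C) -> sum -> sigmoid -> reweight,
    with a skip connection that keeps the original features."""
    F_avg = F.mean(axis=(0, 1))   # (C,)  average pooling over W x H
    F_max = F.max(axis=(0, 1))    # (C,)  maximum pooling over W x H

    def bottleneck(v):
        # two 1x1 convs on a 1x1xC vector are just dense maps
        return W1 @ np.maximum(W0 @ v, 0.0)

    M_c = sigmoid(bottleneck(F_avg) + bottleneck(F_max))  # attention map, (C,)
    return F * M_c + F            # F_c = F (x) M_c + F

# toy example: C = 32 channels, reduction ratio r = 16
rng = np.random.default_rng(0)
C, r = 32, 16
F = rng.standard_normal((7, 7, C))
W0 = rng.standard_normal((C // r, C)) * 0.1   # reduces C -> C/r
W1 = rng.standard_normal((C, C // r)) * 0.1   # restores C/r -> C
F_c = channel_attention(F, W0, W1)
print(F_c.shape)  # (7, 7, 32)
```

Because $M_c$ lies in $(0, 1)$ per channel, the skip connection guarantees each output value keeps the sign of the input and is scaled by at most a factor of 2.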
Further, training a classification network for predicting vertical flipping of the hand-drawn sketch, and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map, comprises:
selecting the representative TU-Berlin hand-drawn sketch data set as training data; vertically flipping each hand-drawn sketch to obtain a flipped sketch; setting the label of each non-flipped sketch to 0 and the label of each vertically flipped sketch to 1, thereby constructing a binary classification data set containing the non-flipped and vertically flipped hand-drawn sketches; selecting a residual network as the classifier, and training on this binary data set a classification network for predicting vertical flipping of hand-drawn sketches;
inputting the original hand-drawn sketch into the classification network to obtain the predicted class label $c$, and extracting the feature map $G \in \mathbb{R}^{W' \times H' \times C'}$ output by the last convolutional layer; defining the feature map of $G$ on the $k$-th channel as $G_k$, $k = 1, \ldots, C'$; the vertical-flip spatial attention map $M_s$ is computed as follows:

$M_s = \sum_{k=1}^{C'} w_k^c \cdot G_k$

where $\cdot$ denotes the multiplication operation and $w_k^c$ is the weight in the fully connected layer corresponding to the $k$-th channel under the $c$-th category.
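The weighted sum above resembles class activation mapping and can be sketched in NumPy as follows. The feature shape, classifier weights and predicted class index are illustrative stand-ins for the flip classifier's actual values.

```python
import numpy as np

def flip_spatial_attention(G, W_fc, c):
    """Vertical-flip spatial attention sketch: weighted sum of the
    single-channel feature maps G_k with the fully connected weights
    w_k^c of the predicted class c."""
    # G: (W', H', C') feature map from the flip classifier's last conv layer
    # W_fc: (num_classes, C') FC weights applied after global average pooling
    w_c = W_fc[c]                                  # weights w_k^c, shape (C',)
    M_s = np.tensordot(G, w_c, axes=([2], [0]))    # sum_k w_k^c * G_k -> (W', H')
    return M_s

rng = np.random.default_rng(1)
G = rng.standard_normal((7, 7, 8))      # illustrative C' = 8 channels
W_fc = rng.standard_normal((2, 8))      # 2 classes: not flipped / flipped
M_s = flip_spatial_attention(G, W_fc, c=1)
print(M_s.shape)  # (7, 7)
```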
Further, computing the vertical-flip-spatial-attention-optimized feature map from the channel-attention-optimized feature map and the vertical-flip spatial attention map comprises:
after obtaining the channel-attention-optimized feature map $F_c$ and the vertical-flip spatial attention map $M_s$, computing the vertical-flip-spatial-attention-optimized feature map $F_s$ as follows:

$F_s = F_c \otimes M_s + F_c$

Further, the feature map $F_s$ is input into the subsequent global average pooling layer and fully connected layer to obtain the final recognition result for the original hand-drawn sketch.
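A minimal NumPy sketch of this final combination and classification head follows. The shapes, random weights and the 250-category count (a common TU-Berlin setting) are illustrative assumptions, not the patented network's actual sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
W, H, C, num_classes = 7, 7, 32, 250    # illustrative dimensions
F_c = rng.standard_normal((W, H, C))    # channel-attention-optimized features
M_s = rng.standard_normal((W, H))       # vertical-flip spatial attention map

# F_s = F_c (x) M_s + F_c : spatial reweighting with a skip connection
F_s = F_c * M_s[:, :, None] + F_c

g = F_s.mean(axis=(0, 1))               # global average pooling -> (C,)
W_fc = rng.standard_normal((num_classes, C))
b_fc = np.zeros(num_classes)
logits = W_fc @ g + b_fc                # fully connected layer
pred = int(np.argmax(logits))           # recognized sketch category index
print(pred)
```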
Compared with the prior art, the design of the invention achieves the following beneficial effects:
1. The attention-based hand-drawn sketch recognition method enables the deep convolutional neural network to focus on the most discriminative parts of the feature representation, effectively improving the accuracy of hand-drawn sketch recognition.
2. The proposed vertical-flip spatial attention adopts a self-supervised learning scheme that automatically evaluates the importance of different spatial positions in the feature map, and works together with the channel attention module to learn more effective feature representations.
3. The channel attention module and the vertical-flip spatial attention module can be embedded into most current deep convolutional neural networks as standard components for improving the feature representation of hand-drawn sketches, and can therefore also be applied to related tasks such as sketch segmentation, sketch generation, and sketch-based retrieval.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a network structure diagram of a hand-drawn sketch identification method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a block diagram of a channel attention module in an embodiment of the present invention;
FIG. 3 is a block diagram of a vertically flipped spatial attention module in an embodiment of the invention.
FIG. 4 is an example of recognition results for different categories of hand-drawn sketches.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a network structure diagram of the hand-drawn sketch recognition method based on an attention mechanism according to an embodiment of the present invention. The method provided by this embodiment comprises the following steps:
step S101, inputting an original hand-drawn sketch into a deep convolutional neural network to obtain a characteristic diagram output by a last convolutional layer; specifically, the method comprises the following steps:
taking the most representative residual error network in the deep convolutional neural network as an example, inputting the original freehand sketch into the residual error network to obtain the feature map output by the last convolutional layer Conv5
Figure 535322DEST_PATH_IMAGE001
Wherein
Figure 453600DEST_PATH_IMAGE002
Figure 964347DEST_PATH_IMAGE003
Figure 651680DEST_PATH_IMAGE004
Figure 183155DEST_PATH_IMAGE005
Are respectively the characteristic diagram
Figure 447784DEST_PATH_IMAGE001
Width, height and channel dimensions.
Step S103, inputting the feature map into a channel attention module to obtain a feature map optimized by channel attention; as shown in fig. 2, this comprises the following sub-steps:
Step S1031, performing average pooling and maximum pooling on the feature map $F$ to obtain feature vectors $F_{\mathrm{avg}}$ and $F_{\mathrm{max}}$, each of dimension $1 \times 1 \times C$. The invention adopts two different pooling operations, average pooling and maximum pooling, in order to extract richer high-level features and thereby increase the expressive power of the features.
Step S1032, inputting $F_{\mathrm{avg}}$ and $F_{\mathrm{max}}$ respectively into a $1 \times 1$ convolutional layer that reduces the channel dimension to $C/r$, where $r$ is set to 16. The reduced feature vectors are then activated with the ReLU function, which introduces more nonlinearity and better fits the complex correlations among channels; they are then input into another $1 \times 1$ convolutional layer that restores the channel dimension to $C$, yielding new feature vectors $F'_{\mathrm{avg}}$ and $F'_{\mathrm{max}}$, computed as:

$F'_{\mathrm{avg}} = W_1(\mathrm{ReLU}(W_0(F_{\mathrm{avg}})))$

$F'_{\mathrm{max}} = W_1(\mathrm{ReLU}(W_0(F_{\mathrm{max}})))$

where $W_0$ and $W_1$ are the parameters of the two $1 \times 1$ convolutional layers. Reducing the dimension first and then restoring it effectively reduces the number of parameters the network must learn, thereby reducing model complexity.
Step S1033, performing a summation of $F'_{\mathrm{avg}}$ and $F'_{\mathrm{max}}$ and activating with the Sigmoid function to obtain the channel attention map $M_c$ that fuses the two kinds of pooled information, computed as:

$M_c = \sigma(F'_{\mathrm{avg}} \oplus F'_{\mathrm{max}})$

where $\sigma$ denotes the Sigmoid activation function and $\oplus$ denotes the summation operation. The importance of each feature channel is thus acquired automatically through learning, which boosts useful features and suppresses features of little use to the current task, effectively improving the recognition performance of the model.
Step S1034, based on the feature map $F$ and the channel attention map $M_c$, computing the channel-attention-optimized feature map $F_c$ with the following formula:

$F_c = F \otimes M_c + F$

where $\otimes$ denotes the multiplication operation. The skip connection used in the invention retains the information of $F$ while adding the attention information brought by channel attention, which helps the network learn more effective hand-drawn sketch features.
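The parameter saving from reducing the dimension and then restoring it, as described in step S1032, can be checked with a quick count. The channel width $C = 512$ is an illustrative value (a typical late-stage ResNet width), not necessarily the network's actual one, and biases are omitted for simplicity.

```python
# Two 1x1 convs on 1x1xC vectors are dense maps C -> C/r and C/r -> C.
C, r = 512, 16
bottleneck = C * (C // r) + (C // r) * C   # W0 plus W1
single_layer = C * C                       # hypothetical direct C -> C map
print(bottleneck, single_layer)            # 32768 262144
```

With $r = 16$, the bottleneck uses $2C^2/r$ parameters, i.e. one eighth of a direct $C \times C$ mapping.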
Step S105, training a classification network for predicting vertical flipping of the hand-drawn sketch, and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map; as shown in fig. 3, this comprises the following sub-steps:
Step S1051, taking the commonly used TU-Berlin hand-drawn sketch data set as an example, vertically flipping each hand-drawn sketch in it, setting the label of each non-flipped sketch to 0 and the label of each vertically flipped sketch to 1, thereby constructing a binary classification data set containing the non-flipped and vertically flipped hand-drawn sketches; selecting a residual network as the classifier, and training on this binary data set a classification network for predicting vertical flipping of hand-drawn sketches.
Step S1052, inputting the original hand-drawn sketch into the classification network to obtain the predicted class label $c$, and extracting the feature map $G \in \mathbb{R}^{W' \times H' \times C'}$ output by the last convolutional layer Conv5; defining the feature map of $G$ on the $k$-th channel as $G_k$, $k = 1, \ldots, C'$; the vertical-flip spatial attention map $M_s$ is computed as follows:

$M_s = \sum_{k=1}^{C'} w_k^c \cdot G_k$

where $\cdot$ denotes the multiplication operation, and $w_k^c$ is the weight in the fully connected layer after the global average pooling layer (GAP) corresponding to the $k$-th channel under the $c$-th category; it measures the importance of each channel's feature map with respect to the $c$-th category. The global average pooling layer adopted by the invention reduces the number of network parameters, mitigating overfitting, and the extracted features have a global receptive field, enhancing their expressive power.
The invention trains the network for predicting vertical flipping of hand-drawn sketches in a self-supervised manner and automatically computes the response strength at different spatial positions in the feature map, thereby reflecting the importance of those positions.
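The self-supervised data construction of step S1051 can be sketched as follows; the tiny array standing in for a TU-Berlin sketch is purely illustrative.

```python
import numpy as np

def make_flip_dataset(sketches):
    """Build the binary flip-prediction dataset: label 0 for the original
    sketch, label 1 for its vertically flipped copy."""
    images, labels = [], []
    for s in sketches:
        images.append(s)             # not flipped -> label 0
        labels.append(0)
        images.append(np.flipud(s))  # vertically flipped -> label 1
        labels.append(1)
    return images, labels

sketch = np.arange(6).reshape(3, 2)  # a tiny stand-in "sketch"
imgs, labs = make_flip_dataset([sketch])
print(labs)  # [0, 1]
```

Note that the labels come for free from the flipping itself, so no manual annotation is needed to train the flip classifier.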
Step S107, computing the vertical-flip-spatial-attention-optimized feature map from the channel-attention-optimized feature map and the vertical-flip spatial attention map. Specifically:
after obtaining the channel-attention-optimized feature map $F_c$ and the vertical-flip spatial attention map $M_s$, the vertical-flip-spatial-attention-optimized feature map $F_s$ is computed as follows:

$F_s = F_c \otimes M_s + F_c$

Step S109, inputting the feature map $F_s$ into the subsequent global average pooling layer (GAP) and fully connected layer (FC) to obtain the final hand-drawn sketch recognition result.
As shown in fig. 1, when a hand-drawn sketch of the category "airplane" is input, the attention-based recognition method outputs the recognition result "airplane". Fig. 4 shows examples of recognition results for more categories; it can be seen that the invention accurately recognizes the input hand-drawn sketches.
On the common TU-Berlin hand-drawn sketch data set, the method achieves a 2.4% improvement in recognition accuracy over the ResNet-50 baseline, and also achieves about a 1% improvement over deeper models such as ResNet-101, ResNet-152 and ResNeXt-101, demonstrating its effectiveness for hand-drawn sketch recognition. The specific results are shown in the following table:
Model          ResNet-50   ResNet-101   ResNet-152   ResNeXt-101
Baseline       77.3%       79.5%        80.3%        80.5%
The invention  79.7%       80.6%        81.2%        81.5%
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A hand-drawn sketch recognition method based on an attention mechanism, characterized by comprising the following steps:
inputting the original hand-drawn sketch into a deep convolutional neural network to obtain an original feature map output by the last convolutional layer;
inputting the original feature map into a channel attention module to obtain a feature map optimized by channel attention;
training a classification network for predicting vertical flipping of the hand-drawn sketch, and inputting the original hand-drawn sketch into the classification network to obtain a vertical-flip spatial attention map;
combining the channel-attention-optimized feature map and the vertical-flip spatial attention map to compute a vertical-flip-spatial-attention-optimized feature map;
inputting the vertical-flip-spatial-attention-optimized feature map into a global average pooling layer and a fully connected layer to obtain the final recognition result for the original hand-drawn sketch.
2. The attention-mechanism-based hand-drawn sketch recognition method of claim 1, wherein inputting the original hand-drawn sketch into a deep convolutional neural network to obtain an original feature map output by the last convolutional layer comprises:
using a residual network as the backbone for feature extraction, inputting the original hand-drawn sketch into the residual network, and extracting the original feature map $F \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer Conv5, where $W$, $H$ and $C$ are the width, height and channel dimensions of the original feature map $F$, respectively.
3. The attention-mechanism-based hand-drawn sketch recognition method of claim 2, wherein inputting the original feature map into a channel attention module to obtain a feature map optimized by channel attention comprises:
a) performing average pooling and maximum pooling on the original feature map $F$ to obtain feature vectors $F_{\mathrm{avg}}$ and $F_{\mathrm{max}}$, each of dimension $1 \times 1 \times C$;
b) inputting $F_{\mathrm{avg}}$ and $F_{\mathrm{max}}$ respectively into a convolutional layer with kernel size $1 \times 1$ that reduces the channel dimension to $C/r$, where $r$ is set to 16; activating the reduced feature vectors with the ReLU function; then inputting them into another convolutional layer with kernel size $1 \times 1$ that restores the channel dimension to $C$, obtaining new feature vectors $F'_{\mathrm{avg}}$ and $F'_{\mathrm{max}}$, computed as:

$F'_{\mathrm{avg}} = W_1(\mathrm{ReLU}(W_0(F_{\mathrm{avg}})))$

$F'_{\mathrm{max}} = W_1(\mathrm{ReLU}(W_0(F_{\mathrm{max}})))$

where $W_0$ and $W_1$ are the parameters of the two convolutional layers with kernel size $1 \times 1$;
c) performing a summation of $F'_{\mathrm{avg}}$ and $F'_{\mathrm{max}}$ and activating with the Sigmoid function to obtain the channel attention map $M_c$, computed as:

$M_c = \sigma(F'_{\mathrm{avg}} \oplus F'_{\mathrm{max}})$

where $\sigma$ denotes the Sigmoid activation function and $\oplus$ denotes the summation operation;
d) based on the original feature map $F$ and the channel attention map $M_c$, computing the channel-attention-optimized feature map $F_c$ with the following formula:

$F_c = F \otimes M_c + F$

where $\otimes$ denotes the multiplication operation.
4. The method for hand-drawn sketch recognition based on attention mechanism as claimed in claim 3, wherein training a classification network for predicting vertical flipping of the hand-drawn sketch, inputting the original hand-drawn sketch into the classification network, and obtaining a vertical flipping spatial attention map comprises:
a) each hand-drawn sketch in the TU-Berlin hand-drawn sketch data set is vertically flipped to obtain a flipped sketch; the label of a non-flipped hand-drawn sketch is set to 0 and the label of a vertically flipped hand-drawn sketch is set to 1, thereby constructing a binary classification data set containing non-flipped and vertically flipped hand-drawn sketches; a residual network is selected as the classifier, and a classification network for predicting vertical flipping of the hand-drawn sketch is trained on this binary data set;
b) the original hand-drawn sketch is input into the classification network, which outputs the predicted class label c, and the feature map M ∈ R^(K×H×W) of the last convolutional layer is extracted, where K is the number of channels and H×W the spatial size. Denoting the feature map on the k-th channel by M_k (k = 1, …, K), the vertical-flip spatial attention map A_s is calculated as follows:

A_s = Σ_{k=1}^{K} w_k^c · M_k

where · denotes multiplication and w_k^c is the weight in the final fully connected layer connecting the k-th channel to the predicted category c.
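Step b) is the classic class-activation-map (CAM) construction: the spatial attention map is the fully-connected-weighted sum of the last conv layer's channel maps for the predicted class. A hedged NumPy sketch (the function and argument names are ours, not the patent's):

```python
import numpy as np

def flip_spatial_attention(feature_maps, fc_weights, predicted_class):
    """Vertical-flip spatial attention map A_s = sum_k w_k^c * M_k.

    feature_maps    : (K, H, W) output of the last convolutional layer
    fc_weights      : (num_classes, K) weights of the final FC layer
    predicted_class : class index c predicted by the flip classifier
    Returns A_s with shape (H, W).
    """
    w = fc_weights[predicted_class]               # (K,) weights for class c
    return np.tensordot(w, feature_maps, axes=1)  # weighted sum over channels
```

Regions that most strongly drive the "flipped vs. not flipped" decision thus receive the largest attention values, which is what makes the map useful as an orientation-sensitive spatial prior.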
5. The method for hand-drawn sketch recognition based on an attention mechanism as claimed in claim 4, wherein calculating the feature map after vertical-flip spatial attention optimization by combining the channel-attention-optimized feature map and the vertical-flip spatial attention map comprises:

after obtaining the channel-attention-optimized feature map F_c and the vertical-flip spatial attention map A_s, the feature map after vertical-flip spatial attention optimization, F_s, is calculated as follows:

F_s = A_s ⊗ F_c
CN202110210499.9A 2021-02-25 2021-02-25 Hand-drawn sketch identification method based on attention mechanism Active CN112580614B (en)

Publications (2)

Publication Number Publication Date
CN112580614A true CN112580614A (en) 2021-03-30
CN112580614B CN112580614B (en) 2021-06-08


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610164A (en) * 2021-08-10 2021-11-05 北京邮电大学 Fine-grained image recognition method and system based on attention balance

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171268A (en) * 2018-01-02 2018-06-15 联想(北京)有限公司 A kind of image processing method and electronic equipment
CN111339862A (en) * 2020-02-17 2020-06-26 中国地质大学(武汉) Remote sensing scene classification method and device based on channel attention mechanism
CN111488474A (en) * 2020-03-21 2020-08-04 复旦大学 Fine-grained freehand sketch image retrieval method based on attention enhancement
CN111985370A (en) * 2020-08-10 2020-11-24 华南农业大学 Crop pest and disease fine-grained identification method based on improved mixed attention module




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant