CN111325237A - Image identification method based on attention interaction mechanism - Google Patents

Image identification method based on attention interaction mechanism

Info

Publication number
CN111325237A
Authority
CN
China
Prior art keywords
feature
image
features
gate
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010070791.0A
Other languages
Chinese (zh)
Other versions
CN111325237B (en)
Inventor
乔宇
庄培钦
王亚立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202010070791.0A
Publication of CN111325237A
Application granted
Publication of CN111325237B
Active legal status
Anticipated expiration legal status

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

The invention provides an image recognition method based on an attention interaction mechanism, which uses a pre-trained image recognition model to classify a picture under test. The training process of the image recognition model comprises the following steps: for each of N image categories, selecting K images and inputting them into a convolutional neural network for feature extraction to obtain a plurality of image features; constructing image feature pairs according to the similarity between different image features; extracting a common feature vector from each constructed image feature pair through common feature learning; calculating, based on the common feature vector, a gate feature vector corresponding to each feature in the image feature pair; and feeding the combination of each feature in the image feature pair with the gate feature vectors into a classifier, optimizing according to a set loss function, and thereby obtaining the trained convolutional neural network and classifier. The invention improves the accuracy of image recognition and is particularly suitable for fine-grained image recognition.

Description

Image identification method based on attention interaction mechanism
Technical Field
The invention relates to the technical field of computer vision, and in particular to an image recognition method based on an attention interaction mechanism.
Background
In recent years, deep learning methods have achieved major breakthroughs in computer vision, most notably in image recognition tasks. Within image recognition, however, progress on fine-grained (sub-category) recognition remains limited. Compared with general object recognition, the difficulty of fine-grained image recognition lies mainly in two aspects: 1) the categories in such datasets are extremely fine, so images in adjacent sub-categories are highly similar and differ only in subtle visual details that are hard to find and distinguish; 2) because of factors such as lighting, viewing angle, and pose during image acquisition, images within the same category can differ greatly. Fine-grained images thus exhibit small inter-class differences and large intra-class differences, which makes the recognition task challenging. The need for fine-grained recognition arises frequently in biological species identification tasks, where categories form a taxonomic hierarchy.
In the prior art, fine-grained image recognition methods generally follow three main ideas. 1) Key-part localization. Since images of similar categories in fine-grained tasks differ only slightly and are not easily distinguished, features with high discriminative power must be selected for the final classification. These methods aim to automatically localize several key parts in the image and extract features from those local regions; however, because experiments usually provide only weak supervision (image-level labels), their ability to localize key parts is limited. 2) High-order feature learning. Because the image content in fine-grained tasks is complex and varied while conventional feature extractors have limited expressive power, these methods try to improve the expressiveness of the features and thereby the capability of the algorithm. 3) Metric-learning-based methods. Since fine-grained images have small inter-class differences and large intra-class differences, metric learning is expected to improve this situation; however, it only improves the distribution of samples in the feature space and lacks the ability to discover differences between samples, so it cannot substantially improve recognition performance.
Because the differences between similar images in fine-grained recognition are subtle, existing methods take corresponding measures against the complexity of fine-grained image content. For example, some construct high-order image features to increase the expressiveness and quality of the features, thereby improving recognition performance; others use detection and segmentation techniques to find important local regions in the original image and extract features from those key regions. However, existing methods all model a single image in isolation, so they cannot discover the differing parts between two similar images and cannot truly and efficiently find highly discriminative image regions.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide an image recognition method based on an attention interaction mechanism which, by mimicking the human cognitive process of comparing two highly similar images, discovers the differences within image pairs and thereby distinguishes the images accurately.
According to a first aspect of the present invention, a method of constructing an image recognition model based on an attention interaction mechanism is provided. The method comprises the following steps:
for each of N image categories, selecting K images, inputting the K images into a convolutional neural network for feature extraction, and obtaining a plurality of image features, wherein N and K are integers greater than or equal to 2;
establishing an image feature pair according to the similarity between different image features;
extracting common feature vectors from the constructed image feature pairs through common feature learning;
calculating a gate feature vector corresponding to each feature in the image feature pair based on the common feature vector;
and inputting the features obtained by combining each feature in the image feature pair with the gate feature vectors into a classifier, and optimizing according to a set loss function to obtain a trained convolutional neural network and a trained classifier.
In one embodiment, constructing image feature pairs according to the similarity between different image features comprises: for each image feature $x_1$, finding the nearest image feature within its class and the nearest one among the other classes according to Euclidean distance, each denoted $x_2$, thereby forming 2 × N × K image feature pairs.
In one embodiment, extracting the common feature vector comprises:
concatenating the image feature pair $x_1$ and $x_2$ and feeding the concatenated feature through several fully connected layers to obtain the common feature vector, expressed as:
$x_m = f_m([x_1, x_2])$.
In one embodiment, calculating the gate feature vector corresponding to each feature in the image feature pair comprises:
multiplying the common feature vector $x_m$ element-wise with each feature of the image feature pair and normalizing through a sigmoid function to obtain the corresponding gate feature vectors, expressed as:
$g_i = \mathrm{sigmoid}(x_m \odot x_i), \quad i \in \{1, 2\}$.
In one embodiment, the features obtained by combining each feature in the image feature pair with the gate feature vectors take four forms:

$x_1^{self} = x_1 + x_1 \odot g_1$
$x_2^{self} = x_2 + x_2 \odot g_2$
$x_1^{other} = x_1 + x_1 \odot g_2$
$x_2^{other} = x_2 + x_2 \odot g_1$

wherein $x_i^{self}$ denotes the result of combining an image feature with its own gate feature vector, $x_i^{other}$ denotes the result of combining an image feature with the other image's gate feature vector, and $g_1, g_2$ denote the gate feature vectors.
In one embodiment, the loss function is set to:

$L_{ce} = -\sum_{i \in \{1,2\}} \sum_{j \in \{self, other\}} y_i^{\top} \log p_i^{j}$

wherein $y_i$ reflects the true classification label and $p_i^{j}$ represents the classification probability vector output by the classifier.
In one embodiment, the loss function is set to:

$L_{rk} = \sum_{i \in \{1,2\}} \max\left(0,\ p_i^{other}(c_i) - p_i^{self}(c_i) + \epsilon\right)$

wherein $p_i^{j}(c_i)$ denotes the score of the probability vector $p_i^{j}$ on the $c_i$-th class and $\epsilon$ denotes a threshold.
According to a second aspect of the present invention, an image recognition method based on an attention interaction mechanism is provided. The method comprises the following steps:
feeding a single picture into the trained convolutional neural network of the invention, extracting the corresponding image feature $x^*$, and feeding $x^*$ into the trained classifier to obtain the final classification result.
Compared with the prior art, the invention has the advantage that it remedies the limitation of existing methods, which model only a single picture and therefore neglect to discover the differences between image pairs.
Drawings
The invention is illustrated and described in the following drawings by way of example only and without limiting its scope:
FIG. 1 is a flow diagram of an image recognition method based on an attention interaction mechanism, according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a common feature vector learning module according to one embodiment of the invention;
FIG. 3 is a schematic diagram of an attention interaction mechanism, according to one embodiment of the invention;
FIG. 4 is a schematic diagram of an image recognition system based on an attention interaction mechanism, according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of a terminal device according to one embodiment of the invention;
FIG. 6 is a schematic diagram of an application embodiment according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, design methods, and advantages of the present invention more apparent, the present invention will be further described in detail by specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not as a limitation. Thus, other examples of the exemplary embodiments may have different values.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
The image recognition method based on the attention interaction mechanism operates on image pairs and discovers the feature differences within a pair through comparison, so that the two images can be distinguished correctly. In brief, the method takes a pair of similar images as input simultaneously. It first constructs a common (mutual) feature vector that contains the contrastive semantic features of the image pair; it then multiplies each image feature element-wise with the common feature vector and normalizes the result to generate a gate feature vector that locates channels carrying highly discriminative semantic features; finally, the original image features interact with the gate feature vectors to improve the classifier's sensitivity to subtle differences between features.
Specifically, referring to fig. 1, an image recognition method provided by the embodiment of the present invention includes the following steps:
In step S110, for each of a plurality of picture categories, several pictures are randomly selected.
First, N categories are randomly selected from the database, and for each category K pictures are randomly selected, i.e. N × K pictures are chosen as the input of each batch. Compared with fully random selection, choosing the input pictures per batch with this category-aware strategy helps to guarantee the diversity of data within the same batch.
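For illustration only, the following minimal sketch (in Python, with hypothetical names such as `sample_episode` and a flat `labels` list; none of this is prescribed by the patent) shows one way to implement the N-category, K-picture batch selection strategy:

```python
# Illustrative sketch of the N x K batch sampling strategy; `labels` is a
# hypothetical flat list of class ids, one per dataset index.
import random
from collections import defaultdict

def sample_episode(labels, N, K):
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= K]
    classes = random.sample(eligible, N)             # N random categories
    batch = []
    for c in classes:
        batch.extend(random.sample(by_class[c], K))  # K pictures per category
    return batch  # N*K dataset indices forming one batch
```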
In step S120, the selected pictures are input into a convolutional neural network for feature extraction.
Specifically, the selected pictures are input into the convolutional neural network, and the image feature $x \in \mathbb{R}^D$ is obtained through the network's final global average pooling (GAP) operation, where D is the dimension of the feature. For example, a ResNet50 network or another type of convolutional neural network can be selected for feature extraction according to the complexity of the data and the nature of the task.
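A minimal feature-extraction sketch, assuming a torchvision ResNet50 backbone (the patent does not mandate this architecture; `weights=None` and the dummy batch are illustrative):

```python
import torch
import torchvision.models as models

backbone = models.resnet50(weights=None)   # or another CNN, per the task
backbone.fc = torch.nn.Identity()          # keep the GAP output, drop the head

images = torch.randn(8, 3, 224, 224)       # a dummy N*K batch of pictures
features = backbone(images)                # x in R^D, with D = 2048 here
```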
In step S130, image feature pairs are selected for the extracted image features based on the degree of similarity between them.
In one embodiment, image features with high similarity are selected to form pairs. For example, the Euclidean distances between different image features are computed first; for each image feature $x_1$, the nearest image feature within its class and the nearest one among the other classes are then found according to Euclidean distance, each denoted $x_2$. In other embodiments, the distance metric may be replaced by another type such as cosine distance, and the nearest intra-class and inter-class criteria may be replaced by farthest, etc.
By selecting the most similar image pairs in step S130, the difficulty of the recognition task is raised, which in turn increases the robustness of the network.
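A sketch of this pair-construction step under the stated strategy (function and variable names are illustrative); each feature contributes one intra-class and one inter-class pair, giving 2 × N × K pairs per batch:

```python
import torch

def build_pairs(x, y):
    """x: (B, D) image features; y: (B,) integer class labels."""
    dist = torch.cdist(x, x)                 # pairwise Euclidean distances
    dist.fill_diagonal_(float('inf'))        # exclude self-matches
    same = y.unsqueeze(0) == y.unsqueeze(1)  # (B, B) same-class mask
    intra = dist.masked_fill(~same, float('inf')).argmin(dim=1)  # nearest within class
    inter = dist.masked_fill(same, float('inf')).argmin(dim=1)   # nearest across classes
    return intra, inter  # two partners per feature -> 2*N*K pairs overall
```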
In step S140, a common feature vector is extracted from the pair of image features.
The image features are passed through a common feature vector learning module to obtain the corresponding common feature vector $x_m \in \mathbb{R}^D$. Denoting the common feature vector learning process as $f_m$, the common feature vector can be expressed as:

$x_m = f_m([x_1, x_2]) \quad (1)$

The operation represented by equation (1) concatenates the feature pair $x_1$ and $x_2$ and feeds the concatenated feature into several fully connected layers. For example, as shown in fig. 2 with two fully connected layers, the feature mapping dimension is first reduced from 2048 to 512 and then restored from 512 to 2048. In further embodiments, $f_m$ may be replaced by other forms such as bilinear pooling, element-wise multiplication, or element-wise addition.
It should be noted that the number of fully connected layers and the dimensions of the feature mappings are not limited by the present invention; those skilled in the art can set them according to requirements such as training precision and training speed.
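A sketch of the mutual-vector mapping $f_m$ as two fully connected layers; note that with concatenated inputs the first layer takes a 2 × D vector (the 2048 → 512 → 2048 dimensions follow the example above but are assumptions, not requirements):

```python
import torch
import torch.nn as nn

class MutualVector(nn.Module):
    def __init__(self, dim=2048, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden),  # concatenated pair -> 512
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),      # 512 -> 2048
        )

    def forward(self, x1, x2):
        return self.mlp(torch.cat([x1, x2], dim=-1))  # x_m in R^D
```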
In step S150, a gate feature vector corresponding to each feature in the image feature pair is calculated based on the common feature vector.
The generated common feature vector is multiplied element-wise with each vector of the image feature pair, and the result is normalized by a nonlinear function; for example, the nonlinear function may be a sigmoid. This finally generates the gate feature vector $g_i \in \mathbb{R}^D$, expressed as:

$g_i = \mathrm{sigmoid}(x_m \odot x_i), \quad i \in \{1, 2\} \quad (2)$

Each element of $g_i$ lies between 0 and 1; a value close to 1 indicates that the semantic feature in that channel plays an important role in classifying the feature $x_i$ and is highly discriminative.
In other embodiments, the normalization may be performed by using a tanh function or other non-linear function, which is not limited in the present invention.
Unlike conventional operations, the common feature vector captures the strongly contrasting features within the image pair, which makes it well suited to serve subsequently as context information guiding the discovery of specific semantic features in each image.
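Equation (2) reduces to a few lines of code; this sketch uses the sigmoid variant (tanh or another nonlinearity could be substituted, as noted above):

```python
import torch

def gate_vectors(x_m, x1, x2):
    g1 = torch.sigmoid(x_m * x1)  # channel-wise product, squashed to (0, 1)
    g2 = torch.sigmoid(x_m * x2)
    return g1, g2  # values near 1 mark highly discriminative channels
```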
In step S160, each feature in the image feature pair is combined with the gate feature vectors, yielding both the combination of each image feature with its own gate feature vector and its combination with the other image's gate feature vector.
Combining the original image features (i.e., each feature in the image feature pair) with the gate feature vectors produces features in four forms, as shown in fig. 3:

$x_1^{self} = x_1 + x_1 \odot g_1$
$x_2^{self} = x_2 + x_2 \odot g_2$
$x_1^{other} = x_1 + x_1 \odot g_2$
$x_2^{other} = x_2 + x_2 \odot g_1 \quad (3)$

wherein $x_i^{self}$ denotes the result of combining an image feature with its own gate feature vector, and $x_i^{other}$ denotes the result of combining an image feature with the other image's gate feature vector; $x_i^{self}$ should be more discriminative than $x_i^{other}$.
Through the attention interaction mechanism, this step enriches the diversity of the features and raises the difficulty of the classification task.
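A sketch of the interaction of equation (3) in the residual form described above:

```python
def interact(x1, x2, g1, g2):
    x1_self  = x1 + x1 * g1   # own feature gated by its own gate vector
    x2_self  = x2 + x2 * g2
    x1_other = x1 + x1 * g2   # own feature gated by the partner's gate vector
    x2_other = x2 + x2 * g1
    return x1_self, x2_self, x1_other, x2_other
```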
In step S170, the combination results are input into a classifier for optimization, and the trained convolutional neural network and classifier are obtained.
The combined features are fed into the classifier in turn to obtain the corresponding classification probability vectors $p_i^{j} \in \mathbb{R}^C$, where C is the number of classes, expressed as:

$p_i^{j} = \mathrm{softmax}(W x_i^{j} + b), \quad i \in \{1, 2\},\ j \in \{self, other\}$

wherein $p_i^{j}$ is the probability vector normalized by the softmax function, and W and b represent the weight and bias of the classifier, respectively. On the basis of these probability vectors, a corresponding loss function is introduced to guide the optimization of the whole network (i.e., the convolutional neural network used for feature extraction together with the classifier).
In one embodiment, the optimization process first uses a cross-entropy loss function, expressed as:

$L_{ce} = -\sum_{i \in \{1,2\}} \sum_{j \in \{self, other\}} y_i^{\top} \log p_i^{j} \quad (4)$

wherein $y_i$ represents the true class label, e.g., as a one-hot encoded vector whose dimension for the true label is 1 while all other dimensions are 0.
Further, considering that different feature vectors have different priorities and correspond to different classification results, a score ranking loss can be introduced, expressed as:

$L_{rk} = \sum_{i \in \{1,2\}} \max\left(0,\ p_i^{other}(c_i) - p_i^{self}(c_i) + \epsilon\right) \quad (5)$

wherein $p_i^{j}(c_i)$ denotes the score of the probability vector $p_i^{j}$ on the $c_i$-th class, and $\epsilon$ denotes a threshold. The score ranking loss expects the score of $p_i^{self}$ on the $c_i$-th class to exceed the score of $p_i^{other}$ on the $c_i$-th class by at least the threshold $\epsilon$. The threshold $\epsilon$ can be set according to factors such as the required classification accuracy; the invention places no limitation on it.
By adding the score ranking loss, the influence of subtle feature differences on the classification result is taken into account, which increases the classifier's sensitivity to subtle image differences and improves the robustness of classification.
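The following sketch combines the cross-entropy term of equation (4) with the score-ranking hinge of equation (5) for one image of a pair; `classifier` (a linear layer), `eps`, and the weighting `lam` are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn.functional as F

def pair_loss(classifier, x_self, x_other, y, eps=0.05, lam=1.0):
    """x_self, x_other: (B, D) combined features; y: (B,) true class labels."""
    logits_self, logits_other = classifier(x_self), classifier(x_other)
    ce = F.cross_entropy(logits_self, y) + F.cross_entropy(logits_other, y)
    p_self = F.softmax(logits_self, dim=-1).gather(1, y[:, None]).squeeze(1)
    p_other = F.softmax(logits_other, dim=-1).gather(1, y[:, None]).squeeze(1)
    rank = F.relu(p_other - p_self + eps).mean()  # self score should win by eps
    return ce + lam * rank
```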
Through the above training process, the optimized convolutional neural network parameters and classifier parameters, i.e. the trained image recognition model, are obtained. In practical application, a single picture to be classified is fed into the trained convolutional neural network to extract the corresponding image feature $x^*$, and $x^*$ is fed into the trained classifier to obtain the final classification result.
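At test time no pairing or gating is needed; a single image passes through the trained backbone and classifier, as in this sketch:

```python
import torch

@torch.no_grad()
def predict(backbone, classifier, image):
    backbone.eval(); classifier.eval()
    x = backbone(image.unsqueeze(0))     # (3, H, W) -> feature x* of shape (1, D)
    logits = classifier(x)
    return logits.argmax(dim=-1).item()  # predicted class index
```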
For example, as shown in FIG. 4, the system comprises:
• a data input module, which selects pictures according to a preset data selection strategy: N classes per batch, with K pictures selected from each class;
• an image pair selection module, which, after obtaining the N × K image features, calculates the Euclidean distance between every two image features and, for each feature, selects the intra-class and inter-class features with the smallest Euclidean distance to form feature pairs, obtaining 2 × N × K image feature pairs;
• a common feature vector learning module, which, for each feature pair, obtains the common feature of the image pair through the mapping of fully connected layers;
• a gate feature vector generation module, which multiplies the common feature element-wise with each feature of the pair and normalizes the results to obtain two gate feature vectors, each of which marks the channels carrying highly discriminative semantic features in its image;
• an attention interaction module, which combines each feature pair with its two gate feature vectors in residual form to obtain four types of image features; and
• a classification module, which feeds the four features into the classifier to realize the final classification.
The invention can be used in various image recognition scenarios, for example on a mobile terminal. Referring to fig. 5, the mobile terminal comprises a data acquisition module, an algorithm processing module, and a user-interface display module. The specific process is as follows: a picture to be predicted is acquired through the mobile phone terminal and simply preprocessed; the image is then sent to the algorithm recognition module, where features are extracted by the pre-trained convolutional neural network model, and the extracted features are fed into the classifier recognition module to obtain the prediction result. Furthermore, the recognition result can be returned to the mobile phone terminal, and the acquired image together with its recognition result can be shown on the display interface.
The invention aims to discover highly discriminative semantic features in fine-grained images by inputting similar image pairs simultaneously during training, and ultimately to improve recognition performance. It is particularly suitable for recognizing fine-grained images in real life, and for tasks such as object recognition, face recognition, pedestrian re-identification, and biological category recognition. Fine-grained images include, for example, birds, flowers, cars, and biological categories organized in a taxonomic hierarchy. Referring to fig. 6, the specific process comprises: collecting the corresponding dataset and splitting off a training set; selecting reasonable hyper-parameters and strategies, including but not limited to the backbone network, batch size, learning rate, and common-vector generation module, and optimizing the network with the strategy provided by the present scheme under the given hyper-parameters; and feeding a given picture under test into the network to obtain its predicted label and the name of the corresponding picture category.
Experiments verify that the image recognition method based on the attention interaction mechanism effectively improves recognition accuracy: compared with other existing methods, it improves image recognition accuracy by 1 to 2 percentage points on several databases, and the effect is especially pronounced on fine-grained images.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method of constructing an image recognition model based on an attention interaction mechanism, comprising the steps of:
for each of N image categories, selecting K images, inputting the K images into a convolutional neural network for feature extraction, and obtaining a plurality of image features, wherein N and K are integers greater than or equal to 2;
establishing an image feature pair according to the similarity between different image features;
extracting common feature vectors from the constructed image feature pairs through common feature learning;
calculating a gate feature vector corresponding to each feature in the image feature pair based on the common feature vector;
and inputting the features obtained by combining each feature in the image feature pair with the gate feature vectors into a classifier, and optimizing according to a set loss function to obtain a trained convolutional neural network and a trained classifier.
2. The method of claim 1, wherein constructing image feature pairs according to the similarity between different image features comprises:
for each image feature $x_1$, finding the nearest image feature within its class and the nearest one among the other classes according to Euclidean distance, each denoted $x_2$, thereby forming 2 × N × K image feature pairs.
3. The method of claim 2, wherein extracting the common feature vector comprises:
concatenating the image feature pair $x_1$ and $x_2$ and feeding the concatenated feature through several fully connected layers to obtain the common feature vector, expressed as:
$x_m = f_m([x_1, x_2])$.
4. The method of claim 3, wherein calculating the gate feature vector for each feature in the image feature pair comprises:
multiplying the common feature vector $x_m$ element-wise with each feature of the image feature pair and normalizing through a sigmoid function to obtain the corresponding gate feature vectors, expressed as:
$g_i = \mathrm{sigmoid}(x_m \odot x_i), \quad i \in \{1, 2\}$.
5. The method of claim 4, wherein the features obtained by combining each feature in the image feature pair with the gate feature vectors take four forms:

$x_1^{self} = x_1 + x_1 \odot g_1$
$x_2^{self} = x_2 + x_2 \odot g_2$
$x_1^{other} = x_1 + x_1 \odot g_2$
$x_2^{other} = x_2 + x_2 \odot g_1$

wherein $x_i^{self}$ denotes the result of combining an image feature with its own gate feature vector, $x_i^{other}$ denotes the result of combining an image feature with the other image's gate feature vector, and $g_1, g_2$ denote the gate feature vectors.
6. The method of claim 1, wherein the loss function is set to:

$L_{ce} = -\sum_{i \in \{1,2\}} \sum_{j \in \{self, other\}} y_i^{\top} \log p_i^{j}$

wherein $y_i$ represents the true classification label and $p_i^{j}$ represents the classification probability vector output by the classifier.
7. The method of claim 1, wherein the loss function is represented as:

$L_{rk} = \sum_{i \in \{1,2\}} \max\left(0,\ p_i^{other}(c_i) - p_i^{self}(c_i) + \epsilon\right)$

wherein $p_i^{j}(c_i)$ denotes the score of the probability vector $p_i^{j}$ on the $c_i$-th class and $\epsilon$ denotes a threshold.
8. An image recognition method based on an attention interaction mechanism, comprising the following steps:
feeding a single picture into the trained convolutional neural network of claim 1, extracting the corresponding image feature $x^*$, and feeding $x^*$ into the trained classifier to obtain a final classification result.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
10. An electronic device comprising a memory and a processor, on which a computer program is stored which is executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
CN202010070791.0A 2020-01-21 2020-01-21 Image recognition method based on attention interaction mechanism Active CN111325237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010070791.0A CN111325237B (en) 2020-01-21 2020-01-21 Image recognition method based on attention interaction mechanism

Publications (2)

Publication Number Publication Date
CN111325237A true CN111325237A (en) 2020-06-23
CN111325237B CN111325237B (en) 2024-01-05

Family

ID=71163304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010070791.0A Active CN111325237B (en) 2020-01-21 2020-01-21 Image recognition method based on attention interaction mechanism

Country Status (1)

Country Link
CN (1) CN111325237B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068463A1 (en) * 2016-09-02 2018-03-08 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN110119477A (en) * 2019-05-14 2019-08-13 腾讯科技(深圳)有限公司 A kind of information-pushing method, device and storage medium
CN110263697A (en) * 2019-06-17 2019-09-20 哈尔滨工业大学(深圳) Pedestrian based on unsupervised learning recognition methods, device and medium again

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931859A (en) * 2020-08-28 2020-11-13 中国科学院深圳先进技术研究院 Multi-label image identification method and device
CN111931859B (en) * 2020-08-28 2023-10-24 中国科学院深圳先进技术研究院 Multi-label image recognition method and device
CN112487227A (en) * 2020-11-27 2021-03-12 北京邮电大学 Deep learning fine-grained image classification method and device
CN112487227B (en) * 2020-11-27 2023-12-26 北京邮电大学 Fine granularity image classification method and device for deep learning
CN115457308A (en) * 2022-08-18 2022-12-09 苏州浪潮智能科技有限公司 Fine-grained image recognition method and device and computer equipment
CN115457308B (en) * 2022-08-18 2024-03-12 苏州浪潮智能科技有限公司 Fine granularity image recognition method and device and computer equipment
CN116051948A (en) * 2023-03-08 2023-05-02 中国海洋大学 Fine granularity image recognition method based on attention interaction and anti-facts attention
CN116051948B (en) * 2023-03-08 2023-06-23 中国海洋大学 Fine granularity image recognition method based on attention interaction and anti-facts attention

Also Published As

Publication number Publication date
CN111325237B (en) 2024-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant