CN113095349A - Image identification method and device - Google Patents


Info

Publication number
CN113095349A
CN113095349A (application number CN202010022725.6A)
Authority
CN
China
Prior art keywords
semantic
label
feature
tag
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010022725.6A
Other languages
Chinese (zh)
Inventor
刘义明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202010022725.6A priority Critical patent/CN113095349A/en
Publication of CN113095349A publication Critical patent/CN113095349A/en
Pending legal-status Critical Current

Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2321 Non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N3/045 Neural network architectures; Combinations of networks


Abstract

The invention discloses an image recognition method and device, and relates to the field of computer technology. One embodiment of the method comprises: receiving image information, inputting the image information into a fully convolutional neural network, and generating a feature map; generating a semantic-specific representation of the semantic region of each category label based on the feature map; and associating the semantic feature representations by using a knowledge graph based on label co-occurrence, which is then used to predict the distribution of the labels. This embodiment addresses the problem in the prior art that learning the correlations between semantic regions in an image is difficult and inaccurate.

Description

Image identification method and device
Technical Field
The invention relates to the technical field of computers, in particular to an image recognition method and device.
Background
Multi-label image classification is a fundamental and widely applicable computer vision task, because a real-world image often contains several different semantic objects; a landscape image, for example, may contain the sea, a villa, a yacht and so on. The task has recently received increasing attention and is widely applied in image content retrieval and recommendation systems. Multi-label classification still faces many difficulties, such as viewpoint changes, semantic targets of different sizes, illumination variation and partial occlusion. How to mine the semantic feature regions corresponding to the multiple labels in an image, and how to exploit the correlation information among those regions, remains an unsolved and challenging problem in multi-label image classification.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
in the prior art, although progress has been made by searching for semantic perception regions and by modeling label associations, model performance is still not ideal. Due to the lack of supervision and guidance, semantic regions can currently only be located roughly. Moreover, it is difficult to fully learn the correlations between semantic regions, and label co-occurrence is not modeled explicitly.
Disclosure of Invention
In view of this, embodiments of the present invention provide an image recognition method and apparatus, which can solve the problem in the prior art that learning the correlation between semantic regions in an image is difficult and inaccurate.
In order to achieve the above object, according to one aspect of the embodiments of the present invention, there is provided an image recognition method, including: receiving image information, inputting the image information into a fully convolutional neural network, and generating a feature map; generating a semantic-specific representation of the semantic region of each category label based on the feature map; and associating the semantic feature representations by using a knowledge graph based on label co-occurrence, which is then used to predict the distribution of the labels.
Optionally, performing semantic specific representation on the semantic region of each category label based on the feature map, including:
extracting semantic embedding vectors from the semantic region of each category label by adopting a preset word embedding model based on the feature map;
according to the semantic attention mechanism, a semantic embedding vector corresponding to the label category is learned to obtain a feature vector of the label category.
Optionally, learning a semantic embedding vector corresponding to the tag class according to a semantic attention mechanism to obtain a feature vector of the tag class includes:
acquiring each position point in the semantic area according to the semantic area of each category label;
fusing a feature map and a semantic embedded vector corresponding to each position point by using a low-rank bilinear pooling method to obtain a feature vector of each position point;
under the guidance of the semantic embedded vector, calculating an attention coefficient of each position point;
summing, over all the position points, the products of the attention coefficients and the feature vectors to obtain the feature vector of the label category.
Optionally, after calculating the attention coefficient of each location point, the method includes:
normalizing attention coefficients of all position points by using a logistic regression function;
and summing products of the normalized attention coefficients and the feature vectors of all the position points to obtain the feature vectors of the label categories.
Optionally, associating the semantic feature representations with a knowledge-graph based on tag co-occurrence, comprising:
calculating association probability between all label category pairs by using label annotation based on a data set covering the label categories to obtain a knowledge graph based on label co-occurrence;
and learning the semantic feature representation of each label category through the knowledge graph by adopting a gated recurrent update mechanism so as to perform the association.
Optionally, learning the semantic feature representation of each label category through the knowledge graph by using a gated recurrent update mechanism for association includes:
acquiring, for each label category, a hidden state at a time step;
aggregating messages from neighboring label categories based on the knowledge graph according to the hidden states of the label categories, and updating each hidden state from the hidden state at the previous time step through the gating mechanism of a gated recurrent unit;
and repeating the above process until the final hidden states of the label categories are obtained, thereby associating the semantic feature representations of the label categories.
Optionally, the method further comprises:
and implementing a feature extractor based on a deep residual network, so as to input the image information into the fully convolutional neural network to generate the feature map.
In addition, according to an aspect of the embodiments of the present invention, there is provided an image recognition apparatus, including: a receiving module configured to receive image information, input the image information into a fully convolutional neural network, generate a feature map, and generate a semantic-specific representation of the semantic region of each category label based on the feature map; and a recognition module configured to associate the semantic feature representations by using a knowledge graph based on label co-occurrence, which is then used to predict the distribution of the labels.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the image recognition embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable medium on which a computer program is stored, which, when executed by a processor, implements the image recognition method according to any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: receiving image information, inputting the image information into a fully convolutional neural network, and generating a feature map; generating a semantic-specific representation of the semantic region of each category label based on the feature map; and associating the semantic feature representations by using a knowledge graph based on label co-occurrence, which is then used to predict the distribution of the labels. The invention can thus better learn semantic feature regions and explore their interactions for multi-label image recognition.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram of a main flow of an image recognition method according to a first embodiment of the present invention;
fig. 2 is a schematic diagram of a main flow of an image recognition method according to a second embodiment of the present invention;
fig. 3 is a schematic diagram of main blocks of an image recognition apparatus according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of an image recognition method according to an embodiment of the present invention, which may include:
and step S101, receiving image information, inputting the image information into a full convolution neural network, and generating a characteristic diagram.
Preferably, the feature extractor is implemented based on a deep residual network, so that the image information input into the fully convolutional neural network generates the feature map. Preferably, the feature extractor is implemented based on ResNet-101. A fully convolutional network can perform pixel-level classification.
And step S102, performing semantic specific representation on the semantic region of each category label based on the feature map.
Preferably, based on the feature map, a preset word embedding model is adopted for the semantic region of each category label to extract a semantic embedding vector. Then, according to a semantic attention mechanism, a semantic embedding vector corresponding to the label category is learned to obtain a feature vector of the label category.
Here, an attention mechanism learns which parts of the input to combine; that is, it attends only to the parts of particular importance, acquires the needed information, and constructs a description of the surrounding context.
Preferably, a GloVe model is used to extract the semantic embedding vectors. GloVe is an unsupervised word-vector model that essentially performs dimensionality reduction on a word co-occurrence matrix.
Further, when learning the semantic embedding vector corresponding to the label category according to the semantic attention mechanism to obtain the feature vector of the label category, each position point within the semantic region may be obtained according to the semantic region of each category label. The feature map and the semantic embedding vector corresponding to each position point are fused using a low-rank bilinear pooling method to obtain the feature vector of each position point. Under the guidance of the semantic embedding vector, the attention coefficient of each position point is calculated. The products of the attention coefficients and the feature vectors over all position points are then summed to obtain the feature vector of the label category.
Here, low-rank bilinear pooling introduces a pooling matrix that is applied to the fused output vector.
Further, after the attention coefficient of each position point is calculated, the attention coefficients of all the position points are normalized using a logistic regression function. Then, the products of the normalized attention coefficients and the feature vectors are summed for all the position points to obtain the feature vectors of the label categories.
Step S103, the semantic feature representations are associated by using a knowledge graph based on the co-occurrence of the labels, and then are used for predicting the distribution of the labels.
Preferably, based on the data set covering the label categories, the association probability between all the label category pairs is calculated by using label annotations so as to obtain the knowledge graph based on the label co-occurrence. And learning the semantic feature representation of each label category through the knowledge graph by adopting a gated cycle updating mechanism so as to perform association.
A knowledge graph is a structured representation that describes entities and the relationships between them in graph form; here, the nodes are label categories.
Further, when a gated recurrent update mechanism is adopted and the semantic feature representation of each label category is learned through the knowledge graph for association, a hidden state at a time step can be acquired for each label category. According to the hidden states of the label categories, messages from neighboring label categories are aggregated based on the knowledge graph, and each hidden state is updated from the hidden state at the previous time step through the gating mechanism of a gated recurrent unit. This process is repeated until the final hidden states of the label categories are obtained, thereby associating the semantic feature representations of the label categories.
The gated recurrent unit is intended to solve the vanishing-gradient problem that occurs in standard RNNs.
According to the various embodiments described above, the image recognition method of the present invention introduces category semantics into a semantic-specific graph representation learning framework to guide the learning of semantic-specific representations. It associates these representations through a graph constructed from statistical label co-occurrence, directly relating all label pairs in a structured graph form, and introduces a graph propagation mechanism to explore their interactions under explicit statistical label co-occurrence, thereby improving performance.
Fig. 2 is a schematic diagram of the main flow of an image recognition method according to a second embodiment of the present invention. The image recognition method may include:
Step S201: receiving image information, and implementing a feature extractor based on a deep residual network so as to input the image information into a fully convolutional neural network and generate a feature map.
In an embodiment, given an input image I, the backbone first extracts its feature map f_I ∈ R^{W×H×N}, where W, H and N are the width, height and number of channels of the feature map:
f_I = f_cnn(I)    (1)
where f_cnn(·) is a feature extractor implemented by a fully convolutional neural network.
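To make the fully convolutional property of f_cnn(·) concrete, the following toy NumPy sketch applies a single 3 × 3 convolution. It is a stand-in for the real backbone, not the patent's network: the point is that the same operator accepts any input size and returns a spatially arranged feature map.

```python
import numpy as np

def conv3x3(x, k):
    """One 3x3 'valid' convolution: the basic building block of a
    fully convolutional feature extractor such as f_cnn in Eq. (1)."""
    W, H = x.shape
    out = np.zeros((W - 2, H - 2))
    for i in range(W - 2):
        for j in range(H - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

img = np.ones((6, 6))        # toy single-channel "image"
k = np.full((3, 3), 1 / 9)   # mean filter as a toy kernel
fmap = conv3x3(img, k)
print(fmap.shape)            # (4, 4): a spatial feature map, no flattening
```

Because no fully connected layer flattens the output, stacking such layers keeps a W × H × N spatial layout, which is what later allows per-position attention.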
And step S202, extracting semantic embedding vectors from the semantic region of each category label by adopting a preset word embedding model based on the feature map.
In an embodiment, for each category c, the framework uses a pre-trained GloVe model to extract a d_s-dimensional semantic embedding vector:
x_c = f_g(w_c)    (2)
where w_c is the semantic word of label category c.
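As a toy illustration of this lookup, the sketch below maps label words to embedding vectors; the words and 4-dimensional vectors are invented for illustration, whereas the patent uses 300-dimensional vectors from a pre-trained GloVe model.

```python
import numpy as np

# Stand-in for a loaded GloVe table; real entries would be 300-dimensional.
glove_table = {
    "sea":   np.array([0.1, 0.3, -0.2, 0.5]),
    "villa": np.array([0.4, -0.1, 0.2, 0.0]),
    "yacht": np.array([0.2, 0.2, 0.1, -0.3]),
}

def f_g(w_c):
    """Return the semantic embedding vector x_c for label word w_c (Eq. 2)."""
    return glove_table[w_c]

X = np.stack([f_g(w) for w in ["sea", "villa", "yacht"]])
print(X.shape)  # (3, 4): one d_s-dimensional vector per label category
```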
Step S203, acquiring each position point in the semantic area according to the semantic area of each category label.
In an embodiment, a semantic attention mechanism is introduced that incorporates the semantic embedding vector x_c to direct more attention to the semantically aware regions and thus learn the feature vector corresponding to each label category. First, each position (w, h) within the semantic region of each category label is obtained.
And S204, fusing the feature map and the semantic embedded vector corresponding to each position point by using a low-rank bilinear pooling method to obtain the feature vector of each position point.
In an embodiment, a low-rank bilinear pooling method is used to fuse the feature vector f_{w,h} at each position with the semantic embedding vector x_c:
f̃^c_{w,h} = P^T tanh((U^T f_{w,h}) ⊙ (V^T x_c))    (3)
where tanh(·) is the hyperbolic tangent function, P ∈ R^{d1×d2}, U ∈ R^{N×d1} and V ∈ R^{ds×d1} are learnable parameters, and ⊙ is element-wise multiplication. d1 and d2 are the dimensions of the joint embedding and the output feature, respectively.
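A minimal NumPy sketch of this fusion step follows, with toy dimensions in place of the patent's N = 2,048, d_s = 300, d_1 = d_2 = 1,024; the random matrices stand in for the learnable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_s, d1, d2 = 8, 4, 6, 5      # toy sizes only

U = rng.normal(size=(N, d1))     # projects the visual feature at (w, h)
V = rng.normal(size=(d_s, d1))   # projects the semantic embedding x_c
P = rng.normal(size=(d1, d2))    # low-rank pooling matrix

def fuse(f_wh, x_c):
    """Low-rank bilinear fusion of one position with the label embedding."""
    return P.T @ np.tanh((U.T @ f_wh) * (V.T @ x_c))

f_wh = rng.normal(size=N)        # feature vector at one position (w, h)
x_c = rng.normal(size=d_s)       # semantic embedding of category c
print(fuse(f_wh, x_c).shape)     # (5,): the fused, d2-dimensional vector
```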
In step S205, the attention coefficient of each position point is calculated under the guidance of the semantic embedded vector.
In an embodiment, the attention coefficient is calculated under the guidance of x_c:
ã^c_{w,h} = f_a(f̃^c_{w,h})    (4)
where the attention coefficient indicates the importance of position (w, h), and f_a(·) is an attention function implemented by a fully connected network.
In step S206, the attention coefficients of all the position points are normalized by using a logistic regression function.
In an embodiment, to make the attention coefficients comparable across different samples, they are normalized over all positions using the softmax function:
a^c_{w,h} = exp(ã^c_{w,h}) / Σ_{w',h'} exp(ã^c_{w',h'})    (5)
step S207, summing the products of the normalized attention coefficients and the feature vectors of all the position points to obtain the feature vectors of the label categories.
In an embodiment, the feature vector of the label category is obtained as the attention-weighted sum:
f^c = Σ_{w,h} a^c_{w,h} f̃^c_{w,h}    (6)
In this way, the process can be repeated for all label categories to obtain the feature vectors {f^0, f^1, ..., f^{C-1}} associated with every label category.
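The three steps above (attention coefficient, softmax normalization, weighted sum) can be sketched end to end in NumPy; the fused vectors and the one-layer attention function are random stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
Wd, Hd, d2 = 3, 3, 5                     # toy spatial grid and fused-feature size

f_tilde = rng.normal(size=(Wd, Hd, d2))  # fused vectors at every position
w_a = rng.normal(size=d2)                # stand-in for the attention net f_a

a_raw = f_tilde @ w_a                    # raw attention coefficient per position
a = np.exp(a_raw) / np.exp(a_raw).sum()  # softmax over all positions
f_c = (a[..., None] * f_tilde).sum(axis=(0, 1))  # attention-weighted sum

assert np.isclose(a.sum(), 1.0)  # coefficients form a distribution over positions
print(f_c.shape)                 # (5,): the category's feature vector
```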
And step S208, calculating association probability between all label category pairs by using label annotations based on the data set covering the label categories so as to obtain a knowledge graph based on label co-occurrence.
In an embodiment, once the feature vectors corresponding to all label categories are obtained, they are correlated in the form of a graph constructed from statistical label co-occurrence, and a knowledge graph neural network is introduced to explore the interactions between them through a message-propagation mechanism.
First, the knowledge graph is defined as G = {V, A}, where the nodes are labels and the edge weights represent the degree of co-occurrence between label categories (i.e., the degree of association between labels). Specifically, assuming that the dataset covers C label categories, V can be denoted as {v_0, v_1, ..., v_{C-1}}, where element v_c represents category c, and A can be denoted as {a_00, a_01, ..., a_0(C-1), ..., a_(C-1)(C-1)}, where element a_cc' indicates the probability that an object of label category c' is present when an object of category c is present. The probabilities between all label-category pairs are computed using the label annotations of the samples in the training set. It is worth noting that no additional annotation is introduced in this process.
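A small NumPy sketch of this graph construction follows; the multi-hot annotation matrix Y is invented toy data standing in for the training-set annotations.

```python
import numpy as np

# Toy annotations: 4 samples x 3 label categories (1 = label present).
Y = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])

counts = Y.T @ Y                     # counts[c, c'] = samples containing both
occ = np.diag(counts).astype(float)  # samples containing category c
A = counts / occ[:, None]            # a_cc' = P(c' present | c present)

print(A.round(2))
```

Each row c of A is the conditional co-occurrence distribution for category c, so the diagonal entries are always 1.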
In step S209, for each tag category, the hidden state at the time step is acquired.
In an embodiment, a gated recurrent update mechanism is employed to propagate messages through the graph and learn contextualized feature vectors for the label categories. Specifically, each node (label category) v_c ∈ V has a hidden state h^t_c at time step t. Since each node corresponds to a particular label category, the hidden state at t = 0 is initialized with the feature vector of the corresponding label category:
h^0_c = f^c
Step S210: according to the hidden states of the label categories, aggregate messages from neighboring label categories based on the knowledge graph, and update each hidden state from the hidden state at the previous time step through the gating mechanism of a gated recurrent unit. This process is repeated until the final hidden states of the label categories are obtained.
In an embodiment, at time step t, each node aggregates the messages from its neighboring nodes:
a^t_c = [ Σ_{c'} a_{c'c} h^{t-1}_{c'} , Σ_{c'} a_{cc'} h^{t-1}_{c'} ]    (7)
If node c' has a high correlation with node c, message propagation is encouraged; otherwise it is suppressed. In this way, messages are propagated through the graph and node interactions are explored under the guidance of the prior knowledge of statistical label co-occurrence. The hidden state is then updated from the aggregated vector a^t_c and the hidden state h^{t-1}_c at the previous time step through a gating mechanism similar to that of the gated recurrent unit:
z^t_c = σ(W^z a^t_c + U^z h^{t-1}_c)
r^t_c = σ(W^r a^t_c + U^r h^{t-1}_c)
h̃^t_c = tanh(W a^t_c + U (r^t_c ⊙ h^{t-1}_c))
h^t_c = (1 − z^t_c) ⊙ h^{t-1}_c + z^t_c ⊙ h̃^t_c    (8)
where σ(·) is the logistic sigmoid function, tanh(·) is the hyperbolic tangent function, and ⊙ is element-wise multiplication. W^z, W^r and W are the weight matrices applied to the aggregated vector, U^z, U^r and U are the weight matrices applied to the previous hidden state, and z and r are the update and reset gates of the gated recurrent unit.
In this way, each node aggregates messages from the other nodes while propagating its own information through the graph, enabling interaction among the feature vectors of all label categories. This process is repeated T times to obtain the final hidden state h^T_c.
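Assuming the update follows the standard GRU form described above, the propagation loop can be sketched in NumPy with toy sizes (the patent uses a hidden dimension of 2,048 and T = 3); all weight matrices are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
C, d, T = 3, 4, 3                  # toy: 3 categories, hidden size 4, 3 steps

A = rng.uniform(size=(C, C))       # label co-occurrence matrix
H = rng.normal(size=(C, d))        # h^0_c initialised with the f^c vectors

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# GRU weights; the aggregated message is 2d-dimensional (in- and out-edges).
Wz, Wr, Wh = (rng.normal(size=(2 * d, d)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(d, d)) for _ in range(3))

for t in range(T):
    # Aggregate messages from neighbours along incoming and outgoing edges.
    a_t = np.concatenate([A.T @ H, A @ H], axis=1)   # (C, 2d)
    z = sigmoid(a_t @ Wz + H @ Uz)                   # update gate
    r = sigmoid(a_t @ Wr + H @ Ur)                   # reset gate
    h_tilde = np.tanh(a_t @ Wh + (r * H) @ Uh)       # candidate state
    H = (1 - z) * H + z * h_tilde                    # gated update

print(H.shape)  # (3, 4): final hidden states h^T_c for every category
```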
Step S211, obtaining the association relation represented by the semantic features, and using the association relation to predict the distribution of the labels.
In an embodiment, each node's final hidden state h^T_c not only contains the feature vector of label category c but also carries contextualized messages from the other label categories. Finally, the confidence score for the presence of label category c is predicted from the final hidden state h^T_c and the input feature vector f^c:
o_c = f_o([h^T_c, f^c])    (9)
where f_o is an output function that maps the concatenation of h^T_c and f^c to an output vector o_c. A set of C classification functions {f_0, f_1, ..., f_{C-1}} with unshared parameters is then used, where f_c(·) takes o_c as input and predicts the score indicating the probability of label category c:
s_c = f_c(o_c)    (10)
In this way, all label categories can be processed to obtain a score vector s = {s_0, s_1, ..., s_{C-1}}.
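The score-prediction step can be sketched as follows; the linear maps below are random stand-ins for the output function f_o and the C unshared classifiers.

```python
import numpy as np

rng = np.random.default_rng(3)
C, d = 3, 4                       # toy sizes
H_T = rng.normal(size=(C, d))     # final hidden states h^T_c
F = rng.normal(size=(C, d))       # input feature vectors f^c

Wo = rng.normal(size=(2 * d, d))  # shared output function f_o (with tanh)
Wc = rng.normal(size=(C, d))      # one unshared linear classifier per category

O = np.tanh(np.concatenate([H_T, F], axis=1) @ Wo)  # o_c = f_o([h^T_c, f^c])
s = np.einsum("cd,cd->c", O, Wc)                    # s_c = f_c(o_c)

print(s.shape)  # (3,): one confidence score per label category
```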
As a specific embodiment, the feature extractor f_cnn(·) is implemented based on ResNet-101, replacing the last average pooling layer with another average pooling layer with a 2 × 2 window and stride 2, while leaving the other layers unchanged. For the low-rank bilinear pooling operation, N, d_s, d_1 and d_2 are set to 2,048, 300, 1,024 and 1,024, respectively. Accordingly, f_a(·) is implemented by a 1,024-to-1 fully connected layer that maps each 1,024-dimensional feature vector to a single attention coefficient. For the knowledge graph neural network, the dimension of the hidden state is set to 2,048, the number of iterations T is set to 3, and the dimension of the output vector o_c is also set to 2,048. The output function f_o(·) can therefore be implemented by a 4,096-to-2,048 fully connected layer followed by a hyperbolic tangent function, and each classification function f_c(·) by a 2,048-to-1 fully connected layer.
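For the pooling change described above, a NumPy sketch of a 2 × 2, stride-2 average pooling layer (single image, channels-last, toy data) is:

```python
import numpy as np

def avg_pool_2x2_stride2(x):
    """Average pooling with a 2x2 window and stride 2 on a (W, H, N) map."""
    W, H, N = x.shape
    x = x[: W // 2 * 2, : H // 2 * 2]  # drop odd remainder rows/cols
    return x.reshape(W // 2, 2, H // 2, 2, N).mean(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = avg_pool_2x2_stride2(fmap)
print(pooled[..., 0])  # each entry is the mean of one 2x2 window
```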
Given a dataset comprising M training samples {(I_i, y_i)}, where I_i is the i-th image and y_i = {y_i0, y_i1, ..., y_i(C-1)} is the annotation of the corresponding label categories: y_ic is set to 1 if the sample is annotated with label category c, and to 0 otherwise. Given an image I_i, a predicted score vector s_i = {s_i0, s_i1, ..., s_i(C-1)} is obtained, and the corresponding probability vector p_i = {p_i0, p_i1, ..., p_i(C-1)} is computed with the sigmoid function:
p_ic = σ(s_ic)    (11)
Cross entropy is used as the objective loss function:
L = −(1/M) Σ_{i=0}^{M-1} Σ_{c=0}^{C-1} [ y_ic log p_ic + (1 − y_ic) log(1 − p_ic) ]    (12)
The loss L is trained in an end-to-end manner. Specifically: f_cnn is initialized with ResNet-101 parameters pre-trained on the ImageNet dataset (ImageNet is a large visual database for visual object recognition research), and the parameters of the other layers are initialized randomly. Since the lower-layer parameters pre-trained on ImageNet generalize well across datasets, the parameters of the first 92 convolutional layers of f_cnn are fixed and all other layers are optimized jointly. Preferably, the ADAM algorithm (a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure) is used for training, with a batch size of 4 and momentum terms of 0.999 and 0.9. The learning rate is initialized to 10^-5 and divided by 10 when the error plateaus. During training, the input image is resized to 640 × 640, a width and height are randomly selected from {640, 576, 512, 384, 320} to randomly crop a patch, and the cropped patch is further resized to 576 × 576. During testing, the input image only needs to be resized to 640 × 640 and center-cropped to 576 × 576 for evaluation.
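A compact NumPy sketch of the sigmoid-plus-cross-entropy objective described above follows; it is normalised by the total number of score entries for readability, and the scores and labels are toy values.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def multilabel_bce(S, Y):
    """Mean binary cross-entropy over all (sample, category) score entries."""
    P = sigmoid(S)  # p_ic = sigmoid(s_ic), as in Eq. (11)
    return -np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P))

S = np.array([[2.0, -1.0, 0.5],   # predicted scores s_i for 2 samples
              [-0.5, 1.5, -2.0]])
Y = np.array([[1, 0, 1],          # ground-truth multi-hot labels y_i
              [0, 1, 0]])

loss = multilabel_bce(S, Y)
print(round(float(loss), 4))  # 0.2861
```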
As shown in fig. 3, an image recognition apparatus according to an embodiment of the present invention includes a receiving module 301 and a recognition module 302. The receiving module 301 receives image information, inputs it into a fully convolutional neural network, generates a feature map, and generates a semantic-specific representation of the semantic region of each category label based on the feature map. The recognition module 302 associates the semantic feature representations using a knowledge graph based on label co-occurrence, which is then used to predict the distribution of the labels.
As a preferred embodiment, performing semantic specific representation on the semantic region of each class label based on the feature map may include:
and extracting semantic embedding vectors by adopting a preset word embedding model for the semantic region of each category label based on the characteristic graph. Then, according to a semantic attention mechanism, a semantic embedding vector corresponding to the label category is learned to obtain a feature vector of the label category.
Further, learning the semantic embedding vector corresponding to the tag class according to a semantic attention mechanism to obtain a feature vector of the tag class may include:
and acquiring each position point in the semantic area according to the semantic area of each category label. And fusing the corresponding characteristic diagram of each position point and the semantic embedded vector by using a low-rank bilinear pooling method to obtain the characteristic vector of each position point. Under the guidance of the semantic embedding vector, the attention coefficient of each position point is calculated. The attention coefficients of all the location points are summed with the product of the feature vector to obtain the feature vector of the tag class.
Further, after calculating the attention coefficient for each position point, the attention coefficients for all the position points may be normalized using a logistic regression function. The normalized attention coefficients and the feature vector products are then summed for all location points to obtain the feature vector for the tag class.
As another preferred embodiment, associating the semantic feature representations using a knowledge-graph based on tag co-occurrence may include:
calculating, based on a data set covering the label categories, the association probability between every pair of label categories from the label annotations, so as to obtain the knowledge graph based on label co-occurrence; and learning the semantic feature representation of each label category over the knowledge graph with a gated recurrent update mechanism so as to perform the association.
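The construction of the label co-occurrence graph can be sketched as follows, assuming the label annotations are given as per-image lists of label indices (the function name and data layout are illustrative):

```python
def cooccurrence_graph(annotations, num_labels):
    """Estimate P(j | i): how often label j also appears when label i is present.

    annotations : list of per-image label-index lists, i.e. the label
    annotations of a data set covering the label categories.
    """
    count = [[0.0] * num_labels for _ in range(num_labels)]
    occur = [0] * num_labels
    for labels in annotations:
        for i in labels:
            occur[i] += 1
            for j in labels:
                if j != i:
                    count[i][j] += 1
    # divide each row by how often label i occurred (0 if never seen)
    return [[count[i][j] / occur[i] if occur[i] else 0.0
             for j in range(num_labels)] for i in range(num_labels)]
```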
Further, learning the semantic feature representation of each label category over the knowledge graph for association using the gated recurrent update mechanism may include:
for each label category, acquiring its hidden state at the current time step; then aggregating, over the knowledge graph, messages from neighboring label categories according to the hidden states of the label categories, and updating the aggregated message together with the hidden state of the previous time step through the gating mechanism of a gated recurrent unit.
This process is repeated until the final hidden states of the label categories are obtained, thereby associating the semantic feature representations of the label categories.
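The gated recurrent update over the graph can be sketched as follows. All weight matrices would be learned in practice; their names, the number of steps, and the random initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def propagate(h, A, Wz, Uz, Wr, Ur, Wc, Uc, steps=3):
    """GRU-style message passing over the label co-occurrence graph (a sketch).

    h : (C, D) hidden states, one per label category
    A : (C, C) co-occurrence adjacency; row i weights messages from i's neighbors
    """
    for _ in range(steps):
        m = A @ h                                  # aggregate neighbor messages
        z = sigmoid(m @ Wz + h @ Uz)               # update gate
        r = sigmoid(m @ Wr + h @ Ur)               # reset gate
        c = np.tanh(m @ Wc + (r * h) @ Uc)         # candidate state
        h = (1.0 - z) * h + z * c                  # gated update of hidden state
    return h                                       # final hidden states

rng = np.random.default_rng(1)
C, D = 3, 4
h0 = rng.normal(size=(C, D))
A = np.abs(rng.normal(size=(C, C)))
A /= A.sum(axis=1, keepdims=True)                  # row-normalized adjacency
Ws = [rng.normal(scale=0.1, size=(D, D)) for _ in range(6)]
h_final = propagate(h0, A, *Ws)
```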
It is also worth noting that the feature extractor may be implemented based on a deep residual network, so that inputting the image information into the fully convolutional neural network generates the feature map.
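As a toy illustration of why a fully convolutional extractor yields a feature map rather than a single vector, a 1x1 convolution can be written as a tensor contraction that preserves the spatial layout. A real extractor such as a deep residual network would stack many strided 3x3 residual blocks; this minimal sketch only shows the shape behavior:

```python
import numpy as np

def conv1x1_feature_map(image, kernels):
    """Toy fully convolutional stage: a 1x1 convolution over every position.

    image : (H, W, C) input; kernels : (D, C) filters.
    Returns a feature map of shape (H, W, D), keeping the spatial layout.
    """
    # contract the channel axis of the image with the channel axis of the kernels
    return np.tensordot(image, kernels, axes=([2], [1]))

img = np.ones((5, 6, 3))
fmap = conv1x1_feature_map(img, np.ones((8, 3)))
```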
It should be noted that the details of the image recognition method have already been described above, so the description is not repeated here in the embodiment of the image recognition apparatus of the present invention.
Fig. 4 shows an exemplary system architecture 400 to which the image recognition method or the image recognition apparatus of the embodiments of the present invention can be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired links, wireless communication links, or fiber-optic cables.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users of the terminal devices 401, 402, 403. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, merely an example) to the terminal device.
It should be noted that the image recognition method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the image recognition apparatus is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or by hardware. The described modules may also be provided in a processor, which may be described as: a processor including a receiving module and an identification module. The names of these modules do not, in some cases, constitute a limitation of the modules themselves.
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receive image information, input the image information into a fully convolutional neural network, and generate a feature map; perform semantic-specific representation on the semantic region of each category label based on the feature map; and associate the semantic feature representations by using a knowledge graph based on label co-occurrence, the associated representations being used to predict the distribution of the labels.
The technical solution of the embodiments of the present invention can solve the prior-art problem that the correlations between semantic regions in an image are difficult to learn and are learned inaccurately.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An image recognition method, comprising:
receiving image information, inputting the image information into a fully convolutional neural network, and generating a feature map;
performing semantic specific representation on the semantic region of each category label based on the feature map;
and associating the semantic feature representations by using a knowledge graph based on label co-occurrence, the associated representations being used to predict the distribution of the labels.
2. The method of claim 1, wherein performing semantic-specific representation on the semantic region of each category label based on the feature map comprises:
extracting semantic embedding vectors from the semantic region of each category label by adopting a preset word embedding model based on the feature map;
and learning, according to a semantic attention mechanism, the semantic embedding vector corresponding to the label category to obtain a feature vector of the label category.
3. The method of claim 2, wherein learning the semantic embedding vector corresponding to the label category according to the semantic attention mechanism to obtain the feature vector of the label category comprises:
acquiring each position point in the semantic area according to the semantic area of each category label;
fusing the feature map at each position point with the semantic embedding vector by using a low-rank bilinear pooling method to obtain a feature vector of each position point;
calculating, under the guidance of the semantic embedding vector, an attention coefficient for each position point;
and summing, over all the position points, the products of the attention coefficients and the feature vectors to obtain the feature vector of the label category.
4. The method of claim 3, further comprising, after calculating the attention coefficient of each position point:
normalizing attention coefficients of all position points by using a logistic regression function;
and summing, over all the position points, the products of the normalized attention coefficients and the feature vectors to obtain the feature vector of the label category.
5. The method of claim 1, wherein associating the semantic feature representations with the knowledge graph based on label co-occurrence comprises:
calculating, based on a data set covering the label categories, the association probability between every pair of label categories by using the label annotations, so as to obtain the knowledge graph based on label co-occurrence;
and learning the semantic feature representation of each label category over the knowledge graph by using a gated recurrent update mechanism so as to perform the association.
6. The method of claim 5, wherein learning the semantic feature representation of each label category over the knowledge graph for association using the gated recurrent update mechanism comprises:
for each label category, acquiring its hidden state at the current time step;
aggregating, over the knowledge graph, messages from neighboring label categories according to the hidden states of the label categories, and updating the aggregated message together with the hidden state of the previous time step through the gating mechanism of a gated recurrent unit;
and repeating the above process until the final hidden states of the label categories are obtained, thereby associating the semantic feature representations of the label categories.
7. The method of any of claims 1-6, further comprising:
implementing a feature extractor based on a deep residual network, so that the image information is input into the fully convolutional neural network to generate the feature map.
8. An image recognition apparatus, comprising:
a receiving module, configured to receive image information, input the image information into a fully convolutional neural network, and generate a feature map; and to perform semantic-specific representation on the semantic region of each category label based on the feature map;
and an identification module, configured to associate the semantic feature representations by using a knowledge graph based on label co-occurrence so as to predict the distribution of the labels.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-7.
CN202010022725.6A 2020-01-09 2020-01-09 Image identification method and device Pending CN113095349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010022725.6A CN113095349A (en) 2020-01-09 2020-01-09 Image identification method and device

Publications (1)

Publication Number Publication Date
CN113095349A true CN113095349A (en) 2021-07-09

Family

ID=76664073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010022725.6A Pending CN113095349A (en) 2020-01-09 2020-01-09 Image identification method and device

Country Status (1)

Country Link
CN (1) CN113095349A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102024A (en) * 2018-08-14 2018-12-28 中山大学 A kind of Layer semantics incorporation model finely identified for object and its implementation
CN110084296A (en) * 2019-04-22 2019-08-02 中山大学 A kind of figure expression learning framework and its multi-tag classification method based on certain semantic



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination