CN110472090B - Image retrieval method based on semantic tags, related device and storage medium


Info

Publication number
CN110472090B
Authority
CN
China
Prior art keywords
image, labels, nodes, determining, preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910770152.2A
Other languages
Chinese (zh)
Other versions
CN110472090A (en)
Inventor
刘龙坡
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910770152.2A
Publication of CN110472090A
Application granted
Publication of CN110472090B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2431 - Multiple classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an image retrieval method based on semantic tags, a related device, and a storage medium. By training an image classification model, every image in a retrieval database is associated with tags; when a user inputs retrieval information, the required images are obtained by comparing the similarity between the tags indicated by the retrieval information and the tags of the images, thereby realizing image retrieval based on semantic tags.

Description

Image retrieval method based on semantic tags, related device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image retrieval method based on semantic tags, and a related device and storage medium.
Background
Computer vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to recognize, track, and measure targets, and to further process the resulting graphics so that they are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the theory and technology needed to build artificial intelligence systems that can acquire information from images or multidimensional data.
Image retrieval can be performed based on computer vision technology. Current image retrieval systems mainly compute similarity over low-level image features: for example, a ResNet model pre-trained on ImageNet is used to extract features for every image in the database as well as for the image to be retrieved, similarities between them are computed, and the images with the highest similarity are returned.
However, such methods fall short when searching by image semantics. For example, a user who wants to find images of cakes must first provide an example image from which features are extracted; with a large number of samples this process is cumbersome, which affects the convenience and flexibility of image retrieval.
Disclosure of Invention
In view of the foregoing, a first aspect of the present application provides an image retrieval method based on semantic tags, applicable to an image retrieval system or program flow, which specifically includes: determining the correspondence between a first image and N labels, wherein the N labels are used to indicate the display content of the first image, and N is a positive integer;
extracting feature information of the first image through a first preset algorithm to generate feature vectors based on preset dimensions, wherein the first preset algorithm comprises a convolutional neural network;
converting the N labels into N word vectors according to a second preset algorithm;
training a preset model according to the corresponding relation between the feature vector and the N word vectors to obtain an image classification model, wherein the image classification model is used for generating corresponding M labels according to a second image, M is less than or equal to N, and M is a positive integer;
and if the similarity between the M labels and the R labels meets a preset condition, determining that the second image is a retrieval result corresponding to the retrieval information, wherein the R labels are labels indicated by the retrieval information, and R is a positive integer.
Preferably, in some possible implementations of the present application, training the preset model according to the correspondence between the feature vector and the N word vectors to obtain an image classification model includes:
determining A nodes according to the N word vectors to construct an adjacency matrix, wherein the adjacency matrix is used to indicate the association relationship between any two of the A nodes, A is less than or equal to N, and A is a positive integer;
determining a network topology structure according to the association relation of any two nodes in the A nodes;
training the network topology structure through a second preset algorithm to obtain an image classification model, wherein the second preset algorithm is used to instruct the A nodes in the network topology structure to exchange feature vectors, and the second preset algorithm comprises a graph convolutional neural network.
Preferably, in some possible implementations of the present application, the N word vectors include a first tag and a second tag, and determining A nodes according to the N word vectors to construct an adjacency matrix includes:
processing the N word vectors according to a preset rule to construct a target matrix, wherein the preset rule is set based on the co-occurrence relation of the N word vectors, and the target matrix is used for indicating the co-occurrence ratio of the first label and the second label;
and if the target matrix meets a preset condition, converting the N word vectors into the A nodes so that the target matrix is converted into the adjacency matrix, wherein the preset condition is set based on the co-occurrence ratio of the first label and the second label.
Preferably, in some possible implementations of the present application, the determining a network topology according to an association relationship between any two of the A nodes includes:
determining a first node of the A nodes;
calculating, according to a judgment rule, whether a connecting edge exists between the first node and each node other than the first node among the A nodes, to obtain a judgment result;
and determining a network topology structure according to the correspondence between the A nodes and the judgment result.
Preferably, in some possible implementations of the present application, the converting the N labels into N word vectors according to a second preset algorithm includes:
acquiring the number of words in each of the N tags;
if the number of words in a tag is greater than a preset threshold, calculating the average of the word vectors of those words;
and taking the averaged word vector as the word vector of that tag, so as to obtain the N word vectors according to the second preset algorithm.
Preferably, in some possible implementations of the present application, the generating corresponding M labels according to the second image includes:
judging the probability that the N labels correspond to the second image according to the image classification model;
and determining M tags of which the probabilities meet the classification conditions.
Preferably, in some possible implementations of the present application, the determining, according to the image classification model, a probability that the N labels correspond to the second image includes:
outputting the feature vector of the second image according to the image classification model;
calculating node vectors corresponding to the N labels according to the feature vectors of the second image through a sigmoid function to obtain label parameters;
and normalizing the label parameters to obtain the probability that the N labels correspond to the second image.
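The sub-steps above can be sketched as follows. The patent names only the sigmoid function and the normalization step; the dot product between the image's feature vector and each label's node vector is an assumed form of the "label parameters".

```python
import numpy as np

# Sketch of scoring N labels for a second image: sigmoid of the dot
# product between the image's feature vector and each label's node
# vector gives the label parameters, which are then normalized into
# probabilities. The dot-product form is an assumption; the patent
# only specifies the sigmoid and the normalization.
def label_probabilities(image_feat: np.ndarray, node_vecs: np.ndarray) -> np.ndarray:
    params = 1.0 / (1.0 + np.exp(-(node_vecs @ image_feat)))  # sigmoid per label
    return params / params.sum()                              # normalize to sum to 1

feat = np.array([0.5, -0.2, 0.8])          # feature vector of the second image
nodes = np.array([[1.0, 0.0, 1.0],         # node vectors of N = 3 labels
                  [0.0, 1.0, 0.0],
                  [-1.0, 0.0, 0.5]])

p = label_probabilities(feat, nodes)
print(p)           # probabilities over the 3 labels
print(p.sum())     # sums to 1 after normalization
```

The M labels whose probability meets the classification condition would then be kept as the image's tags.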
A second aspect of the present application provides an image retrieval apparatus, comprising: a determining unit, configured to determine the correspondence between a first image and N labels, wherein the N labels are used to indicate the display content of the first image, and N is a positive integer;
the extraction unit is used for extracting the characteristic information of the first image through a first preset algorithm to generate a characteristic vector based on a preset dimension, and the first preset algorithm comprises a convolutional neural network;
the conversion unit is used for converting the N labels into N word vectors according to a second preset algorithm;
the training unit is used for training a preset model according to the corresponding relation between the feature vector and the N word vectors to obtain an image classification model, wherein the image classification model is used for generating corresponding M labels according to the second image, M is less than or equal to N, and M is a positive integer;
and a retrieval unit, configured to determine the second image as a retrieval result corresponding to the retrieval information if the similarity between the M labels and the R labels meets a preset condition, wherein the R labels are labels indicated by the retrieval information, and R is a positive integer.
Preferably, in some possible implementations of the application,
the training unit is specifically configured to determine A nodes according to the N word vectors, so as to construct an adjacency matrix, where the adjacency matrix is used to indicate an association relationship between any two of the A nodes, A is less than or equal to N, and A is a positive integer;
the training unit is specifically configured to determine a network topology structure according to an association relationship between any two of the A nodes;
the training unit is specifically configured to train the network topology structure through a second preset algorithm to obtain an image classification model, where the second preset algorithm is used to instruct the A nodes in the network topology structure to exchange feature vectors, and the second preset algorithm comprises a graph convolutional neural network.
Preferably, in some possible implementations of the present application, the N word vectors include a first tag and a second tag,
the training unit is specifically configured to process the N word vectors according to a preset rule, so as to construct a target matrix, where the preset rule is set based on co-occurrence relationships of the N word vectors, and the target matrix is used to indicate a co-occurrence ratio of the first tag and the second tag;
the training unit is specifically configured to convert the N word vectors into the A nodes if the target matrix meets a preset condition, so that the target matrix is converted into the adjacency matrix, where the preset condition is set based on the co-occurrence ratio of the first tag and the second tag.
Preferably, in some possible implementations of the application,
the training unit is specifically configured to determine a first node of the A nodes;
the training unit is specifically configured to calculate, according to a judgment rule, whether a connecting edge exists between the first node and each node other than the first node among the A nodes, so as to obtain a judgment result;
the training unit is specifically configured to determine a network topology according to the correspondence between the A nodes and the judgment result.
Preferably, in some possible implementations of the application,
the conversion unit is specifically configured to acquire the number of words in each of the N tags;
the conversion unit is specifically configured to calculate, if the number of words in a tag is greater than a preset threshold, the average of the word vectors of those words;
the conversion unit is specifically configured to take the averaged word vector as the word vector of the tag, so as to obtain the N word vectors according to the second preset algorithm.
Preferably, in some possible implementations of the application,
the training unit is used for judging the probability that the N labels correspond to the second image according to the image classification model;
the training unit is used for determining M labels of the N labels, wherein the probability of the M labels meets the classification condition.
Preferably, in some possible implementations of the application,
the training unit is used for outputting the feature vector of the second image according to the image classification model;
the training unit is used for calculating node vectors corresponding to the N labels respectively according to the feature vectors of the second image through a sigmoid function so as to obtain label parameters;
and the training unit is used for carrying out normalization processing on the label parameters so as to obtain the probability that the N labels correspond to the second image.
A third aspect of the present application provides a computer apparatus comprising: a memory, a processor, and a bus system; the memory is used for storing program code; the processor is configured to execute, according to instructions in the program code, the semantic-tag-based image retrieval method of the first aspect or any implementation of the first aspect.
A fourth aspect of the application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the semantic-tag-based image retrieval method of the first aspect or any implementation of the first aspect.
From the above technical solutions, the embodiment of the present application has the following advantages:
the correspondence between a first image and a plurality of labels is determined; a convolutional neural network extracts the feature information of the first image to generate a feature vector of a preset dimension, and the labels are converted into word vectors according to a second preset algorithm; a preset model is then trained on the correspondence between the feature vector and the word vectors to obtain an image classification model, which associates tags with all images in the retrieval database. When a user inputs retrieval information, the required images are obtained by comparing the similarity between the tags indicated by the retrieval information and the tags of the images, thereby realizing image retrieval based on semantic tags.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of a network architecture in which an image retrieval system operates;
FIG. 2 is a flow chart of image retrieval;
FIG. 3 is a flowchart of an image retrieval method based on semantic tags according to an embodiment of the present application;
FIG. 4 is a flowchart of another image retrieval method based on semantic tags according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an interface display for image retrieval according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image retrieval device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another image retrieval device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an image retrieval method based on semantic tags, a related device, and a storage medium, which can be applied to image retrieval. Specifically, the correspondence between a first image and a plurality of tags is determined; a convolutional neural network extracts the feature information of the first image to generate a feature vector of a preset dimension, and the tags are converted into word vectors according to a second preset algorithm; a preset model is then trained on the correspondence between the feature vector and the word vectors to obtain an image classification model, which associates tags with all images in the retrieval database. When a user inputs retrieval information, the required images are obtained by comparing the similarity between the tags indicated by the retrieval information and the tags of the images, thereby realizing image retrieval based on semantic tags.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the image retrieval method provided by the application can be applied during operation of an image retrieval system. Specifically, the image retrieval system can run in the network architecture shown in FIG. 1, a diagram of the network architecture in which the image retrieval system operates. As shown in the figure, the image retrieval system can acquire retrieval requirements from a plurality of terminals, acquire image data from an image database, and analyze and train on the images according to preset rules to generate a plurality of corresponding labels. It can be understood that although five terminals are shown in FIG. 1, more or fewer terminal devices may participate in an actual scenario; the specific number depends on the actual scenario and is not limited herein. In addition, one image database is shown in FIG. 1, but in an actual scenario, especially one with multi-application image data interaction, a plurality of image databases may participate; their number likewise depends on the actual scenario.
It can be understood that the image retrieval system can run on a personal mobile terminal, a server, or a third-party device to provide a client-side image retrieval service and produce a retrieval report. The image retrieval system may run as a standalone program on the above devices, as a system component, or as a cloud service; the specific operation mode depends on the actual scenario and is not limited herein.
As described in the Background above, existing image retrieval systems compute similarity over low-level image features extracted by models such as a ResNet pre-trained on ImageNet. This makes searching by image semantics cumbersome: a user must first provide an example image for feature extraction, which is impractical with large sample sets and limits the convenience and flexibility of retrieval.
In order to solve the above problems, the present application proposes an image retrieval method based on semantic tags, applied within the image retrieval flow framework shown in FIG. 2. As shown in FIG. 2, a user inputs retrieval text, from which a plurality of tags is obtained; the image database side inputs the images in the database into the image retrieval system, which uses the model training method provided by the application to generate a plurality of corresponding labels for these images. The corresponding retrieval result is obtained by computing the similarity between the tags generated from the retrieval text and the tags obtained through model training, thereby realizing image retrieval based on semantic tags.
It will be appreciated that three labels are shown in the figure; in an actual scenario there may be more or fewer labels, the specific number depending on the actual scenario and not being limited here.
With reference to fig. 3, fig. 3 is a flowchart of a semantic tag-based image retrieval method according to an embodiment of the present application, where the embodiment of the present application at least includes the following steps:
301. Determine the correspondence between the first image and the N labels.
In this embodiment, the N labels are used to indicate the display content of the first image, and N is a positive integer. The first image may be any image in the image database, the N labels may be a series of labels set according to the content or extended content of the first image, and the correspondence between the first image and the N labels is the set of labels indicated by the content of the first image.
302. Extract the feature information of the first image through a first preset algorithm to generate a feature vector based on a preset dimension.
In this embodiment, the first preset algorithm is set based on a neural network algorithm and may include a convolutional neural network. For example, a ResNet-50 convolutional network pre-trained on ImageNet is used to extract features from the image data set, yielding 2048-dimensional feature vectors.
It can be understood that the preset dimension can be set according to the related requirements of the user, and also can be set according to the setting range of the preset dimension in the history of the image retrieval system, and the specific method is determined according to the actual scene.
303. Convert the N labels into N word vectors according to a second preset algorithm.
in this embodiment, the second preset algorithm is used to vectorize N labels, so that the relationship between the labels can be quantitatively measured, and the relationship between the labels can be mined.
In one possible scenario, the N labels may be converted into N word vectors through word2vec to serve as the feature representations of the labels.
It will be appreciated that if a tag contains multiple words, the word vectors of those words may be averaged, and the average is taken as the word vector of the corresponding tag in subsequent calculations.
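The averaging of multi-word tags can be sketched as follows; the tiny embedding table stands in for pre-trained word2vec vectors and its values are illustrative only.

```python
import numpy as np

# Toy embedding table standing in for pre-trained word2vec vectors
# (in practice these come from a trained word2vec model).
embeddings = {
    "birthday": np.array([0.9, 0.1, 0.0]),
    "cake":     np.array([0.8, 0.3, 0.1]),
}

def tag_vector(tag: str) -> np.ndarray:
    """Vector for a tag: a single-word tag maps directly to its word
    vector; a multi-word tag is the average of its words' vectors,
    as described in step 303."""
    words = tag.split()
    return np.mean([embeddings[w] for w in words], axis=0)

# The multi-word tag "birthday cake" gets the element-wise mean.
v = tag_vector("birthday cake")
print(v)  # element-wise average of the two word vectors
```

A single-word tag such as "cake" simply returns its own embedding unchanged.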
304. Train a preset model according to the correspondence between the feature vector and the N word vectors to obtain an image classification model.
In this embodiment, the image classification model is configured to generate corresponding M labels according to the second image, where M is less than or equal to N, and M is a positive integer.
It will be appreciated that, in order to determine the corresponding labels for all images in the image database, training may be performed on the data of the first image alone, or steps 301 to 304 may be repeated for a plurality of images to obtain correspondences between feature vectors and word vectors, which are then input into the preset model for training, further improving accuracy.
Optionally, the correspondence between the feature vector and the N word vectors may be determined based on a blocking process of the feature vector. That is, A nodes are determined according to the N word vectors to construct an adjacency matrix, where the adjacency matrix is used to indicate the association relationship between any two of the A nodes, A is less than or equal to N, and A is a positive integer; a network topology structure is then determined according to the association relationship between any two of the A nodes; finally, the network topology structure is trained through a second preset algorithm to obtain the image classification model, where the second preset algorithm is used to instruct the A nodes in the network topology structure to exchange feature vectors, and the second preset algorithm comprises a graph convolutional neural network.
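The "feature vector exchange" among nodes can be sketched as one graph-convolution step. The patent does not give the update rule; the standard GCN propagation H' = sigma(D^-1 (A + I) H W) is assumed here, and all matrix values are illustrative.

```python
import numpy as np

# A minimal sketch of one graph-convolution step over the label graph.
# The patent only says nodes "exchange feature vectors"; the common
# GCN update H' = ReLU(D^-1 (A + I) H W) is assumed.
rng = np.random.default_rng(0)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)    # adjacency of A = 3 label nodes
H = rng.normal(size=(3, 4))               # node features (word vectors)
W = rng.normal(size=(4, 4))               # learnable weight matrix

A_hat = A + np.eye(3)                     # add self-loops so a node keeps its own features
D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalize by node degree

H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W)  # ReLU activation
print(H_next.shape)  # (3, 4): each node now mixes its neighbours' features
```

Stacking several such steps, with W learned by backpropagation, is what training the network topology with a graph convolutional network amounts to.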
The process of determining the A nodes according to the N word vectors may be performed by processing the N word vectors according to a preset rule, so as to construct a target matrix, where the preset rule is set based on the co-occurrence relationships of the N word vectors and the target matrix is used to indicate the co-occurrence ratio of the first tag and the second tag. If the target matrix satisfies a preset condition, the N word vectors are converted into the A nodes so that the target matrix is converted into the adjacency matrix, the preset condition being set based on the co-occurrence ratio of the first label and the second label. For example, for labels i and j, M(i, j) = P(i | j) = n_ij / n_j is calculated, where n_ij is the number of times labels i and j co-occur and n_j is the number of occurrences of label j.
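The co-occurrence computation above can be sketched directly from per-image tag sets; the threshold tau used to binarize the matrix into an adjacency matrix is an illustrative choice, not a value from the patent.

```python
import numpy as np

# Sketch of building the conditional co-occurrence matrix
# M(i, j) = P(i | j) = n_ij / n_j from per-image tag sets, then
# thresholding it into a binary adjacency matrix. tau = 0.4 is an
# illustrative preset condition, not taken from the patent.
tags = ["cake", "party", "candle"]
images = [
    {"cake", "party"},
    {"cake", "candle"},
    {"cake", "party", "candle"},
    {"party"},
]

n = len(tags)
M = np.zeros((n, n))
for j, tj in enumerate(tags):
    n_j = sum(tj in img for img in images)          # occurrences of tag j
    for i, ti in enumerate(tags):
        if i == j:
            continue
        n_ij = sum(ti in img and tj in img for img in images)
        M[i, j] = n_ij / n_j                        # P(i | j)

tau = 0.4
adjacency = (M >= tau).astype(int)   # connect labels that co-occur strongly
print(M)
print(adjacency)
```

Note that M is asymmetric (P(i | j) differs from P(j | i)), which is why the thresholded adjacency matrix can encode directed associations between labels.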
305. If the similarity between the M labels and the R labels meets a preset condition, determine the second image as a retrieval result corresponding to the retrieval information.
In this embodiment, the R tags are tags indicated by the search information, and R is a positive integer.
It can be understood that the retrieval information is text input by the user, from which the corresponding R labels can be determined; if the similarity between the M labels and the R labels meets a preset condition, the second image can be determined as a retrieval result corresponding to the retrieval information. The second image may be the image whose labels are most similar to those of the retrieval information, for example an image whose similarity reaches 90%; or it may be any image whose label similarity exceeds a certain threshold, for example all images with similarity greater than 80% may be listed as retrieval results. Specifically, the retrieval process may output only the image with the highest similarity, or a set of similar images sorted in a certain order so that the user can choose among them.
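Step 305 can be sketched as follows. The patent does not fix the similarity measure between the R query labels and an image's M labels; Jaccard overlap between tag sets is used here as one plausible choice, and the database contents and threshold are illustrative.

```python
# Sketch of step 305: comparing the R query tags with each image's
# M predicted tags and keeping images above a similarity threshold.
# Jaccard overlap is an assumed measure; the patent does not fix one.
def tag_similarity(query_tags: set, image_tags: set) -> float:
    """Jaccard similarity between two tag sets (0.0 to 1.0)."""
    if not query_tags and not image_tags:
        return 0.0
    return len(query_tags & image_tags) / len(query_tags | image_tags)

database = {
    "img_001": {"cake", "party", "candle"},
    "img_002": {"dog", "park"},
    "img_003": {"cake", "candle"},
}

query = {"cake", "candle"}        # R labels derived from the retrieval text
threshold = 0.5                   # illustrative preset condition

# Rank images whose label similarity satisfies the preset condition.
results = sorted(
    ((name, tag_similarity(query, tags)) for name, tags in database.items()),
    key=lambda kv: kv[1], reverse=True,
)
hits = [(name, s) for name, s in results if s >= threshold]
print(hits)  # img_003 matches exactly; img_001 overlaps partially
```

Returning the full sorted `hits` list corresponds to the "set of similar images ordered for the user" variant described above, while `hits[0]` alone corresponds to outputting only the most similar image.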
In this embodiment, the correspondence between the first image and a plurality of labels is determined; a convolutional neural network then extracts the feature information of the first image to generate a feature vector of a preset dimension, and the labels are converted into word vectors according to a second preset algorithm. A preset model is trained on the correspondence between the feature vector and the word vectors to obtain an image classification model, which associates tags with all images in the retrieval database; when a user inputs retrieval information, the required images are obtained by comparing the similarity between the tags indicated by the retrieval information and the tags of the images, thereby realizing image retrieval based on semantic tags.
In one possible scenario, the association between an image's feature vector and the word vectors of its several labels may be established by determining the connecting edges between the word vectors. This scenario is described below with reference to the accompanying drawings. As shown in fig. 4, fig. 4 is a flowchart of another image retrieval method based on semantic tags according to an embodiment of the present application; the embodiment includes at least the following steps:
401. Determining the correspondence between the first image and the N labels.
In this embodiment, the N labels are used to indicate the display content of the first image, and N is a positive integer. The first image may be any image in the image database, the N labels may be a series of labels set according to the content of the first image or content extending from it, and the correspondence between the first image and the N labels is the set of labels indicated by the content of the first image.
402. Extracting the feature information of the first image through a first preset algorithm to generate a feature vector of a preset dimension.
In this embodiment, the first preset algorithm is based on a neural network and may include a convolutional neural network. For example, features of the image data set may be extracted with a resnet50 convolutional network pre-trained on ImageNet, yielding 2048-dimensional feature vectors.
It can be understood that the preset dimension may be set according to the user's requirements, or according to the range of dimensions used historically by the image retrieval system; the specific method is determined by the actual scenario.
403. Converting the N labels into N word vectors according to a second preset algorithm.
In this embodiment, the second preset algorithm vectorizes the N labels so that the relationships between labels can be measured quantitatively and mined.
In one possible scenario, the N labels may be converted to N word vectors by word2vec as the feature representations of the labels.
It will be appreciated that if a tag contains multiple words, the word vectors of those words may be averaged, and the average is recorded as the word vector of the corresponding tag for the subsequent calculations.
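The averaging rule for multi-word tags can be sketched as follows (a toy embedding table stands in for a trained word2vec model; the 4-dimensional vectors and the tag names are hypothetical):

```python
import numpy as np

# Toy embedding table standing in for a trained word2vec model.
embeddings = {
    "blue": np.array([1.0, 0.0, 0.0, 0.0]),
    "sky":  np.array([0.0, 1.0, 0.0, 0.0]),
    "sea":  np.array([0.0, 0.0, 1.0, 0.0]),
}

def tag_vector(tag: str) -> np.ndarray:
    """Vectorize a tag; a multi-word tag is the mean of its words' vectors."""
    words = tag.split()
    return np.mean([embeddings[w] for w in words], axis=0)

single = tag_vector("sea")       # one word: its own vector
multi = tag_vector("blue sky")   # two words: the element-wise mean of both
```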
404. Determining the association relationships among the N word vectors to judge whether connecting edges exist between the corresponding nodes.
In this embodiment, to determine the association relationships among the N word vectors, the N word vectors are first processed according to a preset rule to construct a target matrix, where the preset rule is set based on the co-occurrence relation of the N word vectors and the target matrix is used to indicate the co-occurrence ratio of a first label and a second label. If the target matrix meets a preset condition, the N word vectors are converted into the A nodes so that the target matrix is converted into an adjacency matrix, where the preset condition is set based on the co-occurrence ratio of the first label and the second label. For example: an n×n matrix m is constructed according to the co-occurrence relation of the labels in the training dataset, where n is the number of labels and m(i, j) = p(i|j) = n_ij / n_j, with n_ij the number of times tags i and j co-occur and n_j the number of occurrences of tag j. When m(i, j) > τ, set a_ij = 1, thereby constructing an adjacency matrix A(n, n) where A(i, j) = a_ij.
Then, a first node of the A nodes is determined; whether a connecting edge exists between the first node and each of the other nodes among the A nodes is calculated according to a judging rule to obtain a judging result; and the network topology is determined according to the correspondence between the A nodes and the judging result. It can be understood that this analysis may be performed for every one of the A nodes. For example: a network topology G = (V, E) is constructed, where V is the set of nodes, one per label among the N labels, and E is the set of connecting edges between nodes. Whether a connecting edge exists between two nodes is determined by the element values of the adjacency matrix A: when a_ij = 1, there is a connecting edge between node i and node j; otherwise there is none.
405. Training the model through a graph convolution neural network to obtain an image classification model.
In this embodiment, the network topology constructed in step 404 above is trained with a graph convolutional neural network model. During training, the feature vector of each node is propagated to its adjacent nodes, so each node absorbs information from the vectors of its neighbours. Because the node information comes from the text labels learned through word2vec conversion, a certain amount of semantic adjacency information is preserved even after propagation through the network, and the image classification model obtained after training retains this semantic adjacency.
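The neighbour-propagation step can be illustrated by one graph-convolution layer. The sketch below uses the common symmetrically-normalised form H' = D^(-1/2)(A + I)D^(-1/2) H W, which is an assumption — the patent does not specify the exact layer — and all numeric values are toy data:

```python
import numpy as np

# Adjacency matrix from step 404 (toy 3-node topology).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                                  # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1))) # D^(-1/2)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt               # normalised propagation matrix

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 8))  # node features: the word vectors of the labels
W = rng.standard_normal((8, 4))  # learnable weight matrix of this layer

# Each node's output mixes its own vector with its neighbours' vectors.
H_next = np.maximum(A_norm @ H @ W, 0)  # propagation followed by ReLU
```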
406. Determining the labels of the images in the database according to the image classification model.
In this embodiment, the label determination may proceed as follows: the feature vector of the second image is output according to the image classification model; the node vectors corresponding to the N labels are applied to the feature vector of the second image through a sigmoid function to obtain label parameters; and the label parameters are normalized to obtain the probability that each of the N labels corresponds to the second image.
Specifically, since each node yields a node vector, the output dimension of each node may be set to the preset 2048 to match the image feature dimension, and each node may be used as a classifier w_i. The score σ(w_i · x) is computed, where x is the image vector to be classified — a vector representation of the image features of each image in the image database (including the second image) corresponding to the first image that has not yet been assigned labels — and σ is the sigmoid function, which normalizes the output score to [0, 1]. The score is then used as the probability that the image content belongs to the tag; by setting a preset probability, for example p > 0.5, the label is included, achieving multi-label classification.
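The scoring and thresholding rule can be sketched as follows (the number of labels, the 0.5 threshold from the example above, and the random classifier weights are illustrative stand-ins for the trained node vectors):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2048))  # one 2048-dim classifier w_i per label node
x = rng.standard_normal(2048)       # feature vector x of the image to classify

probs = sigmoid(W @ x)              # score σ(w_i · x), normalised to [0, 1]
labels = np.where(probs > 0.5)[0]   # keep every label whose probability p > 0.5
```

Because each label is scored independently, an image can receive several labels at once, which is the multi-label classification described above.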
407. Comparing the similarity between the labels indicated by the retrieval information input by the user and the labels of the related images.
In this embodiment, the search information is text entered by the user, from which the corresponding R labels can be determined; if the similarity between the M labels and the R labels meets a preset condition, the second image can be determined to be a search result corresponding to the search information.
408. Determining the search result.
In this embodiment, the search result is the second image. It can be understood that the second image may be the image whose label similarity to the search information is highest, for example a similarity of 90%; or any image whose label similarity exceeds a certain threshold, for example every image with a similarity above 80% may be listed as a search result. Specifically, the retrieval process may output the single image with the highest similarity, or a set of similar images ordered by similarity, so that the user can choose among them.
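One way to realise steps 407–408 is to score each candidate image's label set against the query's labels and rank the results. The patent does not fix the similarity measure, so the Jaccard overlap used below, along with the image names and label sets, is a hypothetical choice for illustration:

```python
def tag_similarity(query_tags, image_tags):
    """Jaccard overlap between the query's R tags and an image's M tags."""
    q, t = set(query_tags), set(image_tags)
    return len(q & t) / len(q | t)

database = {  # hypothetical label sets produced by step 406
    "img_a": {"landscape", "hill", "sky"},
    "img_b": {"city", "night"},
    "img_c": {"landscape", "hill"},
}
query = {"landscape", "hill"}  # R labels extracted from the user's text

# Rank all images by similarity; the top entry (or all above a threshold)
# is returned as the search result.
ranked = sorted(database,
                key=lambda name: tag_similarity(query, database[name]),
                reverse=True)
```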
According to this embodiment, the correspondence between the first image and a plurality of labels is determined; the feature information of the first image is then extracted with a convolutional neural network to generate a feature vector of a preset dimension, and the labels are converted into word vectors according to a second preset algorithm. A preset model is trained on the correspondence between the feature vector and the word vectors to obtain an image classification model, which associates labels with all images in the search database. When a user inputs search information, the similarity between the labels indicated by the search information and the labels of the images is compared to obtain the images sought, thereby realizing an image retrieval process based on semantic labels.
In one possible display manner, the interface shown in fig. 5 may be adopted; fig. 5 is a schematic diagram of an interface display for image retrieval according to an embodiment of the present application. The interface may include the label information entered by the user and the output results. When the user needs to search for related images, a plurality of corresponding labels are generated from the text the user enters, for example "landscape" and "hills", and the corresponding retrieved pictures are obtained. If the user wants to see the specific retrieval process, clicking the detail button displays a list of retrieved images ordered by similarity, from which the user can select other images that meet their needs.
It will be appreciated that the relevant elements in the steps corresponding to the embodiments of fig. 3 and fig. 4 may be displayed in the interface, and the specific content is not limited herein, and is determined by the actual scenario.
In order to better implement the above-described aspects of the embodiments of the present application, the following provides related apparatuses for implementing the above-described aspects. Referring to fig. 6, fig. 6 is a schematic structural diagram of an image retrieval device according to an embodiment of the present application, and an image retrieval device 600 includes:
a determining unit 601, configured to determine a correspondence between a first image and N labels, where N is a positive integer, and the N labels are used to indicate display content of the first image;
An extracting unit 602, configured to extract feature information of the first image by a first preset algorithm, so as to generate a feature vector based on a preset dimension, where the first preset algorithm includes a convolutional neural network;
a conversion unit 603, configured to convert the N labels into N word vectors according to a second preset algorithm;
training unit 604, configured to train a preset model according to the correspondence between the feature vector and the N word vectors, so as to obtain an image classification model, where the image classification model is configured to generate corresponding M labels according to the second image, M is less than or equal to N, and M is a positive integer;
and the retrieving unit 605 is configured to determine that the second image is a retrieval result corresponding to the retrieval information if the similarity between the M labels and R labels meets a preset condition, where the R labels are labels indicated by the retrieval information, and R is a positive integer.
Preferably, in some possible implementations of the application,
the training unit 604 is specifically configured to determine A nodes according to the N word vectors, so as to construct an adjacency matrix, where the adjacency matrix is used to indicate an association relationship between any two nodes in the A nodes, A is less than or equal to N, and A is a positive integer;
the training unit 604 is specifically configured to determine a network topology according to an association relationship between any two nodes in the a nodes;
The training unit 604 is specifically configured to train the network topology through a second preset algorithm to obtain an image classification model, where the second preset algorithm is used to instruct the a nodes in the network topology to perform feature vector exchange, and the second preset algorithm includes a graph convolution neural network.
Preferably, in some possible implementations of the present application, the N word vectors include a first tag and a second tag,
the training unit 604 is specifically configured to process the N word vectors according to a preset rule to construct a target matrix, where the preset rule is set based on a co-occurrence relationship of the N word vectors, and the target matrix is used to indicate a co-occurrence ratio of the first tag and the second tag;
the training unit 604 is specifically configured to convert the N word vectors into the A nodes if the target matrix meets a preset condition, so that the target matrix is converted into the adjacency matrix, where the preset condition is set based on a co-occurrence ratio of the first tag and the second tag.
Preferably, in some possible implementations of the application,
the training unit 604 is specifically configured to determine a first node of the a nodes;
The training unit 604 is specifically configured to calculate, according to a judgment rule, whether a connection edge exists between the first node and a node except the first node in the a nodes, so as to obtain a judgment result;
the training unit 604 is specifically configured to determine a network topology according to the correspondence between the a nodes and the determination result.
Preferably, in some possible implementations of the application,
the converting unit 603 is specifically configured to obtain a vocabulary of each tag in the N tags;
the converting unit 603 is specifically configured to calculate an average value of the vocabulary amount if the vocabulary amount is greater than a preset threshold;
the conversion unit 603 is specifically configured to convert the average value of the vocabulary into N word vectors according to a second preset algorithm.
Preferably, in some possible implementations of the application,
the training unit 604 is configured to determine probabilities that the N labels correspond to the second image according to the image classification model;
the training unit 604 is configured to determine M tags that have probabilities satisfying a classification condition from the N tags.
Preferably, in some possible implementations of the application,
The training unit 604 is configured to output a feature vector of the second image according to the image classification model;
the training unit 604 is configured to calculate, according to a sigmoid function, node vectors corresponding to the N labels respectively according to the feature vectors of the second image, so as to obtain label parameters;
the training unit 604 is configured to normalize the tag parameters to obtain probabilities that the N tags correspond to the second image.
The method comprises determining the correspondence between a first image and a plurality of labels, extracting feature information of the first image with a convolutional neural network to generate a feature vector of a preset dimension, and converting the labels into word vectors according to a second preset algorithm. A preset model is trained on the correspondence between the feature vector and the word vectors to obtain an image classification model, which associates labels with all images in the search database. When a user inputs search information, the similarity between the labels indicated by the search information and the labels of the images is compared to obtain the images sought, thereby realizing an image retrieval process based on semantic labels.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another image retrieval device according to an embodiment of the present application, where the image retrieval device 700 may have a relatively large difference due to different configurations or performances, and may include one or more central processing units (central processing units, CPU) 722 (e.g., one or more processors) and a memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing application programs 742 or data 744. Wherein memory 732 and storage medium 730 may be transitory or persistent. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations to the image retrieval device. Still further, the central processor 722 may be configured to communicate with the storage medium 730 to execute a series of instruction operations in the storage medium 730 on the image retrieval device 700.
The image retrieval device 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the image retrieval device in the above-described embodiment may be based on the image retrieval device structure shown in fig. 7.
Embodiments of the present application also provide a computer readable storage medium having stored therein image retrieval instructions which, when executed on a computer, cause the computer to perform the steps performed by the image retrieval apparatus in the method described in the embodiments of figures 2 to 5 as described above.
There is also provided in an embodiment of the application a computer program product comprising image retrieval instructions which, when run on a computer, cause the computer to perform the steps performed by the image retrieval apparatus in the method described in the embodiment of figures 2 to 5 as hereinbefore described.
The embodiment of the application also provides an image retrieval system, which can comprise the image retrieval device in the embodiment shown in fig. 6 or the image retrieval device shown in fig. 7.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in whole or in part in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an image retrieval device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. An image retrieval method based on semantic tags, comprising the steps of:
determining the corresponding relation between a first image and N labels, wherein the N labels are used for indicating the display content of the first image, and N is a positive integer;
extracting feature information of the first image through a first preset algorithm to generate feature vectors based on preset dimensions, wherein the first preset algorithm comprises a convolutional neural network;
converting the N labels into N word vectors according to a second preset algorithm;
determining A nodes according to the N word vectors to construct an adjacency matrix, wherein the adjacency matrix is used for indicating the association relation of any two nodes in the A nodes, A is less than or equal to N, and A is a positive integer;
determining a network topology structure according to the association relation of any two nodes in the A nodes;
training the network topology structure through a second preset algorithm to obtain an image classification model, wherein the second preset algorithm is used for indicating the A nodes in the network topology structure to perform feature vector exchange, the second preset algorithm comprises a graph convolution neural network, the image classification model is used for generating corresponding M labels according to a second image, M is less than or equal to N, and M is a positive integer;
And if the similarity between the M labels and the R labels meets a preset condition, determining that the second image is a retrieval result corresponding to the retrieval information, wherein the R labels are labels indicated by the retrieval information, and R is a positive integer.
2. The method of claim 1, wherein the N word vectors include a first tag and a second tag, wherein the determining A nodes from the N word vectors to construct an adjacency matrix comprises:
processing the N word vectors according to a preset rule to construct a target matrix, wherein the preset rule is set based on the co-occurrence relation of the N word vectors, and the target matrix is used for indicating the co-occurrence ratio of the first label and the second label;
and if the target matrix meets a preset condition, converting the N word vectors into the A nodes so that the target matrix is converted into the adjacency matrix, wherein the preset condition is set based on the co-occurrence ratio of the first label and the second label.
3. The method according to claim 1, wherein the determining the network topology according to the association relationship between any two of the a nodes includes:
Determining a first node of the a nodes;
respectively calculating whether a connecting edge exists between the first node and a node except the first node in the A nodes according to a judging rule to obtain a judging result;
and determining a network topology structure according to the corresponding relation between the A nodes and the judging result.
4. A method according to any one of claims 1-3, wherein said converting said N labels into N word vectors according to a second preset algorithm comprises:
acquiring the vocabulary of each tag in the N tags;
if the vocabulary quantity is larger than a preset threshold value, calculating an average value of the vocabulary quantity;
and converting the average value of the vocabulary into N word vectors according to a second preset algorithm.
5. A method according to any one of claims 1-3, wherein said generating respective M labels from the second image comprises:
judging the probability that the N labels correspond to the second image according to the image classification model;
and determining M tags of which the probabilities meet the classification conditions.
6. The method of claim 5, wherein said determining the probability that the N labels correspond to the second image according to the image classification model comprises:
Outputting the feature vector of the second image according to the image classification model;
calculating node vectors corresponding to the N labels according to the feature vectors of the second image through a sigmoid function to obtain label parameters;
and normalizing the label parameters to obtain the probability that the N labels correspond to the second image.
7. An image retrieval device based on semantic tags, comprising:
the device comprises a determining unit, a display unit and a display unit, wherein the determining unit is used for determining the corresponding relation between a first image and N labels, the N labels are used for indicating the display content of the first image, and N is a positive integer;
the extraction unit is used for extracting the characteristic information of the first image through a first preset algorithm to generate a characteristic vector based on a preset dimension, and the first preset algorithm comprises a convolutional neural network;
the conversion unit is used for converting the N labels into N word vectors according to a second preset algorithm;
the training unit is used for determining A nodes according to the N word vectors so as to construct an adjacency matrix, wherein the adjacency matrix is used for indicating the association relation of any two nodes in the A nodes, A is less than or equal to N, and A is a positive integer; determining a network topology structure according to the association relation of any two nodes in the A nodes; training the network topology structure through a second preset algorithm to obtain an image classification model, wherein the second preset algorithm is used for indicating the A nodes in the network topology structure to perform feature vector exchange, the second preset algorithm comprises a graph convolution neural network, the image classification model is used for generating corresponding M labels according to a second image, M is less than or equal to N, and M is a positive integer;
And the searching unit is used for determining the second image as a searching result corresponding to the searching information if the similarity between the M labels and the R labels meets a preset condition, wherein the R labels are labels indicated by the searching information, and the R is a positive integer.
8. The apparatus of claim 7, wherein the N word vectors include a first tag and a second tag; the training unit is specifically configured to, when determining A nodes according to the N word vectors to construct an adjacency matrix:
processing the N word vectors according to a preset rule to construct a target matrix, wherein the preset rule is set based on the co-occurrence relation of the N word vectors, and the target matrix is used for indicating the co-occurrence ratio of the first label and the second label;
and if the target matrix meets a preset condition, converting the N word vectors into the A nodes so that the target matrix is converted into the adjacency matrix, wherein the preset condition is set based on the co-occurrence ratio of the first label and the second label.
9. The device according to claim 7, characterized in that the training unit is specifically configured to:
determining a first node of the a nodes;
Respectively calculating whether a connecting edge exists between the first node and a node except the first node in the A nodes according to a judging rule to obtain a judging result;
and determining a network topology structure according to the corresponding relation between the A nodes and the judging result.
10. The device according to any one of claims 7-9, characterized in that the conversion unit is specifically configured to:
acquiring the vocabulary of each tag in the N tags;
if the vocabulary quantity is larger than a preset threshold value, calculating an average value of the vocabulary quantity;
and converting the average value of the vocabulary into N word vectors according to a second preset algorithm.
11. The device according to any one of claims 7-9, wherein the training unit is specifically configured to:
judging the probability that the N labels correspond to the second image according to the image classification model;
and determining M tags of which the probabilities meet the classification conditions.
12. The device according to claim 11, characterized in that the training unit is specifically configured to:
outputting the feature vector of the second image according to the image classification model;
calculating node vectors corresponding to the N labels according to the feature vectors of the second image through a sigmoid function to obtain label parameters;
And normalizing the label parameters to obtain the probability that the N labels correspond to the second image.
13. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes; the processor is configured to perform the semantic tag based image retrieval method of any one of claims 1 to 6 according to instructions in the program code.
14. A computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the semantic tag based image retrieval method of any one of the preceding claims 1 to 6.
CN201910770152.2A 2019-08-20 2019-08-20 Image retrieval method based on semantic tags, related device and storage medium Active CN110472090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910770152.2A CN110472090B (en) 2019-08-20 2019-08-20 Image retrieval method based on semantic tags, related device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910770152.2A CN110472090B (en) 2019-08-20 2019-08-20 Image retrieval method based on semantic tags, related device and storage medium

Publications (2)

Publication Number Publication Date
CN110472090A CN110472090A (en) 2019-11-19
CN110472090B true CN110472090B (en) 2023-10-27

Family

ID=68512976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910770152.2A Active CN110472090B (en) 2019-08-20 2019-08-20 Image retrieval method based on semantic tags, related device and storage medium

Country Status (1)

Country Link
CN (1) CN110472090B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125545A (en) * 2019-12-24 2020-05-08 北京旷视科技有限公司 Target object determination method and device and electronic equipment
CN111291212B (en) * 2020-01-24 2022-10-11 复旦大学 Zero sample sketch image retrieval method and system based on graph convolution neural network
CN111368934B (en) * 2020-03-17 2023-09-19 腾讯科技(深圳)有限公司 Image recognition model training method, image recognition method and related device
CN111597375B (en) * 2020-05-19 2023-11-14 清华大学 Picture retrieval method based on similar picture group representative feature vector and related equipment
CN111613299A (en) * 2020-06-15 2020-09-01 山东搜搜中医信息科技有限公司 Multi-label analysis technology of traditional Chinese medicine data
CN111813967B (en) * 2020-07-14 2024-01-30 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN112084359A (en) * 2020-09-18 2020-12-15 维沃移动通信有限公司 Picture retrieval method and device and electronic equipment
CN112434722B (en) * 2020-10-23 2024-03-19 浙江智慧视频安防创新中心有限公司 Label smooth calculation method and device based on category similarity, electronic equipment and medium
CN112732968B (en) * 2021-01-12 2021-08-31 特赞(上海)信息科技有限公司 Case material image retrieval method, device, equipment and storage medium
CN113204659B (en) * 2021-03-26 2024-01-19 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113836933A (en) * 2021-07-27 2021-12-24 腾讯科技(深圳)有限公司 Method and device for generating graphic mark, electronic equipment and storage medium
CN113868447A (en) * 2021-09-27 2021-12-31 新智认知数据服务有限公司 Picture retrieval method, electronic device and computer-readable storage medium
CN115238081B (en) * 2022-06-14 2024-04-30 杭州原数科技有限公司 Intelligent cultural relic identification method, system and readable storage medium
CN115146103A (en) * 2022-09-01 2022-10-04 太平金融科技服务(上海)有限公司深圳分公司 Image retrieval method, image retrieval apparatus, computer device, storage medium, and program product
CN116304120A (en) * 2022-11-16 2023-06-23 中移(苏州)软件技术有限公司 Multimedia retrieval method, device, computing equipment and storage medium
CN117194698B (en) * 2023-11-07 2024-02-06 清华大学 Task processing system and method based on OAR semantic knowledge base

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120209833A1 (en) * 2011-02-11 2012-08-16 Siemens Aktiengesellschaft Methods and devices for data retrieval
CN109271546A (en) * 2018-07-25 2019-01-25 西北大学 The foundation of image retrieval Feature Selection Model, Database and search method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120209833A1 (en) * 2011-02-11 2012-08-16 Siemens Aktiengesellschaft Methods and devices for data retrieval
CN109271546A (en) * 2018-07-25 2019-01-25 西北大学 The foundation of image retrieval Feature Selection Model, Database and search method

Also Published As

Publication number Publication date
CN110472090A (en) 2019-11-19

Similar Documents

Publication Publication Date Title
CN110472090B (en) Image retrieval method based on semantic tags, related device and storage medium
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
CN111339306B (en) Classification model training method, classification method and device, equipment and medium
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN108717408B (en) Sensitive word real-time monitoring method, electronic equipment, storage medium and system
CN110598037B (en) Image searching method, device and storage medium
CA3066029A1 (en) Image feature acquisition
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN109918560A (en) A kind of answering method and device based on search engine
CN108734212B (en) Method for determining classification result and related device
CN110909165A (en) Data processing method, device, medium and electronic equipment
CN111461164B (en) Sample data set capacity expansion method and model training method
CN111475622A (en) Text classification method, device, terminal and storage medium
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN110245310B (en) Object behavior analysis method, device and storage medium
WO2024060684A1 (en) Model training method, image processing method, device, and storage medium
CN113569554B (en) Entity pair matching method and device in database, electronic equipment and storage medium
CN110955750A (en) Combined identification method and device for comment area and emotion polarity, and electronic equipment
CN111666766A (en) Data processing method, device and equipment
CN112182145A (en) Text similarity determination method, device, equipment and storage medium
CN108389113B (en) Collaborative filtering recommendation method and system
Wang et al. Deep multi-person kinship matching and recognition for family photos
CN114429566A (en) Image semantic understanding method, device, equipment and storage medium
CN110659392A (en) Retrieval method and device, and storage medium
CN113537206B (en) Push data detection method, push data detection device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant