CN111930984A - Image retrieval method, device, server, client and medium - Google Patents

Info

Publication number
CN111930984A
Authority
CN
China
Prior art keywords: retrieved, image, vector matrix, content, binary code
Prior art date
Legal status: Pending
Application number
CN201910332754.XA
Other languages
Chinese (zh)
Inventor
郭忠强
Current Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd filed Critical Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN201910332754.XA priority Critical patent/CN111930984A/en
Publication of CN111930984A publication Critical patent/CN111930984A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

The present disclosure provides an image retrieval method applied to a server, the method including: receiving content to be retrieved; processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved; encoding the feature vector matrix of the content to be retrieved by using a feature quantization model to obtain a binary code corresponding to the feature vector matrix; retrieving, from a candidate image library, a candidate image matching the binary code corresponding to the feature vector matrix of the content to be retrieved; and sending the candidate image to a client. By this method, the feature vector of the content to be retrieved can be converted into a compact binary code, so that retrieving images from the candidate image library based on the compact binary code improves retrieval efficiency, and the required retrieval speed can be met even when the number of retrieval users is very large.

Description

Image retrieval method, device, server, client and medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an image retrieval method, an image retrieval device, a server, a client, and a medium.
Background
At present, image retrieval technologies can generally be divided into two categories. The first is content-based image retrieval: informative image feature vectors, such as color features, shape features, position features and gradient features, are extracted from an image and used for subsequent distance calculation and matching, thereby achieving retrieval. The second is text-based image retrieval: each image is labeled manually, and given an image, its text labels may be related keywords such as "Huawei", "Honor 10", "youth edition" and "large screen". Each image in the target image library is likewise labeled, for example as "Huawei", "Honor 10", "cell phone", "Apple", "iPhone", "XS", "Max" and "large screen". The image and the text labels have a mapping relation; during retrieval, the query image is mapped to text labels, which are compared and matched against the text labels of each image in the library, and the images whose labels have a high matching degree are returned as the retrieval result. However, in the course of implementing the inventive concept of the present disclosure, the inventors found that the prior art has at least the following problem: with either of these two retrieval modes, when the number of retrieval users is very large, the required retrieval speed is difficult to meet. The present disclosure provides the image retrieval methods applied to the server and the client to address this technical problem.
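As an editorial illustration only (the label sets, scoring rule and function names below are assumptions, not taken from this patent), the text-label matching described above can be sketched as a simple tag-overlap score:

```python
def tag_match_score(query_tags, image_tags):
    """Score an image by the Jaccard overlap between its tags and the query tags."""
    q, t = set(query_tags), set(image_tags)
    return len(q & t) / len(q | t) if (q or t) else 0.0

def text_based_retrieve(query_tags, library, top_k=3):
    """Rank library images (name -> tag list) by tag overlap with the query."""
    ranked = sorted(library,
                    key=lambda name: tag_match_score(query_tags, library[name]),
                    reverse=True)
    return ranked[:top_k]

# Invented example library: two labeled images.
library = {
    "img_001": ["Huawei", "Honor 10", "cell phone", "large screen"],
    "img_002": ["Apple", "iPhone", "XS", "Max", "cell phone"],
}
print(text_based_retrieve(["Huawei", "Honor 10", "large screen"], library, top_k=1))
```

An image sharing more tags with the query ranks higher; a real system would also have to handle synonyms, weighting and the manual labeling cost the passage mentions.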
Disclosure of Invention
Accordingly, the present disclosure is directed to an image retrieval method, apparatus, server, client, and medium that substantially obviate one or more problems due to limitations and disadvantages of the related art.
A first aspect of the present disclosure provides an image retrieval method, applied to a server, the method including: receiving content to be retrieved; processing the content to be retrieved to obtain a characteristic vector matrix of the content to be retrieved; coding the characteristic vector matrix of the content to be retrieved by utilizing a characteristic quantization model to obtain a binary code corresponding to the characteristic vector matrix of the content to be retrieved; retrieving a candidate image matched with the binary code from a candidate image library based on the binary code corresponding to the characteristic vector matrix of the content to be retrieved; and sending the candidate image to a client.
According to an embodiment of the present disclosure, the content to be retrieved includes an image to be retrieved, and processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved includes: and processing the image to be retrieved by utilizing a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved.
According to an embodiment of the present disclosure, encoding the feature vector matrix of the content to be retrieved by using the feature quantization model to obtain the binary code corresponding to the feature vector matrix of the content to be retrieved includes: encoding the image visual feature vector matrix by using the feature quantization model to obtain a binary code corresponding to the image visual feature vector matrix.
According to an embodiment of the present disclosure, retrieving, from a candidate image library, a candidate image matching a binary code corresponding to a feature vector matrix of the content to be retrieved based on the binary code includes: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the image visual characteristic vector matrixes.
According to an embodiment of the present disclosure, the content to be retrieved includes a text to be retrieved, and processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved includes: and processing the text to be retrieved by using a word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved.
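The patent does not spell out the word2Vec processing; as a hedged, toy stand-in (the vocabulary and the 4-dimensional vectors below are invented, not a trained model), building a semantic feature vector matrix from a text to be retrieved might look like:

```python
# Toy stand-in for a trained word2Vec model: word -> fixed-length vector.
toy_word2vec = {
    "cell":   [0.9, 0.1, 0.0, 0.2],
    "phone":  [0.8, 0.2, 0.1, 0.1],
    "large":  [0.1, 0.9, 0.3, 0.0],
    "screen": [0.2, 0.8, 0.4, 0.1],
}

def text_to_feature_matrix(text):
    """Semantic feature vector matrix: one row per word found in the vocabulary."""
    return [toy_word2vec[w] for w in text.split() if w in toy_word2vec]

matrix = text_to_feature_matrix("large screen cell phone")
print(len(matrix), len(matrix[0]))  # rows = words found, columns = vector dimension
```

A real deployment would load pre-trained word2Vec embeddings instead of this hand-written table; the shape of the result (one vector per word) is the point being illustrated.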
According to an embodiment of the present disclosure, encoding the feature vector matrix of the content to be retrieved by using the feature quantization model to obtain the binary code corresponding to the feature vector matrix of the content to be retrieved includes: encoding the semantic feature vector matrix by using the feature quantization model to obtain a binary code corresponding to the semantic feature vector matrix.
According to an embodiment of the present disclosure, retrieving, from a candidate image library, a candidate image matching a binary code corresponding to a feature vector matrix of the content to be retrieved based on the binary code includes: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the semantic feature vector matrixes.
According to an embodiment of the present disclosure, the content to be retrieved includes an image to be retrieved and a text to be retrieved, and processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved includes: processing the image to be retrieved by using a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved, and processing the text to be retrieved by using a word2Vec model to obtain a semantic characteristic vector matrix of the text to be retrieved.
According to the embodiment of the disclosure, after the word2Vec model is used for processing the text to be retrieved to obtain the semantic feature vector matrix of the text to be retrieved, the method further comprises the following steps: and mapping the image visual characteristic vector matrix and the semantic characteristic vector matrix to an image semantic space by using a full-connection conversion layer, and fusing the image visual characteristic vector matrix and the semantic characteristic vector matrix in the image semantic space to obtain an image semantic characteristic vector matrix.
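A minimal sketch of the fusion step described above, assuming a single fully connected conversion layer with random stand-in weights (the dimensions, the tanh activation, and the concatenation scheme are illustrative assumptions, not the patent's specification):

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(visual, semantic, out_dim=8):
    """Map concatenated visual + semantic features through one fully connected
    layer into a shared image-semantic space (random weights stand in for
    trained parameters)."""
    x = np.concatenate([visual, semantic])      # joint feature vector
    W = rng.normal(size=(out_dim, x.shape[0]))  # fully connected weights
    b = np.zeros(out_dim)                       # bias
    return np.tanh(W @ x + b)                   # fused image-semantic vector

visual = rng.normal(size=16)   # e.g. a CNN visual feature vector
semantic = rng.normal(size=4)  # e.g. a word2Vec semantic feature vector
fused = fuse(visual, semantic)
print(fused.shape)  # (8,)
```

In practice the layer's weights would be learned jointly with the feature quantization model rather than drawn at random.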
According to an embodiment of the present disclosure, encoding the feature vector matrix of the content to be retrieved by using the feature quantization model to obtain the binary code corresponding to the feature vector matrix of the content to be retrieved includes: encoding the image semantic feature vector matrix by using the feature quantization model to obtain a binary code corresponding to the image semantic feature vector matrix.
According to an embodiment of the present disclosure, retrieving, from a candidate image library, a candidate image matching a binary code corresponding to a feature vector matrix of the content to be retrieved based on the binary code includes: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the image semantic feature vector matrixes.
A second aspect of the present disclosure provides an image retrieval method applied to a client, the method including: sending contents to be retrieved to a server so that the server processes the contents to be retrieved to obtain a characteristic vector matrix of the contents to be retrieved, coding the characteristic vector matrix of the contents to be retrieved by using a characteristic quantization model to obtain a binary code corresponding to the characteristic vector matrix of the contents to be retrieved, and retrieving a candidate image matched with the binary code from a candidate image library based on the binary code corresponding to the characteristic vector matrix of the contents to be retrieved; candidate images matching the binary code are received.
According to the embodiment of the disclosure, the content to be retrieved comprises an image to be retrieved and/or a text to be retrieved.
A third aspect of the present disclosure provides an image retrieval apparatus applied to a server, the apparatus including: the receiving module is used for receiving the content to be retrieved; the processing module is used for processing the content to be retrieved to obtain a characteristic vector matrix of the content to be retrieved; the quantization module is used for coding the characteristic vector matrix of the content to be retrieved by utilizing a characteristic quantization model to obtain a binary code corresponding to the characteristic vector matrix of the content to be retrieved; the retrieval module is used for retrieving a candidate image matched with the binary code from a candidate image library based on the binary code corresponding to the characteristic vector matrix of the content to be retrieved; and the sending module is used for sending the candidate images to the client.
According to an embodiment of the present disclosure, the content to be retrieved includes an image to be retrieved, and the processing module is configured to: and processing the image to be retrieved by utilizing a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved.
According to an embodiment of the present disclosure, the quantization module is configured to: and coding the image visual characteristic vector matrix by using a characteristic quantization model to obtain a binary code corresponding to the image visual characteristic vector matrix.
According to an embodiment of the present disclosure, the search module is configured to: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the image visual characteristic vector matrixes.
According to an embodiment of the present disclosure, the content to be retrieved includes a text to be retrieved, and the processing module is configured to: and processing the text to be retrieved by using a word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved.
According to an embodiment of the present disclosure, the quantization module is configured to: and coding the semantic feature vector matrix by using a feature quantization model to obtain a binary code corresponding to the semantic feature vector matrix.
According to an embodiment of the present disclosure, the search module is configured to: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the semantic feature vector matrixes.
According to an embodiment of the present disclosure, the content to be retrieved includes an image to be retrieved and a text to be retrieved, and the processing module is configured to: processing the image to be retrieved by using a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved, and processing the text to be retrieved by using a word2Vec model to obtain a semantic characteristic vector matrix of the text to be retrieved.
According to an embodiment of the present disclosure, the apparatus further comprises: and the fusion module is used for mapping the image visual characteristic vector matrix and the semantic characteristic vector matrix to an image semantic space by using a full-connection conversion layer, and fusing the image visual characteristic vector matrix and the semantic characteristic vector matrix in the image semantic space to obtain an image semantic characteristic vector matrix.
According to an embodiment of the present disclosure, the quantization module is configured to: and coding the image semantic feature vector matrix by using a feature quantization model to obtain a binary code corresponding to the image semantic feature vector matrix.
According to an embodiment of the present disclosure, the search module is configured to: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the image semantic feature vector matrixes.
A fourth aspect of the present disclosure provides an image retrieval apparatus applied to a client, the apparatus including: the system comprises a sending module, a searching module and a searching module, wherein the sending module is used for sending contents to be searched to a server so that the server processes the contents to be searched to obtain a characteristic vector matrix of the contents to be searched, the characteristic vector matrix of the contents to be searched is coded by using a characteristic quantization model to obtain a binary code corresponding to the characteristic vector matrix of the contents to be searched, and a candidate image matched with the binary code is searched from a candidate image library based on the binary code corresponding to the characteristic vector matrix of the contents to be searched; and the receiving module is used for receiving the candidate image matched with the binary code.
According to the embodiment of the disclosure, the content to be retrieved comprises an image to be retrieved and/or a text to be retrieved.
A fifth aspect of the present disclosure provides a server comprising: one or more processors, and a storage device. The storage device is used for storing one or more programs. Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the image retrieval method provided by the first aspect as described above.
A sixth aspect of the present disclosure provides a computer-readable medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the image retrieval method provided by the first aspect as described above.
A seventh aspect of the present disclosure provides a computer program comprising computer executable instructions for implementing the image retrieval method provided by the first aspect when executed.
An eighth aspect of the present disclosure provides a client, comprising: one or more processors, and a storage device. The storage device is used for storing one or more programs. Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the image retrieval method provided by the second aspect as described above.
A ninth aspect of the present disclosure provides a computer-readable medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the image retrieval method provided by the second aspect as described above.
A tenth aspect of the present disclosure provides a computer program comprising computer executable instructions for implementing the image retrieval method provided by the second aspect when executed.
The image retrieval method applied to the server has the following beneficial effects:
The technical scheme provided by the embodiments of the present disclosure can receive the content to be retrieved, process it to obtain a feature vector matrix of the content to be retrieved, encode the feature vector matrix by using a feature quantization model to obtain a corresponding binary code, and then retrieve, from a candidate image library, candidate images matching that binary code. In this way the feature vector of the content to be retrieved is converted into a compact binary code, so that retrieval efficiency is improved and the required retrieval speed can be met even when the number of retrieval users is very large.
The image retrieval method applied to the client side has the following beneficial effects:
The technical scheme provided by the embodiments of the present disclosure can send the content to be retrieved to the server, so that the server processes it to obtain a feature vector matrix of the content to be retrieved, encodes the feature vector matrix by using a feature quantization model to obtain a corresponding binary code, and then retrieves candidate images matching that binary code from the candidate image library. In this way the feature vector of the content to be retrieved can be converted into a compact binary code. Retrieving images from the candidate image library based on the compact binary code improves retrieval efficiency, so the required retrieval speed can be met even when the number of retrieval users is very large.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Fig. 1 is a schematic diagram showing an exemplary system architecture to which the image retrieval method or image retrieval apparatus of an embodiment of the present disclosure can be applied;
FIG. 2 schematically shows a flow chart of an image retrieval method applied to a server according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow chart of an image retrieval method applied to a server according to another embodiment of the present disclosure;
FIG. 4 schematically shows a flow chart of an image retrieval method applied to a server according to another embodiment of the present disclosure;
FIG. 5 schematically shows a flow chart of an image retrieval method applied to a server according to another embodiment of the present disclosure;
FIG. 6 schematically shows a flow chart of an image retrieval method applied to a server according to another embodiment of the present disclosure;
FIG. 7 schematically shows a flow chart of an image retrieval method applied to a client according to an embodiment of the present disclosure;
fig. 8 schematically shows a schematic diagram of an application scenario of an image retrieval method according to an embodiment of the present disclosure;
fig. 9 schematically shows a block diagram of an image retrieval apparatus applied to a server according to an embodiment of the present disclosure;
fig. 10 schematically shows a block diagram of an image retrieval apparatus applied to a client according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates a block diagram of a computer system of a server according to an embodiment of the present disclosure;
fig. 12 schematically shows a block diagram of a computer system of a client according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibility of "A", or "B", or "A and B".
Fig. 1 is a schematic diagram showing an exemplary system architecture to which an image retrieval method or an image retrieval apparatus of an embodiment of the present invention can be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services. For example, the server 105 may obtain the content to be retrieved from the terminal device 103 (or the terminal device 101 or 102) and process it to obtain a feature vector matrix of the content to be retrieved. The feature vector matrix is then encoded by using the feature quantization model to obtain a corresponding binary code, and candidate images matching that binary code are retrieved from the candidate image library. In this way the feature vector of the content to be retrieved can be converted into a compact binary code; retrieving images from the candidate image library based on the compact binary code improves retrieval efficiency, so the required retrieval speed can be met even when the number of retrieval users is very large.
In some embodiments, the image retrieval method provided by the embodiments of the present invention is generally executed by the server 105, and accordingly, the image retrieval apparatus is generally disposed in the server 105. In other embodiments, some terminals may have similar functionality as the server to perform the method. Therefore, the image retrieval method provided by the embodiment of the invention is not limited to be executed at the server side.
Fig. 2 schematically shows a flowchart of an image retrieval method applied to a server according to an embodiment of the present disclosure.
As shown in fig. 2, the image retrieval method applied to the server includes steps S110 to S150.
In step S110, the content to be retrieved is received.
In step S120, the content to be retrieved is processed to obtain a feature vector matrix of the content to be retrieved.
In step S130, the feature vector matrix of the content to be retrieved is encoded by using the feature quantization model, so as to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved.
In step S140, a candidate image matching the binary code is retrieved from the candidate image library based on the binary code corresponding to the feature vector matrix of the content to be retrieved.
In step S150, the candidate image is sent to the client.
The method can receive the content to be retrieved, process the content to be retrieved to obtain a characteristic vector matrix of the content to be retrieved, encode the characteristic vector matrix of the content to be retrieved by using a characteristic quantization model to obtain a binary code corresponding to the characteristic vector matrix of the content to be retrieved, and then retrieve a candidate image matched with the binary code from a candidate image library based on the binary code corresponding to the characteristic vector matrix of the content to be retrieved.
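For illustration only, steps S110 to S150 might be wired together as follows; every component function here is a hypothetical stand-in, since the patent does not fix the feature extractor, quantizer, or matching scheme:

```python
def extract_features(content):
    """Stand-in for S120: map content to a feature vector (hash-based toy)."""
    return [ord(c) % 7 for c in content]

def quantize(features, threshold=3):
    """Stand-in for S130: encode the feature vector as a binary code."""
    return tuple(1 if f > threshold else 0 for f in features)

def retrieve(binary_code, library):
    """Stand-in for S140: exact-match lookup of candidates by binary code."""
    return library.get(binary_code, [])

def handle_request(content, library):
    """S110-S150: receive content, featurize, quantize, retrieve, return."""
    code = quantize(extract_features(content))
    return retrieve(code, library)

# Index one candidate image under the binary code of its description.
library = {quantize(extract_features("red sofa")): ["sofa_01.jpg", "sofa_02.jpg"]}
print(handle_request("red sofa", library))
```

The compact binary code serves as a hash-table key here, which is what makes the lookup fast regardless of library size; a production system would use approximate (e.g. Hamming-distance) matching rather than exact equality.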
In some embodiments of the present disclosure, the search content may be an image to be searched, which is uploaded by a user on a client, or may be a text to be searched, which is input by the user on the client.
In some embodiments of the present disclosure, the above feature quantization model may include M codebooks, denoted C = {C_1, C_2, …, C_m, …, C_M}, in which each codebook C_m contains K codewords, C_m = {C_m1, C_m2, …, C_mk, …, C_mK}, and each codeword C_mk is a D-dimensional vector S = (S_1, S_2, …, S_i, …, S_D) obtained by the k-means clustering algorithm; the D-dimensional vector S may refer to the feature vector matrix of the content to be retrieved. In addition, before quantizing the feature vector matrix of the content to be retrieved, a feature universal conversion algorithm can be used to process the feature vector matrix, so that differences and noise among the feature vectors in the matrix are reduced, improving the accuracy of subsequent retrieval.
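A hedged sketch of how the M codebooks might be learned with k-means over vector sub-blocks (all data, dimensions, iteration counts and the splitting scheme below are illustrative assumptions, not the patent's training procedure):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: return k codewords (centroids) for the given points."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, cents[i])))
            groups[j].append(p)
        # recompute centroids; keep the old one if a group went empty
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else cents[i]
                 for i, g in enumerate(groups)]
    return cents

def train_codebooks(vectors, M, K):
    """Split each D-dim vector into M sub-vectors; learn K codewords per block."""
    d = len(vectors[0]) // M
    return [kmeans([v[m * d:(m + 1) * d] for v in vectors], K) for m in range(M)]

# Invented training data: 40 vectors of dimension D = 8.
data = [[random.Random(i).uniform(0, 1) for _ in range(8)] for i in range(40)]
codebooks = train_codebooks(data, M=2, K=4)  # C = {C_1, C_2}, K = 4 codewords each
print(len(codebooks), len(codebooks[0]))
```

Each codebook covers one sub-block of the vector, so the M codebooks jointly describe the whole D-dimensional space with only M x K stored codewords.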
In some embodiments of the present disclosure, based on the above M codebooks, a binary codeword assignment vector b_n may be mapped into M index vectors, b_n = [b_1n, b_2n, …, b_mn, …, b_Mn], in which each index vector b_mn indicates which one of the K codewords in the m-th codebook is used to approximately represent the n-th data point in the feature vector matrix of the content to be retrieved (only one codeword is taken from each codebook). In this example, the feature vector matrix of the content to be retrieved after being processed by the feature universality transformation algorithm can be represented as Z_n, where Z_n is approximated by the sum of M codewords (one codeword from each of the M codebooks) selected through the binary codeword assignment vector b_n. Z_n can thus be converted to a compact binary code by the following formula:

Z_n ≈ Σ_{m=1}^{M} C_m b_mn

wherein Z_n is the feature vector matrix of the content to be retrieved after being processed by the feature universality transformation algorithm, C_m is the m-th codebook, and b_mn is the binary codeword assignment vector over the K codewords of the m-th codebook.
In some embodiments of the present disclosure, the eigenvector matrix of the content to be retrieved may be converted into a compact binary code by using the above formula, for example, the binary code may be [0,1,0,0,0], and a candidate image corresponding to the binary code may be quickly retrieved from a candidate image library based on the binary code, thereby improving the retrieval efficiency.
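As an illustration of how a vector could be encoded against such codebooks and matched in a candidate library (the codebook values and the one-hot code layout below are invented; the code mirrors the [0,1,0,0,0]-style example above):

```python
def nearest(codebook, sub):
    """Index of the codeword closest to the sub-vector (squared distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], sub)))

def encode(vector, codebooks):
    """Binary code: concatenated one-hot indicators b_mn, one per codebook."""
    m, d = len(codebooks), len(vector) // len(codebooks)
    code = []
    for i in range(m):
        j = nearest(codebooks[i], vector[i * d:(i + 1) * d])
        code += [1 if k == j else 0 for k in range(len(codebooks[i]))]
    return tuple(code)

# Invented single codebook with K = 5 codewords of dimension 2.
codebooks = [[[0, 0], [1, 1], [0, 5], [5, 0], [5, 5]]]
library = {encode([0.9, 1.1], codebooks): ["img_A"]}  # index image under its code
query_code = encode([1.2, 0.8], codebooks)            # query hits the same codeword
print(query_code, library.get(query_code))
```

Both the indexed image and the query fall nearest to codeword 1, so both map to the code (0, 1, 0, 0, 0) and the lookup succeeds; nearby vectors colliding onto the same compact code is exactly what makes the retrieval fast.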
Fig. 3 schematically shows a flowchart of an image retrieval method applied to a server according to another embodiment of the present disclosure.
As shown in fig. 3, when the content to be retrieved is an image to be retrieved, the image retrieval method applied to the server includes steps S210 to S250.
In step S210, an image to be retrieved is received.
In step S220, the image to be retrieved is processed by using a convolutional neural network, so as to obtain an image visual feature vector matrix of the image to be retrieved.
In step S230, the image visual feature vector matrix is encoded by using a feature quantization model, so as to obtain a binary code corresponding to the image visual feature vector matrix.
In step S240, candidate images matching the binary code are retrieved from the candidate image library based on the binary code corresponding to the image visual feature vector matrix.
In step S250, the candidate image is transmitted to the client.
The method can receive an image to be retrieved and process it to obtain an image visual feature vector matrix of the image to be retrieved. The image visual feature vector matrix is encoded by using a feature quantization model to obtain a binary code corresponding to it, and a candidate image matching the binary code is then retrieved from the candidate image library based on that binary code.
In some embodiments of the present disclosure, the image to be retrieved is processed by using a convolutional neural network to obtain the image visual feature vector matrix of the image to be retrieved. For example, for each image to be retrieved (for example, "sofa", "keyboard", "shoes" and "socks" images), feature extraction is performed through a convolutional neural network. The convolutional neural network may mainly comprise a convolutional layer, a pooling layer and a fully connected layer: each input image to be retrieved is subjected to a convolution operation, and the output of that operation is then pooled. If the convolutional neural network is a deep convolutional neural network, it may contain many convolutional and pooling layers, so the convolution-pooling operation is performed cyclically many times. The image visual feature vector matrix of the image to be retrieved is then extracted from the result of the repeated convolution-pooling processing through the fully connected layer.
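The convolution, pooling and fully connected steps just described can be sketched in NumPy as follows. This is an illustrative single-channel toy, not the actual network of the disclosure: in practice the kernel and fully connected weights would be learned, and `conv2d`, `max_pool` and `extract_features` are hypothetical names.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling, cropping any ragged border."""
    H, W = x.shape
    H, W = H - H % size, W - W % size
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

def extract_features(img, kernel, fc_weights):
    """Convolution -> ReLU -> pooling -> fully connected projection."""
    x = np.maximum(conv2d(img, kernel), 0.0)   # convolution + ReLU
    x = max_pool(x)                            # pooling
    return fc_weights @ x.ravel()              # fully connected layer output
```

A deep network would simply repeat the convolution-pooling pair several times before the fully connected projection, exactly as the paragraph above describes.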
In some embodiments of the present disclosure, the above feature quantization model may include M codebooks, denoted as C = {C_1, C_2, …, C_m, …, C_M}, in which each codebook C_m contains K codewords, C_m = {C_m1, C_m2, …, C_mk, …, C_mK}, and each codeword C_mk is a D-dimensional vector S = {S_1, S_2, …, S_i, …, S_D} obtained by the k-means clustering algorithm; the D-dimensional vector S may refer to the image visual feature vector matrix. In addition, before the image visual feature vector matrix is quantized, it can be processed by using the feature universality transformation algorithm, so that differences and noise among the feature vectors in the matrix are eliminated, improving the accuracy of subsequent retrieval.
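Since each codeword is stated to come from k-means clustering, codebook training can be sketched as a plain Lloyd's iteration, as follows. `train_codebook` is a hypothetical name; the real system would run this per codebook over the feature library.

```python
import numpy as np

def train_codebook(vectors, K, iters=20, seed=0):
    """Learn K codewords (cluster centroids) from D-dimensional feature
    vectors with Lloyd's k-means iteration."""
    rng = np.random.default_rng(seed)
    # initialize codewords from K distinct sample vectors
    codewords = vectors[rng.choice(len(vectors), K, replace=False)].astype(float)
    for _ in range(iters):
        # assign each vector to its nearest codeword
        d = np.linalg.norm(vectors[:, None, :] - codewords[None], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned vectors
        for k in range(K):
            if np.any(labels == k):
                codewords[k] = vectors[labels == k].mean(axis=0)
    return codewords
```

On two tight clusters the learned codewords converge to the cluster means, which is exactly the property the quantization model relies on.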
In some embodiments of the present disclosure, based on the M codebooks, the binary codeword assignment vector b_n may be mapped into M index vectors, b_n = {b_1n, b_2n, …, b_mn, …, b_Mn}, in which each index vector b_mn indicates that one and only one of the K codewords in the m-th codebook is used to approximately represent the n-th data point in the image visual feature vector matrix. In this example, the image visual feature vector matrix processed by the feature universality transformation algorithm may be represented as Z_n, where Z_n denotes the sum of M codewords (one codeword taken from each of the M codebooks). The binary codeword assignment vector b_n is then combined to approximate Z_n, that is, to convert Z_n into a compact binary code. Z_n may be converted into the compact binary code by the following formula:
Z_n ≈ Σ_{m=1}^{M} C_m · b_mn
wherein Z_n is the image visual feature vector matrix after being processed by the feature universality transformation algorithm, C_m is the m-th codebook, and b_mn is the binary codeword assignment vector over the K codewords of the m-th codebook.
In some embodiments of the present disclosure, the image visual feature vector matrix may be converted into a compact binary code by using the above formula; for example, the binary code may be [0,1,0,0]. A candidate image corresponding to the binary code may then be quickly retrieved from the candidate image library based on the binary code, thereby improving retrieval efficiency and further enhancing retrieval performance.
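Matching a query's binary code against the candidate library can be illustrated as a Hamming-distance ranking. The disclosure only states that candidates matching the binary code are retrieved; Hamming ranking is one common, assumed realization, and `hamming_retrieve` is a hypothetical name.

```python
import numpy as np

def hamming_retrieve(query_code, library, top=3):
    """Rank candidate images by Hamming distance between compact binary
    codes and return the ids of the `top` closest candidates."""
    q = np.asarray(query_code)
    # Hamming distance = number of differing bit positions
    dists = [(name, int(np.sum(q != np.asarray(code))))
             for name, code in library.items()]
    dists.sort(key=lambda t: t[1])     # stable sort keeps insertion order on ties
    return [name for name, _ in dists[:top]]
```

Because comparing short binary codes is far cheaper than comparing dense float vectors, this is where the efficiency gain claimed above comes from.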
Fig. 4 schematically shows a flowchart of an image retrieval method applied to a server according to another embodiment of the present disclosure.
As shown in fig. 4, when the content to be retrieved is a text to be retrieved, the image retrieval method applied to the server includes S310 to S350.
In step S310, a text to be retrieved is received.
In step S320, the word2Vec model is used to process the text to be retrieved, so as to obtain a semantic feature vector matrix of the text to be retrieved.
In step S330, the semantic feature vector matrix is encoded by using a feature quantization model, so as to obtain a binary code corresponding to the semantic feature vector matrix.
In step S340, a candidate image matching the binary code is retrieved from the candidate image library based on the binary code corresponding to the semantic feature vector matrix.
In step S350, the candidate image is transmitted to the client.
The method can receive a text to be retrieved and process it to obtain a semantic feature vector matrix. The semantic feature vector matrix is encoded by using a feature quantization model to obtain a binary code corresponding to it, and a candidate image matching the binary code is then retrieved from the candidate image library based on that binary code.
In some embodiments of the present disclosure, the text to be retrieved may be words such as "mobile phone", "keyboard", "sofa", etc., and the Word2Vec model is used to vectorize the text words, which facilitates the subsequent calculation, processing and matching. Specifically, the text to be retrieved may be input into the Word2Vec model in vector form {x_1, x_2, …, x_i, …, x_V} and processed through the hidden layer {h_1, …, h_j, …, h_N}, which involves the setting and optimization of parameters such as weights; an output vector {y_1, y_2, …, y_k, …, y_V}, i.e. the above semantic feature vector matrix, is obtained through calculation. The feature quantization model is then used to convert {y_1, y_2, …, y_k, …, y_V} into its corresponding binary code.
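The input → hidden → output pass just described can be sketched as follows, assuming the standard one-hot/softmax formulation of Word2Vec; W_in and W_out stand for the trained weight matrices, and `word2vec_forward` is a hypothetical name.

```python
import numpy as np

def word2vec_forward(x, W_in, W_out):
    """One forward pass of the simplified Word2Vec network described
    above: one-hot input x (length V) -> hidden layer h (length N)
    -> softmax output y (length V)."""
    h = W_in.T @ x                 # hidden layer {h_1, ..., h_N}
    u = W_out.T @ h                # raw scores over the V-word vocabulary
    e = np.exp(u - u.max())        # numerically stable softmax
    return e / e.sum()             # output vector {y_1, ..., y_V}
```

The output is a probability distribution over the vocabulary; in the retrieval pipeline it is this learned vector representation (or the hidden-layer embedding) that is handed to the feature quantization model.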
In some embodiments of the present disclosure, the above feature quantization model may include M codebooks, denoted as C = {C_1, C_2, …, C_m, …, C_M}, in which each codebook C_m contains K codewords, C_m = {C_m1, C_m2, …, C_mk, …, C_mK}, and each codeword C_mk is a D-dimensional vector S = {S_1, S_2, …, S_i, …, S_D} obtained by the k-means clustering algorithm; the D-dimensional vector S may refer to the above semantic feature vector matrix. In addition, before the semantic feature vector matrix is quantized, it can be processed by using the feature universality transformation algorithm, so that differences and noise among the feature vectors in the semantic feature vector matrix are eliminated, improving the accuracy of subsequent retrieval.
In some embodiments of the present disclosure, based on the above-mentioned M codebooks, the binary codeword assignment vector b_n may be mapped into M index vectors, b_n = {b_1n, b_2n, …, b_mn, …, b_Mn}, in which each index vector b_mn indicates that one and only one of the K codewords in the m-th codebook is used to approximately represent the n-th data point in the above semantic feature vector matrix. In this example, the semantic feature vector matrix processed by the feature universality transformation algorithm may be represented as Z_n, where Z_n denotes the sum of M codewords (one codeword taken from each of the M codebooks). The binary codeword assignment vector b_n is then combined to approximate Z_n, that is, to convert Z_n into a compact binary code. Z_n may be converted into the compact binary code by the following formula:
Z_n ≈ Σ_{m=1}^{M} C_m · b_mn
wherein Z_n is the semantic feature vector matrix after being processed by the feature universality transformation algorithm, C_m is the m-th codebook, and b_mn is the binary codeword assignment vector over the K codewords of the m-th codebook.
In some embodiments of the present disclosure, the semantic feature vector matrix may be converted into a compact binary code by using the above formula; for example, the binary code may be [0,1,0,0]. A candidate image corresponding to the binary code may then be quickly retrieved from the candidate image library based on the binary code, thereby improving retrieval efficiency and further enhancing retrieval performance.
Fig. 5 schematically shows a flowchart of an image retrieval method applied to a server according to another embodiment of the present disclosure.
As shown in fig. 5, when the content to be retrieved is an image to be retrieved and a text to be retrieved, the image retrieval method applied to the server includes S410 to S460.
In step S410, the content to be retrieved is received as the image to be retrieved and the text to be retrieved.
In step S420, the image to be retrieved is processed by using a convolutional neural network to obtain an image visual feature vector matrix of the image to be retrieved, and the text to be retrieved is processed by using a word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved.
In step S430, the image visual feature vector matrix and the semantic feature vector matrix are mapped to an image semantic space by using a full-connection conversion layer, and the image visual feature vector matrix and the semantic feature vector matrix are fused in the image semantic space to obtain an image semantic feature vector matrix.
In step S440, the image semantic feature vector matrix is encoded by using a feature quantization model, so as to obtain a binary code corresponding to the image semantic feature vector matrix.
In step S450, a candidate image matching the binary code is retrieved from the candidate image library based on the binary code corresponding to the image semantic feature vector matrix.
In step S460, the candidate image is transmitted to the client.
The method can fully combine the advantages of image visual features and deep text semantics to retrieve images, improving retrieval accuracy. Moreover, converting the image semantic feature vector matrix into a binary code further enhances retrieval performance, and the dual retrieval modes of searching images by image and searching images by text are supported, which improves the flexibility and universality of image retrieval.
In some embodiments of the present disclosure, the image to be retrieved is processed by using a convolutional neural network to obtain the image visual feature vector matrix of the image to be retrieved. For example, for each image to be retrieved (for example, "sofa", "keyboard", "shoes" and "socks" images), feature extraction is performed through a convolutional neural network. The convolutional neural network may mainly comprise a convolutional layer, a pooling layer and a fully connected layer: each input image to be retrieved is subjected to a convolution operation, and the output of that operation is then pooled. If the convolutional neural network is a deep convolutional neural network, it may contain many convolutional and pooling layers, so the convolution-pooling operation is performed cyclically many times. The image visual feature vector matrix of the image to be retrieved is then extracted from the result of the repeated convolution-pooling processing through the fully connected layer.
In some embodiments of the present disclosure, the text to be retrieved may be words such as "sofa", "keyboard", "shoes", "socks", etc., and the text words may be vectorized using the Word2Vec model, which facilitates the subsequent calculation, processing and matching. Specifically, the text to be retrieved may be input into the Word2Vec model in vector form {x_1, x_2, …, x_i, …, x_V} and processed through the hidden layer {h_1, …, h_j, …, h_N}, which involves the setting and optimization of parameters such as weights; an output vector {y_1, y_2, …, y_k, …, y_V}, i.e. the above semantic feature vector matrix, is obtained through calculation. The feature quantization model is then used to convert {y_1, y_2, …, y_k, …, y_V} into its corresponding binary code.
In some embodiments of the present disclosure, the image visual feature vector matrix and the semantic feature vector matrix are mapped to the image semantic space by using the fully connected conversion layer and are fused there to obtain the image semantic feature vector matrix, so that the image visual features and the semantic features can be fused under the same standard, reducing the difference between the two. That is, because the fusion is carried out in the same semantic space, the image semantic feature vector matrix contains both image visual features and semantic features.
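A minimal sketch of this fusion step follows, assuming two linear fully connected conversion layers with tanh activations and element-wise averaging as the fusion operator — none of which are fixed by the disclosure; `fuse_features`, `W_v` and `W_s` are hypothetical names.

```python
import numpy as np

def fuse_features(visual, semantic, W_v, W_s):
    """Map visual and semantic feature vectors into a shared image
    semantic space with two fully connected conversion layers, then
    fuse them there by element-wise averaging (an assumed rule)."""
    v = np.tanh(W_v @ visual)      # project visual features into the space
    s = np.tanh(W_s @ semantic)    # project semantic features into the space
    return 0.5 * (v + s)           # fused image semantic feature vector
```

The key point the code makes concrete: the two modalities may have different input dimensions, but after the conversion layers both live in the same fixed-size semantic space, so they can be combined coordinate by coordinate.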
In some embodiments of the present disclosure, the above feature quantization model may include M codebooks, denoted as C = {C_1, C_2, …, C_m, …, C_M}, in which each codebook C_m contains K codewords, C_m = {C_m1, C_m2, …, C_mk, …, C_mK}, and each codeword C_mk is a D-dimensional vector S = {S_1, S_2, …, S_i, …, S_D} obtained by the k-means clustering algorithm; the D-dimensional vector S may refer to the image semantic feature vector matrix. In addition, before the image semantic feature vector matrix is quantized, it can be processed by using the feature universality transformation algorithm, so that differences and noise among the feature vectors in the image semantic feature vector matrix are eliminated, improving the accuracy of subsequent retrieval.
In some embodiments of the present disclosure, based on the above-mentioned M codebooks, the binary codeword assignment vector b_n may be mapped into M index vectors, b_n = {b_1n, b_2n, …, b_mn, …, b_Mn}, in which each index vector b_mn indicates that one and only one of the K codewords in the m-th codebook is used to approximately represent the n-th data point in the image semantic feature vector matrix. In this example, the image semantic feature vector matrix processed by the feature universality transformation algorithm may be represented as Z_n, where Z_n denotes the sum of M codewords (one codeword taken from each of the M codebooks). The binary codeword assignment vector b_n is then combined to approximate Z_n, that is, to convert Z_n into a compact binary code. Z_n may be converted into the compact binary code by the following formula:
Z_n ≈ Σ_{m=1}^{M} C_m · b_mn
wherein Z_n is the image semantic feature vector matrix after being processed by the feature universality transformation algorithm, C_m is the m-th codebook, and b_mn is the binary codeword assignment vector over the K codewords of the m-th codebook.
In some embodiments of the present disclosure, the image semantic feature vector matrix may be converted into a compact binary code by using the above formula; for example, the binary code may be [0,1,0,0]. A candidate image corresponding to the binary code may then be quickly retrieved from the candidate image library based on the binary code, so that the advantages of image visual features and deep text semantics are fully combined to retrieve images, improving retrieval accuracy. Moreover, converting the image semantic feature vector matrix into a binary code further enhances retrieval performance, and the dual retrieval modes of searching images by image and searching images by text are supported, improving the flexibility and universality of image retrieval.
Fig. 6 schematically shows a flowchart of an image retrieval method applied to a server according to another embodiment of the present disclosure.
As shown in fig. 6, the image visual feature matrix of each image in the image library may be extracted by using a Convolutional Neural Network (CNN) to obtain a visual feature library corresponding to the image library, and each image visual feature vector matrix in the visual feature library may be mapped to the image semantic space by using the fully connected conversion layer. Likewise, the semantic feature matrix of each text in the text library may be extracted by using the Word2Vec model to obtain a text feature library corresponding to the text library, and each text feature vector matrix in the text feature library may be mapped to the image semantic space by using the fully connected conversion layer. The image visual feature matrix corresponding to each image in the image library and the semantic feature vector matrix corresponding to each text in the text library can then be fused in the semantic space, and the fused image semantic feature vector matrices are encoded by using the feature quantization model to obtain a binary code library.
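The offline stage of fig. 6 — encoding every fused feature vector in the library into a binary code keyed by image id — can be sketched as follows, assuming the same greedy one-codeword-per-codebook assignment sketched earlier; `build_code_library` is a hypothetical name.

```python
import numpy as np

def build_code_library(features, codebooks):
    """Encode each (already fused) image semantic feature vector into a
    compact binary code: one one-hot block per codebook, concatenated.
    `features` maps image ids to vectors; returns id -> code tuple."""
    lib = {}
    for img_id, z in features.items():
        code = []
        residual = np.asarray(z, dtype=float).copy()
        for C in codebooks:                 # C: (K, D) codewords
            k = int(np.argmin(np.linalg.norm(C - residual, axis=1)))
            one_hot = [0] * len(C)
            one_hot[k] = 1                  # exactly one codeword per codebook
            code.extend(one_hot)
            residual -= C[k]                # greedy residual assignment (assumed)
        lib[img_id] = tuple(code)
    return lib
```

At query time the server encodes the query the same way and matches the resulting code against this precomputed binary code library, which is what makes the online lookup fast.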
Referring to fig. 6, a user inputs an image to be retrieved and a text to be retrieved on the client and sends them to the server. The image to be retrieved may then be processed by using a convolutional neural network to obtain an image visual feature vector matrix of the image to be retrieved, and the text to be retrieved is processed by using the Word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved. The image visual feature vector matrix and the semantic feature vector matrix are respectively mapped to the image semantic space by using the fully connected conversion layer and are fused in the image semantic space to obtain the image semantic feature vector matrix. The image semantic feature vector matrix is then encoded by using the feature quantization model to obtain a binary code corresponding to it, the binary code is matched against the binary code library, and the matched candidate images are sent to the client. Moreover, converting the image semantic feature vector matrix into a binary code further enhances retrieval performance, and the dual retrieval modes of searching images by image and searching images by text are supported, improving the flexibility and universality of image retrieval.
Fig. 7 schematically shows a flowchart of an image retrieval method applied to a client according to an embodiment of the present disclosure.
As shown in fig. 7, the image retrieval method applied to the client includes step S510 and step S520.
In step S510, the content to be retrieved is sent to a server, so that the server processes the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encodes the feature vector matrix of the content to be retrieved by using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved, and retrieves a candidate image matching the binary code from a candidate image library based on the binary code corresponding to the feature vector matrix of the content to be retrieved.
In step S520, candidate images matching the binary code are received.
The method can send the content to be retrieved to the server, so that the server can process the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encode the feature vector matrix by using a feature quantization model to obtain a binary code corresponding to it, and then retrieve a candidate image matching the binary code from the candidate image library based on that binary code.
In some embodiments of the present disclosure, the content to be retrieved may be an image to be retrieved uploaded by the user on the client, or a text to be retrieved input by the user on the client. Of course, the user may also upload both the image to be retrieved and the text to be retrieved on the client.
Fig. 8 schematically shows a schematic diagram of an application scenario of the image retrieval method according to an embodiment of the present disclosure.
As shown in fig. 8, a user gives a "shoe image" on the client and sends it to the server. When the server receives the "shoe image", it processes the image by using a convolutional neural network to obtain an image visual feature vector matrix of the "shoe image". The image visual feature vector matrix is then encoded by using the feature quantization model to obtain a binary code corresponding to it, candidate images matching the binary code are retrieved from the candidate image library based on that binary code, and finally the candidate images are sent to the client.
Referring to fig. 8, a user gives the text "white shoes" describing shoes on the client and sends it to the server. When the server receives the text, it processes it by using the Word2Vec model to obtain a semantic feature vector matrix of the text, encodes that matrix by using the feature quantization model to obtain a binary code corresponding to it, retrieves candidate images matching the binary code from the candidate image library based on that binary code, and finally sends the candidate images to the client. Retrieving images based on a compact binary code in this way improves retrieval efficiency, so that even when the number of searching users is large, the required retrieval speed can still be met.
Referring to fig. 8, the user gives both the "shoe image" and the text "white shoes" describing shoes on the client and sends them to the server. When the server receives them, the "shoe image" is processed by using a convolutional neural network to obtain an image visual feature vector matrix of the "shoe image", and the text "white shoes" is processed by using the Word2Vec model to obtain a semantic feature vector matrix of the text. The image visual feature vector matrix of the "shoe image" and the semantic feature vector matrix of the text are mapped to the image semantic space by using the fully connected conversion layer and are fused there to obtain an image semantic feature vector matrix. The image semantic feature vector matrix is then encoded by using the feature quantization model to obtain a binary code corresponding to it, candidate images matching the binary code are retrieved from the candidate image library based on that binary code, and finally the candidate images are sent to the client. Moreover, converting the image semantic feature vector matrix into a binary code further enhances retrieval performance, and the dual retrieval modes of searching images by image and searching images by text are supported, improving the flexibility and universality of image retrieval.
It should be noted that, if a user gives only one kind of content to be retrieved (for example, only an image to be retrieved or only a text to be retrieved), it is not necessary to map its feature vector matrix to the image semantic space using the fully connected conversion layer, because only when the content to be retrieved includes both the image to be retrieved and the text to be retrieved is it necessary to fuse the two feature vector matrices in the image semantic space, so that the fused feature vector matrix contains both the visual features of the image and the semantic features of the text.
Fig. 9 schematically shows a block diagram of an image retrieval apparatus applied to a server according to an embodiment of the present disclosure.
As shown in fig. 9, the image retrieval apparatus 600 applied to the server includes a receiving module 610, a processing module 620, a quantizing module 630, a retrieving module 640, and a transmitting module 650.
Specifically, the receiving module 610 is configured to receive content to be retrieved.
A processing module 620, configured to process the content to be retrieved to obtain a feature vector matrix of the content to be retrieved.
A quantization module 630, configured to encode the feature vector matrix of the content to be retrieved by using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved.
A retrieval module 640, configured to retrieve a candidate image matching the binary code from the candidate image library based on the binary code corresponding to the feature vector matrix of the content to be retrieved.
A sending module 650, configured to send the candidate image to the client.
The image retrieval apparatus 600 applied to the server can receive the content to be retrieved and process it to obtain the feature vector matrix of the content to be retrieved. The feature vector matrix is encoded by using the feature quantization model to obtain a binary code corresponding to it, and candidate images matching the binary code are then retrieved from the candidate image library based on that binary code.
According to the embodiment of the present disclosure, the image retrieval apparatus 600 applied to the server is used for implementing the image retrieval method applied to the server described in the embodiment of fig. 2.
In some embodiments of the disclosure, when the content to be retrieved is an image to be retrieved, the processing module 620 is configured to: and processing the image to be retrieved by utilizing a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved. The quantization module 630 is configured to: and coding the image visual characteristic vector matrix by using a characteristic quantization model to obtain a binary code corresponding to the image visual characteristic vector matrix. The retrieving module 640 is configured to: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the image visual characteristic vector matrixes.
In some embodiments of the disclosure, when the content to be retrieved is a text to be retrieved, the processing module 620 is configured to: and processing the text to be retrieved by using a word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved. The quantization module 630 is configured to: and coding the semantic feature vector matrix by using a feature quantization model to obtain a binary code corresponding to the semantic feature vector matrix. The retrieving module 640 is configured to: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the semantic feature vector matrixes.
In some embodiments of the disclosure, when the content to be retrieved is an image to be retrieved and a text to be retrieved, the processing module 620 is configured to: processing the image to be retrieved by using a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved, and processing the text to be retrieved by using a word2Vec model to obtain a semantic characteristic vector matrix of the text to be retrieved.
According to an embodiment of the present disclosure, the image retrieval apparatus 600 applied to the server further includes: and the fusion module is used for mapping the image visual characteristic vector matrix and the semantic characteristic vector matrix to an image semantic space by using a full-connection conversion layer, and fusing the image visual characteristic vector matrix and the semantic characteristic vector matrix in the image semantic space to obtain an image semantic characteristic vector matrix.
Based on the foregoing scheme, the quantization module 630 is configured to: and coding the image semantic feature vector matrix by using a feature quantization model to obtain a binary code corresponding to the image semantic feature vector matrix. The retrieving module 640 is configured to: and retrieving candidate images matched with the binary codes from the candidate image library based on the binary codes corresponding to the image semantic feature vector matrixes.
Fig. 10 schematically shows a block diagram of an image retrieval apparatus applied to a client according to an embodiment of the present disclosure.
As shown in fig. 10, the image retrieval apparatus 700 applied to the client includes a transmitting module 710 and a receiving module 720.
Specifically, the sending module 710 is configured to send the content to be retrieved to a server, so that the server processes the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encode the feature vector matrix of the content to be retrieved by using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved, and retrieve, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved.
A receiving module 720, configured to receive the candidate image matching the binary code.
The image retrieval apparatus 700 applied to the client can send the content to be retrieved to the server, so that the server can process the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encode the feature vector matrix by using a feature quantization model to obtain a binary code corresponding to it, and then retrieve candidate images matching the binary code from the candidate image library based on that binary code. In this way, the feature vectors of the content to be retrieved are converted into a compact binary code, and retrieving images based on the compact binary code improves retrieval efficiency, so that even when the number of searching users is large, the required retrieval speed can still be met.
According to the embodiment of the present disclosure, the image retrieval apparatus 700 applied to the client is used to implement the image retrieval method applied to the client described in the embodiment of fig. 7.
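The compact binary-code retrieval described above can be sketched in a few lines. This is a minimal illustration only: it assumes random-hyperplane thresholding in place of the trained feature quantization model (whose details the disclosure does not fix here) and Hamming distance as the matching criterion.

```python
import numpy as np

def binarize(features, projection):
    # Threshold the projected features at zero to obtain binary codes.
    # `projection` is a stand-in for a learned feature quantization model.
    return (features @ projection > 0).astype(np.uint8)

def hamming_search(query_code, library_codes, top_k=5):
    # Hamming distance = number of differing bits; cheap to compute at scale,
    # which is the source of the retrieval-efficiency gain described above.
    distances = np.count_nonzero(library_codes != query_code, axis=1)
    return np.argsort(distances)[:top_k]

rng = np.random.default_rng(0)
projection = rng.normal(size=(128, 32))           # 128-d features -> 32-bit codes
library_codes = binarize(rng.normal(size=(1000, 128)), projection)
query_code = binarize(rng.normal(size=(1, 128)), projection)[0]
print(hamming_search(query_code, library_codes))  # indices of the 5 closest codes
```

Because each comparison is a bitwise XOR and popcount rather than a floating-point distance, the candidate library can be scanned far faster than with raw feature vectors.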
FIG. 11 schematically shows a block diagram of a computer system of a server according to an embodiment of the disclosure. The computer system illustrated in FIG. 11 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 11, a computer system 800 of a server according to an embodiment of the present disclosure includes a processor 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The processor 801 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), among others. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows described with reference to figs. 2 to 5 in accordance with embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the system 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 executes the various steps of the image retrieval method applied to the server described above with reference to fig. 2 to 5 by executing the programs in the ROM 802 and/or RAM 803. Note that the program may also be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform the various steps of the image retrieval method applied to the server described above with reference to fig. 2-5 by executing the program stored in the one or more memories.
According to an embodiment of the present disclosure, the system 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including components such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
According to an embodiment of the present disclosure, the method described above with reference to the flow chart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing. 
According to embodiments of the present disclosure, a computer-readable medium may include the ROM 802 and/or the RAM 803 described above and/or one or more memories other than the ROM 802 and the RAM 803.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may be separate and not incorporated into the apparatus. The above-mentioned computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform the image retrieval method applied to the server according to an embodiment of the present disclosure. The method comprises the following steps: receiving content to be retrieved; processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved; encoding the feature vector matrix of the content to be retrieved using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved; retrieving, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved; and sending the candidate image to a client.
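The server-side steps just listed (receive, extract features, quantize, retrieve, send) can be strung together in a short sketch. Here `extract_features` is a placeholder for the CNN/word2Vec extractors named in the claims, and `quantize` again substitutes random-hyperplane thresholding for the trained feature quantization model; both are assumptions of this illustration, not details fixed by the disclosure.

```python
import numpy as np

def extract_features(content, dim=64):
    # Placeholder feature extractor: deterministically hashes the content
    # into a fixed-length vector. A real system would use a CNN (images)
    # or a word2Vec model (text) here, as described in the claims.
    seed = sum(content.encode()) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def quantize(features, n_bits=16, seed=42):
    # Stand-in feature quantization model: random-hyperplane thresholding.
    proj = np.random.default_rng(seed).normal(size=(features.size, n_bits))
    return (features @ proj > 0).astype(np.uint8)

def retrieve(query_code, codes, names, top_k=3):
    # Match by Hamming distance over the precomputed binary codes.
    distances = np.count_nonzero(codes != query_code, axis=1)
    return [names[i] for i in np.argsort(distances)[:top_k]]

# Offline: encode the candidate image library once.
names = [f"img_{i}.jpg" for i in range(100)]
codes = np.stack([quantize(extract_features(n)) for n in names])

# Online: receive content to be retrieved, quantize it, retrieve matches,
# and send the matching candidate images back to the client.
matches = retrieve(quantize(extract_features("red sneakers")), codes, names)
print(matches)
```

The key design point is that the expensive feature extraction and encoding of the library happen offline; each query only costs one feature extraction plus a linear scan over compact codes.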
Fig. 12 schematically shows a block diagram of a computer system of a client according to an embodiment of the present disclosure. The computer system illustrated in FIG. 12 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 12, the computer system 900 of the client according to an embodiment of the present disclosure includes a processor 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The processor 901 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flow according to embodiments of the present disclosure described with reference to fig. 7.
In the RAM 903, various programs and data necessary for the operation of the system 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various steps of the image retrieval method applied to the client described above with reference to fig. 7 by executing programs in the ROM 902 and/or the RAM 903. Note that the program may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various steps of the image retrieval method applied to the client described above with reference to fig. 7 by executing the program stored in the one or more memories.
According to embodiments of the present disclosure, the system 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The system 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.
According to an embodiment of the present disclosure, the method described above with reference to the flow chart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing. 
According to embodiments of the present disclosure, a computer-readable medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may be separate and not incorporated into the apparatus. The above-mentioned computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform the image retrieval method applied to the client according to an embodiment of the present disclosure. The method comprises the following steps: sending content to be retrieved to a server, so that the server processes the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encodes the feature vector matrix using a feature quantization model to obtain a binary code corresponding to the feature vector matrix, and retrieves, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved; and receiving the candidate image matching the binary code.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (19)

1. An image retrieval method applied to a server, the method comprising:
receiving content to be retrieved;
processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved;
encoding the feature vector matrix of the content to be retrieved using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved;
retrieving, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved;
and sending the candidate image to a client.
2. The method of claim 1, wherein the content to be retrieved includes an image to be retrieved, and processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved includes:
and processing the image to be retrieved by utilizing a convolutional neural network to obtain an image visual characteristic vector matrix of the image to be retrieved.
3. The method of claim 2, wherein encoding the feature vector matrix of the content to be retrieved using the feature quantization model to obtain the binary code corresponding to the feature vector matrix of the content to be retrieved comprises:
encoding the image visual feature vector matrix using the feature quantization model to obtain a binary code corresponding to the image visual feature vector matrix.
4. The method of claim 3, wherein retrieving the candidate image matching the binary code from the candidate image library based on the binary code corresponding to the feature vector matrix of the content to be retrieved comprises:
retrieving, from the candidate image library, a candidate image matching the binary code based on the binary code corresponding to the image visual feature vector matrix.
5. The method of claim 1, wherein the content to be retrieved comprises a text to be retrieved, and processing the content to be retrieved to obtain a feature vector matrix of the content to be retrieved comprises:
and processing the text to be retrieved by using a word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved.
6. The method of claim 5, wherein encoding the feature vector matrix of the content to be retrieved using the feature quantization model to obtain the binary code corresponding to the feature vector matrix of the content to be retrieved comprises:
encoding the semantic feature vector matrix using the feature quantization model to obtain a binary code corresponding to the semantic feature vector matrix.
7. The method of claim 6, wherein retrieving the candidate image matching the binary code from the candidate image library based on the binary code corresponding to the feature vector matrix of the content to be retrieved comprises:
retrieving, from the candidate image library, a candidate image matching the binary code based on the binary code corresponding to the semantic feature vector matrix.
8. The method of claim 1, wherein the content to be retrieved includes an image to be retrieved and a text to be retrieved, and processing the content to be retrieved to obtain the feature vector matrix of the content to be retrieved includes:
processing the image to be retrieved using a convolutional neural network to obtain an image visual feature vector matrix of the image to be retrieved, and processing the text to be retrieved using a word2Vec model to obtain a semantic feature vector matrix of the text to be retrieved.
9. The method of claim 8, wherein, after processing the text to be retrieved using the word2Vec model to obtain the semantic feature vector matrix of the text to be retrieved, the method further comprises:
mapping the image visual feature vector matrix and the semantic feature vector matrix into an image semantic space using a fully connected conversion layer, and fusing the image visual feature vector matrix and the semantic feature vector matrix in the image semantic space to obtain an image semantic feature vector matrix.
10. The method of claim 8, wherein encoding the feature vector matrix of the content to be retrieved using the feature quantization model to obtain the binary code corresponding to the feature vector matrix of the content to be retrieved comprises:
encoding the image semantic feature vector matrix using the feature quantization model to obtain a binary code corresponding to the image semantic feature vector matrix.
11. The method of claim 10, wherein retrieving the candidate image matching the binary code from the candidate image library based on the binary code corresponding to the feature vector matrix of the content to be retrieved comprises:
retrieving, from the candidate image library, a candidate image matching the binary code based on the binary code corresponding to the image semantic feature vector matrix.
12. An image retrieval method applied to a client, the method comprising:
sending content to be retrieved to a server, so that the server processes the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encodes the feature vector matrix of the content to be retrieved using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved, and retrieves, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved; and
receiving a candidate image matching the binary code.
13. The method of claim 12, wherein the content to be retrieved comprises an image to be retrieved and/or text to be retrieved.
14. An image retrieval apparatus applied to a server, the apparatus comprising:
a receiving module configured to receive content to be retrieved;
a processing module configured to process the content to be retrieved to obtain a feature vector matrix of the content to be retrieved;
a quantization module configured to encode the feature vector matrix of the content to be retrieved using a feature quantization model to obtain a binary code corresponding to the feature vector matrix of the content to be retrieved;
a retrieval module configured to retrieve, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved; and
a sending module configured to send the candidate image to a client.
15. An image retrieval device applied to a client, the device comprising:
a sending module configured to send content to be retrieved to a server, so that the server processes the content to be retrieved to obtain a feature vector matrix of the content to be retrieved, encodes the feature vector matrix using a feature quantization model to obtain a binary code corresponding to the feature vector matrix, and retrieves, from a candidate image library, a candidate image matching the binary code based on the binary code corresponding to the feature vector matrix of the content to be retrieved; and
a receiving module configured to receive the candidate image matching the binary code.
16. A server, comprising:
one or more processors; and
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-11.
17. A computer readable medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 11.
18. A client, comprising:
one or more processors; and
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 12-13.
19. A computer readable medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 12 to 13.
CN201910332754.XA 2019-04-24 2019-04-24 Image retrieval method, device, server, client and medium Pending CN111930984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910332754.XA CN111930984A (en) 2019-04-24 2019-04-24 Image retrieval method, device, server, client and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910332754.XA CN111930984A (en) 2019-04-24 2019-04-24 Image retrieval method, device, server, client and medium

Publications (1)

Publication Number Publication Date
CN111930984A true CN111930984A (en) 2020-11-13

Family

ID=73282447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910332754.XA Pending CN111930984A (en) 2019-04-24 2019-04-24 Image retrieval method, device, server, client and medium

Country Status (1)

Country Link
CN (1) CN111930984A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127672A (en) * 2021-04-21 2021-07-16 鹏城实验室 Generation method, retrieval method, medium and terminal of quantized image retrieval model
CN114911958A (en) * 2022-06-09 2022-08-16 电子科技大学 Semantic preference-based rapid image retrieval method
CN115098721A (en) * 2022-08-23 2022-09-23 浙江大华技术股份有限公司 Face feature retrieval method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160357748A1 (en) * 2015-06-04 2016-12-08 Yahoo!, Inc. Image searching
CN107871014A (en) * 2017-11-23 2018-04-03 清华大学 A kind of big data cross-module state search method and system based on depth integration Hash
CN109522435A (en) * 2018-11-15 2019-03-26 中国银联股份有限公司 A kind of image search method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160357748A1 (en) * 2015-06-04 2016-12-08 Yahoo!, Inc. Image searching
CN107871014A (en) * 2017-11-23 2018-04-03 清华大学 A kind of big data cross-module state search method and system based on depth integration Hash
CN109522435A (en) * 2018-11-15 2019-03-26 中国银联股份有限公司 A kind of image search method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127672A (en) * 2021-04-21 2021-07-16 鹏城实验室 Generation method, retrieval method, medium and terminal of quantized image retrieval model
CN114911958A (en) * 2022-06-09 2022-08-16 电子科技大学 Semantic preference-based rapid image retrieval method
CN114911958B (en) * 2022-06-09 2023-04-18 电子科技大学 Semantic preference-based rapid image retrieval method
CN115098721A (en) * 2022-08-23 2022-09-23 浙江大华技术股份有限公司 Face feature retrieval method and device and electronic equipment
CN115098721B (en) * 2022-08-23 2022-11-01 浙江大华技术股份有限公司 Face feature retrieval method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN109783655A (en) A kind of cross-module state search method, device, computer equipment and storage medium
CN111930984A (en) Image retrieval method, device, server, client and medium
CN107766492B (en) Image searching method and device
CN110659639B (en) Chinese character recognition method and device, computer readable medium and electronic equipment
US20240078385A1 (en) Method and apparatus for generating text
CN113239157B (en) Method, device, equipment and storage medium for training conversation model
CN113449070A (en) Multimodal data retrieval method, device, medium and electronic equipment
CN113642673A (en) Image generation method, device, equipment and storage medium
CN115115914B (en) Information identification method, apparatus and computer readable storage medium
CN114429552A (en) Object attribute identification method and device, readable storage medium and electronic equipment
CN113254712B (en) Video matching method, video processing device, electronic equipment and medium
CN114067327A (en) Text recognition method and device, readable medium and electronic equipment
CN110674813B (en) Chinese character recognition method and device, computer readable medium and electronic equipment
CN113761174A (en) Text generation method and device
CN111767421A (en) Method, device, electronic equipment and computer readable medium for retrieving image
CN115984868A (en) Text processing method, device, medium and equipment
CN116958738A (en) Training method and device of picture recognition model, storage medium and electronic equipment
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112307243A (en) Method and apparatus for retrieving image
CN113255484B (en) Video matching method, video processing device, electronic equipment and medium
US11792408B2 (en) Transcoder target bitrate prediction techniques
US11256736B2 (en) Personalized image recognition
CN113986958A (en) Text information conversion method and device, readable medium and electronic equipment
CN113010666A (en) Abstract generation method, device, computer system and readable storage medium
CN110868615B (en) Video processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination