WO2022241987A1

WO2022241987A1 - Image retrieval method and apparatus

Info

Publication number: WO2022241987A1
Application number: PCT/CN2021/119402
Authority: WO
Inventors: 曾锐; 林汉权; 林杰兴
Original assignee: 稿定（厦门）科技有限公司
Priority date: 2021-05-18
Filing date: 2021-09-18
Publication date: 2022-11-24
Also published as: CN113282781A; CN113282781B

Abstract

Disclosed in the present disclosure are an image retrieval method and apparatus, and a medium and a device. The image retrieval method comprises: acquiring historical images, performing saliency detection on the historical images, and performing semantic extraction on the historical images according to saliency detection results, so as to obtain semantic features of the historical images; calculating text features corresponding to the historical images; inputting the historical images into a style recognition model, so as to obtain style features of the historical images; according to the semantic features, the text features and the style features, calculating retrieval vectors corresponding to the historical images, and generating a retrieval database; acquiring an image to be subjected to retrieval, calculating a vector to be subjected to retrieval corresponding to said image, and according to said vector and a retrieval vector, calculating a similarity value between any historical image in the retrieval database and said image; and according to the similarity values corresponding to all the historical images, returning a retrieval result corresponding to said image. Feature information of images can be extracted from a plurality of dimensions, and potential information of an original image can be deeply extracted, thereby improving the accuracy of image retrieval.

Description

Image retrieval method and device

Technical field

The present invention relates to the technical field of image retrieval, in particular to an image retrieval method, a computer-readable storage medium, a computer device and an image retrieval device.

Background art

Search by image is an image retrieval function that obtains a target image based on a specified image provided by the user. Because the user does not need to compose keywords or work out a retrieval strategy, this function can effectively improve the user's retrieval efficiency and reduce the time the user spends searching for the target image.

In related technologies, in the process of image retrieval according to the image specified by the user, the entire image is mostly input into the model to extract the features of the entire image; then, the target image is retrieved based on the features of the entire image. This method tends to ignore the important information of the specified image, resulting in inaccurate retrieval results of the final target image.

Summary of the invention

The present invention aims to solve one of the technical problems in the above-mentioned technologies at least to a certain extent. Therefore, an object of the present invention is to propose an image retrieval method, which can extract feature information of images from multiple dimensions, deeply mine potential information of original images, and further improve the accuracy of image retrieval.

A second object of the present invention is to propose a computer-readable storage medium.

A third object of the present invention is to propose a computer device.

A fourth object of the present invention is to propose an image retrieval device.

In order to achieve the above purpose, an embodiment of the first aspect of the present invention proposes an image retrieval method, including the following steps: acquiring historical images, performing saliency detection on the historical images through a pre-trained saliency detection network, and performing semantic extraction on the historical images according to the saliency detection results to obtain semantic features of the historical images; performing copy extraction on the historical images, and calculating copy features corresponding to the historical images according to the copy extraction results; inputting the historical images into a style recognition model to obtain style features of the historical images; calculating retrieval vectors corresponding to the historical images according to the semantic features, the copy features and the style features, and generating a retrieval database according to a plurality of the historical images and the retrieval vector corresponding to each historical image; acquiring an image to be retrieved, calculating a vector to be retrieved corresponding to the image to be retrieved, and calculating a similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector; and returning a retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all historical images.

According to the image retrieval method of the embodiment of the present invention, first, historical images are acquired, and saliency detection is performed on the historical images through a pre-trained saliency detection network, so as to extract the main subject in the historical images; then, semantic analysis is performed on the historical images according to the saliency detection results to obtain the semantic features of the historical images; then, copy extraction is performed on the historical images, and the copy features corresponding to the historical images are calculated according to the copy extraction results; then, the historical images are input into the style recognition model, so as to extract the style features of the historical images through the style recognition model; then, the semantic features, copy features and style features are fused to obtain retrieval vectors, and the historical images and the corresponding retrieval vectors are added to the retrieval database, so that the retrieval database is generated from multiple historical images and their corresponding retrieval vectors; then, the image to be retrieved is acquired, the vector to be retrieved corresponding to the image to be retrieved is calculated, and the similarity value between the image to be retrieved and any historical image is calculated according to the vector to be retrieved and the retrieval vector corresponding to that historical image; then, the retrieval result corresponding to the image to be retrieved is returned according to the similarity values corresponding to all historical images. In this way, feature information of images is extracted from multiple dimensions and the latent information of the original image is deeply mined, thereby improving the accuracy of image retrieval.

In addition, the image retrieval method proposed according to the above-mentioned embodiments of the present invention may also have the following additional technical features:

Optionally, the training of the saliency detection network includes: acquiring an open-source dataset and subject-free images, extracting subject information of images in the open-source dataset, and fusing the subject information with the subject-free images; and generating a training set according to the open-source dataset and the fusion results of the subject information and the subject-free images, so as to train the saliency detection network according to the training set.

Optionally, calculating the copy feature corresponding to the historical image according to the copy extraction result includes: performing word segmentation and keyword extraction on the copy extraction result to generate keywords corresponding to the copy extraction result and a weight corresponding to each keyword; and mapping the keywords to keyword vectors, and performing a weighted average according to the keyword vectors and the corresponding weights to obtain the copy feature corresponding to the historical image.

Optionally, calculating the retrieval vector corresponding to the historical image according to the semantic feature, the copy feature and the style feature includes: obtaining the weight corresponding to the semantic feature, the weight corresponding to the copy feature and the weight corresponding to the style feature, and performing feature fusion on the semantic feature, the copy feature and the style feature according to these three weights to obtain the retrieval vector.

Optionally, the method further includes: obtaining click data of the user on the retrieval result, and updating the weight corresponding to the semantic feature, the weight corresponding to the copy feature and the weight corresponding to the style feature according to the click data.

In order to achieve the above purpose, the embodiment of the second aspect of the present invention provides a computer-readable storage medium on which an image retrieval program is stored, and when the image retrieval program is executed by a processor, the above image retrieval method is realized.

According to the computer-readable storage medium of the embodiment of the present invention, the image retrieval program is stored so that, when the processor executes the image retrieval program, the above-mentioned image retrieval method is realized; feature information of images is thereby extracted from multiple dimensions and the latent information of the original image is deeply mined, which improves the accuracy of image retrieval.

In order to achieve the above purpose, an embodiment of the third aspect of the present invention proposes a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the image retrieval method described above is implemented.

According to the computer device of the embodiment of the present invention, the image retrieval program is stored in the memory so that, when the processor executes the image retrieval program, the above-mentioned image retrieval method is realized; feature information of images is thereby extracted from multiple dimensions and the latent information of the original image is deeply mined, which improves the accuracy of image retrieval.

In order to achieve the above purpose, an embodiment of the fourth aspect of the present invention proposes an image retrieval device, including: a semantic feature module, configured to acquire historical images, perform saliency detection on the historical images through a pre-trained saliency detection network, and perform semantic extraction on the historical images according to the saliency detection results to obtain semantic features of the historical images; a copy feature module, configured to perform copy extraction on the historical images and calculate the copy features corresponding to the historical images according to the copy extraction results; a style feature module, configured to input the historical images into a style recognition model to obtain style features of the historical images; a database module, configured to calculate retrieval vectors corresponding to the historical images according to the semantic features, the copy features and the style features, and to generate a retrieval database according to multiple historical images and the retrieval vector corresponding to each historical image; a retrieval module, configured to acquire an image to be retrieved, calculate a vector to be retrieved corresponding to the image to be retrieved, and calculate a similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector; and a feedback module, configured to return the retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all historical images.

According to the image retrieval device of the embodiment of the present invention, the semantic feature module is used to acquire historical images, perform saliency detection on the historical images through a pre-trained saliency detection network, and perform semantic extraction on the historical images according to the saliency detection results to obtain the semantic features of the historical images; the copy feature module is used to perform copy extraction on the historical images and calculate the copy features corresponding to the historical images according to the copy extraction results; the style feature module is used to input the historical images into the style recognition model to obtain the style features of the historical images; the database module is used to calculate the retrieval vectors corresponding to the historical images according to the semantic features, copy features and style features, and to generate a retrieval database according to multiple historical images and the retrieval vector corresponding to each historical image; the retrieval module is used to acquire the image to be retrieved, calculate the vector to be retrieved corresponding to the image to be retrieved, and calculate the similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector; and the feedback module is used to return the retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all historical images. In this way, feature information of images can be extracted from multiple dimensions and the latent information of the original image can be deeply mined, thereby improving the accuracy of image retrieval.

In addition, the image retrieval device proposed according to the above-mentioned embodiments of the present invention may also have the following additional technical features:

Description of drawings

Fig. 1 is a schematic flow chart of an image retrieval method according to an embodiment of the present invention;

Fig. 2 is a schematic block diagram of an image retrieval device according to an embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

In related technologies, in the process of image retrieval according to the image specified by the user, the entire image is mostly input into the model to extract the features of the entire image; then, the target image is retrieved based on the features of the entire image. This approach tends to ignore the important information of the specified image, resulting in inaccurate retrieval results for the final target image. According to the image retrieval method of the embodiment of the present invention, first, historical images are acquired, and saliency detection is performed on the historical images through a pre-trained saliency detection network, so as to extract the main subject in the historical images; then, semantic analysis is performed on the historical images according to the saliency detection results to obtain the semantic features of the historical images; then, copy extraction is performed on the historical images, and the copy features corresponding to the historical images are calculated according to the copy extraction results; then, the historical images are input into the style recognition model, so as to extract the style features of the historical images through the style recognition model; then, feature fusion is performed on the semantic features, copy features and style features to obtain retrieval vectors, and the historical images and the corresponding retrieval vectors are added to the retrieval database, so that the retrieval database is generated from multiple historical images and their corresponding retrieval vectors; then, the image to be retrieved is acquired, the vector to be retrieved corresponding to the image to be retrieved is calculated, and the similarity value between the image to be retrieved and any historical image is calculated according to the vector to be retrieved and the retrieval vector corresponding to that historical image; then, the retrieval result corresponding to the image to be retrieved is returned according to the similarity values corresponding to all historical images. In this way, feature information of images is extracted from multiple dimensions and the latent information of the original image is deeply mined, thereby improving the accuracy of image retrieval.

In order to better understand the above technical solutions, the following will describe exemplary embodiments of the present invention in more detail with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

In order to better understand the above-mentioned technical solution, the above-mentioned technical solution will be described in detail below in conjunction with the accompanying drawings and specific implementation methods.

Fig. 1 is a schematic flow chart of an image retrieval method according to an embodiment of the present invention. As shown in Fig. 1, the image retrieval method includes the following steps:

S101. Acquire historical images, perform saliency detection on the historical images through a pre-trained saliency detection network, and perform semantic extraction on the historical images according to the saliency detection results, so as to obtain semantic features of the historical images.

That is to say, the historical images used for training are acquired, and saliency detection is performed on them through the pre-trained saliency detection network to obtain the subject area in each historical image. If a subject area exists in the historical image, semantic extraction is performed on the subject area; if no subject area exists, semantic extraction is performed on the entire historical image, so as to obtain the semantic features of the historical image. It can be understood that a commodity image often contains a prominent subject whose position and color attract the user's attention, whereas a poster image contains, besides a prominent subject, many small elements distributed across the poster. By extracting the salient subject area first, the accuracy of subsequent retrieval of the target image can be effectively improved.
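As a non-limiting illustration, the choice between the subject area and the entire image might be sketched as follows. The saliency map is assumed to be a grid of values in [0, 1] already produced by the saliency detection network; the names `subject_bbox` and `region_for_semantics`, the threshold of 0.5 and the minimum area ratio are hypothetical and are not specified by this disclosure.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def subject_bbox(mask: Sequence[Sequence[float]],
                 threshold: float = 0.5,        # assumed cut-off, not from the source
                 min_area_ratio: float = 0.01   # assumed minimum subject size
                 ) -> Optional[Box]:
    """Return the bounding box of salient pixels, or None when no subject is found."""
    rows = [r for r, row in enumerate(mask) if any(v >= threshold for v in row)]
    if not rows:
        return None
    cols = [c for row in mask for c, v in enumerate(row) if v >= threshold]
    salient = len(cols)                      # one entry per salient pixel
    total = len(mask) * len(mask[0])
    if salient / total < min_area_ratio:
        return None
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def region_for_semantics(image: List[list], mask: Sequence[Sequence[float]]) -> List[list]:
    """Crop to the subject area if one exists; otherwise keep the whole image."""
    box = subject_bbox(mask)
    if box is None:
        return image
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]
```

The returned region would then be passed to whatever semantic extraction model is used; that model is outside the scope of this sketch.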

In some embodiments, the training of the saliency detection network includes: acquiring an open-source dataset and subject-free images, extracting subject information of images in the open-source dataset, and fusing the subject information with the subject-free images; and generating a training set according to the open-source dataset and the fusion results of the subject information and the subject-free images, so that the saliency detection network can be trained according to the training set.

It can be understood that most images in open-source datasets are pictures of natural scenes and deviate from the images found in specific application scenarios; for example, in a poster scene, a picture contains a large number of text boxes and small elements. However, generating the training set by manual labeling would consume a lot of manpower and material resources. Therefore, when training the saliency detection network, the subject information corresponding to the images in the open-source dataset is first extracted, and the subject information is fused with the subject-free images to generate new images. In this way, a large number of training samples can be obtained without manual labeling, which reduces the resources required to train the saliency detection network.
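By way of example only, fusing extracted subject information with a subject-free image might look like the following sketch, which pastes the subject pixels onto the background and emits the matching saliency label at the same time. Images are represented as nested lists of pixel values; the function name `composite`, the binary subject mask and the paste position are assumptions made for illustration and do not describe the actual fusion procedure of the embodiment.

```python
from typing import List, Tuple


def composite(background: List[list],
              subject: List[list],
              subject_mask: List[List[int]],
              top: int,
              left: int) -> Tuple[List[list], List[List[int]]]:
    """Paste `subject` onto a copy of `background` wherever `subject_mask` is non-zero.

    Returns the synthetic image together with its saliency label (1 = subject),
    so the pair can be used as a training sample without manual labeling.
    """
    height, width = len(background), len(background[0])
    image = [row[:] for row in background]          # leave the background untouched
    label = [[0] * width for _ in range(height)]
    for r, (pixel_row, mask_row) in enumerate(zip(subject, subject_mask)):
        for c, (pixel, is_subject) in enumerate(zip(pixel_row, mask_row)):
            y, x = top + r, left + c
            # Skip transparent pixels and anything falling outside the background.
            if is_subject and 0 <= y < height and 0 <= x < width:
                image[y][x] = pixel
                label[y][x] = 1
    return image, label
```

Varying the paste position and the background for each extracted subject is one way in which many samples could be produced from a small number of subjects.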

S102. Perform copy extraction on the historical images, and calculate copy features corresponding to the historical images according to the copy extraction results.

That is to say, first, text detection and recognition are performed on the historical image to identify the text part in the historical image, which completes the copy extraction; then, the copy feature corresponding to the historical image is calculated according to the copy extraction result.

There may be multiple ways to calculate the copy feature corresponding to the historical image according to the copy extraction result.

In some embodiments, calculating the copy feature corresponding to the historical image according to the copy extraction result includes: performing word segmentation and keyword extraction on the copy extraction result to generate keywords corresponding to the copy extraction result and a weight corresponding to each keyword; and mapping the keywords to keyword vectors, and performing a weighted average according to the keyword vectors and the corresponding weights to obtain the copy feature corresponding to the historical image.

As an example, first, publicly available copy on the Internet is collected by means of crawlers and similar techniques, and a training data set is generated from the collected data; then, a word2vec model and a word segmentation model are trained according to the training data set; then, text detection and recognition are performed on the historical image to extract the text part in the historical image; then, word segmentation and keyword extraction are performed on the text part through the word segmentation model to obtain the keywords and the weight corresponding to each keyword; then, each keyword is mapped to a corresponding keyword vector through word2vec; finally, a weighted summation is performed according to the keyword vectors and the weights corresponding to the keywords to obtain the copy feature vector corresponding to the historical image.
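For illustration, the combination of keyword vectors and keyword weights might be sketched as below. The sketch implements the weighted-average variant; omitting the final division would give a plain weighted summation instead. The embedding table is assumed to be a simple dictionary standing in for a trained word-vector model, and the name `copy_feature` and the handling of unknown keywords are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def copy_feature(keywords: Sequence[Tuple[str, float]],
                 embeddings: Dict[str, List[float]],
                 dim: int) -> List[float]:
    """Weighted average of keyword vectors.

    `keywords` holds (keyword, weight) pairs from keyword extraction;
    `embeddings` maps a keyword to its vector (a stand-in for a trained model).
    Keywords missing from `embeddings` are skipped, which is an assumption.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in keywords:
        vector = embeddings.get(word)
        if vector is None:
            continue
        for i, value in enumerate(vector):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no usable keyword: fall back to the zero vector
    return [value / weight_sum for value in total]
```

For an image with no recognizable text, this sketch yields a zero vector, so the copy feature contributes nothing when features are later fused.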

S103. Input the historical image into the style recognition model to obtain the style features of the historical image.

That is to say, style recognition is performed on the historical images through the pre-trained style recognition model to obtain the style features of the historical images. It can be understood that each image has its corresponding style; for example, most Spring Festival posters use red as the main color to highlight the festive atmosphere. Taking style into account can effectively improve the accuracy of subsequent image retrieval.

As an example, the training of the style recognition model may include: first, obtaining the result images corresponding to an image template (that is, the images generated from the image template), and treating the result images corresponding to the same image template as images of the same style; in this way, a large amount of effective training data can be obtained. Further, the dominant color of each result image in the same style can be extracted, and the color distance between the dominant colors of the result images can be calculated, so as to filter out result images that obviously do not belong to the same style and determine the final training data.
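One possible way to realize the dominant-color filtering is sketched below. The disclosure does not specify how the dominant color or the color distance is computed; here the dominant color is assumed to be the most frequent RGB value after coarse quantization, the distance is assumed to be Euclidean distance in RGB space, and an image is dropped when its lower-median distance to the other images exceeds a hypothetical threshold. All names and parameters are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Color = Tuple[int, int, int]


def dominant_color(pixels: Sequence[Color], step: int = 32) -> Color:
    """Most frequent color after quantizing each channel into buckets of width `step`."""
    buckets = Counter(
        tuple((channel // step) * step + step // 2 for channel in pixel)
        for pixel in pixels
    )
    return buckets.most_common(1)[0][0]


def filter_same_style(images: Dict[str, Sequence[Color]],
                      max_distance: float = 100.0) -> List[str]:
    """Keep images whose dominant color is close to that of most other images."""
    colors = {name: dominant_color(pixels) for name, pixels in images.items()}
    kept = []
    for name, color in colors.items():
        distances = sorted(
            math.dist(color, other)
            for other_name, other in colors.items()
            if other_name != name
        )
        if not distances:            # a single image has nothing to disagree with
            kept.append(name)
            continue
        lower_median = distances[(len(distances) - 1) // 2]
        if lower_median <= max_distance:
            kept.append(name)
    return kept
```

Using a median rather than the mean keeps one outlier image from distorting the decision for the remaining images of the same template.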

As another example, ResNet50 combined with triplet loss can be used to train a style recognition model.
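As a minimal sketch of the loss term only, the triplet loss on three style embeddings can be written as follows. The embeddings are assumed to have been produced already by the backbone network; the Euclidean distance and the margin of 0.5 are assumptions, since the disclosure names only ResNet50 and triplet loss without further parameters.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.5) -> float:   # margin value is an assumption
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    `positive` is an embedding of an image with the same style as `anchor`
    (for example, generated from the same template); `negative` has a different style.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the different-style image is farther from the anchor than the same-style image by at least the margin, which is what pushes images of one style together in the embedding space.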

S104. Calculate retrieval vectors corresponding to the historical images according to the semantic features, copy features and style features, and generate a retrieval database according to multiple historical images and the retrieval vector corresponding to each historical image.

That is to say, the retrieval vector corresponding to each historical image is calculated according to its semantic features, copy features and style features; after the calculation is completed, the historical image and the corresponding retrieval vector are added to the retrieval database. Thus, a retrieval database can be constructed from multiple historical images and the retrieval vector corresponding to each historical image, so that subsequent image retrieval can be performed according to the retrieval database.

In some embodiments, calculating the retrieval vector corresponding to the historical image according to the semantic feature, the copy feature and the style feature includes: obtaining the weight corresponding to the semantic feature, the weight corresponding to the copy feature and the weight corresponding to the style feature, and performing feature fusion on the semantic feature, the copy feature and the style feature according to these three weights to obtain the retrieval vector.

As an example, the semantic feature, the copy feature and the style feature are all one-dimensional vectors of length 128, denoted vector1, vector2 and vector3 respectively; the weights corresponding to the three features are defined as a1, a2 and a3; the final retrieval vector is then expressed as: a1*vector1 + a2*vector2 + a3*vector3.
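The weighted fusion a1*vector1 + a2*vector2 + a3*vector3 can be illustrated with the short sketch below. The function name `fuse` and the length check are illustrative additions; the vectors may be of any common length, 128 being the length used in the example above.

```python
from typing import List, Sequence, Tuple


def fuse(semantic: Sequence[float],
         copy: Sequence[float],
         style: Sequence[float],
         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[float]:
    """Element-wise a1*semantic + a2*copy + a3*style."""
    if not (len(semantic) == len(copy) == len(style)):
        raise ValueError("all three feature vectors must have the same length")
    a1, a2, a3 = weights
    return [a1 * s + a2 * c + a3 * t for s, c, t in zip(semantic, copy, style)]
```

Because the three features are summed element by element, the result has the same length as each input, so the retrieval vector stays at length 128 in the example.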

In some embodiments, the image retrieval method proposed by the embodiment of the present invention further includes: acquiring the user's click data on the retrieval results, and updating the weight corresponding to the semantic feature, the weight corresponding to the copy feature and the weight corresponding to the style feature according to the click data.

It can be understood that when the retrieval vector is first calculated, initial weights (for example, 1, 1, 1) may be combined with the values of the three features. As the method continues to be used, the accuracy of the retrieval results can be judged by acquiring the user's click data on the retrieval results; updating the weight corresponding to the semantic feature, the weight corresponding to the copy feature and the weight corresponding to the style feature according to the click data makes the final weight setting more accurate, thereby improving the accuracy of the final image retrieval.
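The disclosure does not state how the weights are updated from click data. Purely as a hypothetical illustration, one simple rule is to measure, for each clicked result, how similar it was to the query under each of the three features, and to move the weights a small step toward the features that agreed most with the user's clicks. The function name, the learning rate and the normalization to a weight sum of 3 are all assumptions.

```python
from typing import List, Sequence, Tuple


def update_weights(weights: Sequence[float],
                   clicked_similarities: Sequence[Tuple[float, float, float]],
                   learning_rate: float = 0.1) -> List[float]:
    """Nudge (semantic, copy, style) weights toward features that matched clicked results.

    Each entry of `clicked_similarities` holds the per-feature similarity between
    the query and one clicked result. This update rule is a hypothetical example,
    not the procedure of the embodiment.
    """
    if not clicked_similarities:
        return list(weights)
    count = len(clicked_similarities)
    means = [sum(record[i] for record in clicked_similarities) / count for i in range(3)]
    total = sum(means)
    if total <= 0:
        return list(weights)
    # Scale the target so it sums to 3, matching the initial weights (1, 1, 1).
    target = [3.0 * mean / total for mean in means]
    return [(1 - learning_rate) * w + learning_rate * t for w, t in zip(weights, target)]
```

A small learning rate keeps a handful of atypical clicks from overturning weights that have worked well for earlier queries.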

S105. Acquire an image to be retrieved, calculate a vector to be retrieved corresponding to the image to be retrieved, and calculate a similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector.

S106. Return the retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all the historical images.

That is to say, the image to be retrieved uploaded by the user is acquired, the semantic feature, copy feature and style feature corresponding to the image to be retrieved are extracted, and the three features are fused to obtain the vector to be retrieved corresponding to the image to be retrieved; then, the cosine similarity between the vector to be retrieved and the retrieval vector corresponding to any historical image in the retrieval database is calculated, and this cosine similarity is used as the similarity value between the image to be retrieved and that historical image. Thus, by traversing the retrieval database, the similarity value between the image to be retrieved and each historical image can be calculated; the historical images are then sorted by similarity value, and the retrieval result corresponding to the image to be retrieved is returned according to the sorting result.
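The traversal, cosine similarity and sorting described above may be sketched as follows. The retrieval database is assumed to be an in-memory dictionary from an image identifier to its retrieval vector; the names `cosine_similarity` and `search`, the `top_k` parameter and the treatment of zero vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Sequence[float],
           database: Dict[str, Sequence[float]],
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every historical image against the query and return the best matches."""
    scored = [
        (image_id, cosine_similarity(query, vector))
        for image_id, vector in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:top_k]
```

A full traversal like this is linear in the size of the database; for large collections an approximate nearest-neighbor index could be substituted, although the disclosure itself describes only the traversal.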

To sum up, according to the image retrieval method of the embodiment of the present invention, first, historical images are acquired, and saliency detection is performed on the historical images through a pre-trained saliency detection network, so as to extract the main subject in the historical images; then, semantic analysis is performed on the historical images according to the saliency detection results to obtain the semantic features of the historical images; then, copy extraction is performed on the historical images, and the copy features corresponding to the historical images are calculated according to the copy extraction results; then, the historical images are input into the style recognition model, so as to extract the style features of the historical images through the style recognition model; then, the semantic features, copy features and style features are fused to obtain retrieval vectors, and the historical images and the corresponding retrieval vectors are added to the retrieval database, so that the retrieval database is generated from multiple historical images and their corresponding retrieval vectors; then, the image to be retrieved is acquired, the vector to be retrieved corresponding to the image to be retrieved is calculated, and the similarity value between the image to be retrieved and any historical image is calculated according to the vector to be retrieved and the retrieval vector corresponding to that historical image; then, the retrieval result corresponding to the image to be retrieved is returned according to the similarity values corresponding to all historical images. In this way, feature information of images is extracted from multiple dimensions and the latent information of the original image is deeply mined, thereby improving the accuracy of image retrieval.

In order to realize the above-mentioned embodiments, an embodiment of the present invention proposes a computer-readable storage medium on which an image retrieval program is stored, and when the image retrieval program is executed by a processor, the above-mentioned image retrieval method is implemented.

In order to realize the above-mentioned embodiments, an embodiment of the present invention proposes a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the image retrieval method described above is implemented.

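In order to realize the above-mentioned embodiments, an embodiment of the present invention proposes an image retrieval device. As shown in FIG. 2, the image retrieval device includes: a semantic feature module 10, a copy feature module 20, a style feature module 30, a database module 40, a retrieval module 50 and a feedback module 60.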

The semantic feature module 10 is used to acquire historical images, perform saliency detection on the historical images through a pre-trained saliency detection network, and perform semantic extraction on the historical images according to the saliency detection results to obtain the semantic features of the historical images;

The copy feature module 20 is used to perform copy extraction on the historical images, and to calculate the copy features corresponding to the historical images according to the copy extraction results;

The style feature module 30 is used for inputting the historical image into the style recognition model, to obtain the style feature of the historical image;

The database module 40 is used to calculate the retrieval vectors corresponding to the historical images according to the semantic features, copy features and style features, and to generate a retrieval database according to multiple historical images and the retrieval vector corresponding to each historical image;

The retrieval module 50 is used to obtain the image to be retrieved, and calculate the vector to be retrieved corresponding to the image to be retrieved, and calculate the similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector;

The feedback module 60 is used to return the retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all historical images.

It should be noted that the above description about the image retrieval method in FIG. 1 is also applicable to the image retrieval device, and details are not repeated here.
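The behavior of the retrieval module 50 and the feedback module 60 described above can be illustrated with a minimal sketch. Cosine similarity is assumed here as the similarity measure, and the function names `cosine_similarity` and `retrieve`, the `top_k` parameter and the sample vectors are hypothetical; this section of the description does not fix any of them.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vector: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every historical image against the query and return the best top_k."""
    # Retrieval module: one similarity value per historical image.
    scored = [
        (image_id, cosine_similarity(query_vector, vector))
        for image_id, vector in database.items()
    ]
    # Feedback module: rank by similarity value, highest first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical retrieval database: image identifier -> retrieval vector.
database = {
    "poster_a": [1.0, 0.0, 0.0],
    "poster_b": [0.0, 1.0, 0.0],
    "poster_c": [0.7, 0.7, 0.0],
}
results = retrieve([1.0, 0.1, 0.0], database, top_k=2)
```

In this sketch the database is scanned exhaustively; for a large number of historical images an approximate nearest-neighbor index could be substituted without changing the interface.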

To sum up, according to the image retrieval device of the embodiment of the present invention, the semantic feature module acquires historical images, performs saliency detection on them through a pre-trained saliency detection network, and performs semantic extraction on them according to the saliency detection results to obtain their semantic features; the copy feature module extracts copy from the historical images and calculates the corresponding copy features according to the copy extraction results; the style feature module inputs the historical images into the style recognition model to obtain their style features; the database module calculates the retrieval vectors corresponding to the historical images according to the semantic features, copy features and style features, and generates a retrieval database according to multiple historical images and the retrieval vector corresponding to each historical image; the retrieval module acquires the image to be retrieved, calculates the corresponding vector to be retrieved, and calculates the similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector; and the feedback module returns the retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all historical images. In this way, feature information of images can be extracted from multiple dimensions and the latent information of the original images can be deeply mined, thereby improving the accuracy of image retrieval.

Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and any combination of procedures and/or blocks in the flowchart and/or block diagram, can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, and the instruction device realizes the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

It should be noted that, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. does not indicate any order. These words can be interpreted as names.

While preferred embodiments of the invention have been described, additional changes and modifications to these embodiments can be made by those skilled in the art once the basic inventive concept is appreciated. Therefore, it is intended that the appended claims be construed to cover the preferred embodiment as well as all changes and modifications which fall within the scope of the invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention also intends to include these modifications and variations.

In the description of the present invention, it should be understood that the terms "first" and "second" are used for description purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present invention, "plurality" means two or more, unless otherwise specifically defined.

In the present invention, unless otherwise clearly specified and limited, terms such as "installation", "connected", "connection" and "fixation" should be understood in a broad sense. For example, a connection can be fixed, detachable, or integral; it can be mechanical or electrical; it can be direct or indirect through an intermediary; and it can be the internal communication of two components or the interaction relationship between two components. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to specific situations.

In the present invention, unless otherwise clearly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that the first and second features are in indirect contact through an intermediary. Moreover, the first feature being "above", "over" or "on top of" the second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is higher in level than the second feature. The first feature being "below", "under" or "beneath" the second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is lower in level than the second feature.

In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific examples", or "some examples" mean that specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the above terms should not be understood as necessarily referring to the same embodiment or example. Furthermore, the described specific features, structures, materials or characteristics may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not conflict with each other.

Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments.

Claims

An image retrieval method, characterized by comprising the following steps:

Acquiring historical images, performing saliency detection on the historical images through a pre-trained saliency detection network, and performing semantic extraction on the historical images according to the saliency detection results, to obtain semantic features of the historical images;

performing copy extraction on the historical image, and calculating the copy features corresponding to the historical image according to the copy extraction result;

inputting the historical image into a style recognition model to obtain the style features of the historical image;

calculating a retrieval vector corresponding to the historical image according to the semantic feature, the copy feature and the style feature, and generating a retrieval database according to a plurality of the historical images and the retrieval vector corresponding to each historical image;

acquiring an image to be retrieved, calculating a vector to be retrieved corresponding to the image to be retrieved, and calculating a similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector;

The retrieval result corresponding to the image to be retrieved is returned according to the similarity values corresponding to all historical images.
The image retrieval method according to claim 1, wherein the training of the saliency detection network comprises:

Obtaining an open source dataset and a subject-free image, extracting subject information of the image in the open source dataset, and fusing the subject information with the subject-free image;

generating a training set according to the open source dataset and the result of fusing the subject information with the subject-free image, so as to train the saliency detection network according to the training set.
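As an illustration only, the fusion of subject information with a subject-free image can be sketched as a masked paste that yields both a synthetic training image and its saliency label. The function name `fuse_subject`, the grayscale nested-list image representation, and the paste offsets `top` and `left` are assumptions made for this sketch; the claim does not specify how the fusion is carried out.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def fuse_subject(
    background: Image, subject: Image, mask: Image, top: int, left: int
) -> Tuple[Image, Image]:
    """Paste the masked subject onto a subject-free background.

    Returns the fused image and a binary saliency label of the same size,
    where 1 marks pixels that came from the subject.
    """
    fused = [row[:] for row in background]            # copy, keep input intact
    label = [[0] * len(row) for row in background]    # saliency ground truth
    for r, (subject_row, mask_row) in enumerate(zip(subject, mask)):
        for c, (pixel, inside) in enumerate(zip(subject_row, mask_row)):
            y, x = top + r, left + c
            in_bounds = 0 <= y < len(fused) and 0 <= x < len(fused[y])
            if inside and in_bounds:
                fused[y][x] = pixel
                label[y][x] = 1
    return fused, label


# Hypothetical 4x4 subject-free image and a 2x2 subject with its mask.
background = [[10] * 4 for _ in range(4)]
subject = [[200, 200], [200, 200]]
mask = [[1, 0], [1, 1]]
fused, label = fuse_subject(background, subject, mask, top=1, left=1)
```

Each `(fused, label)` pair produced this way could be added to the samples of the open source dataset to form the training set.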
The image retrieval method according to claim 1, wherein calculating the copy features corresponding to the historical images according to the copy extraction results includes:

performing word segmentation and keyword extraction on the copy extraction result to generate keywords corresponding to the copy extraction result and weights corresponding to the keywords;

The keywords are mapped to keyword vectors, and a weighted average is performed according to the keyword vectors and corresponding weights to obtain copy features corresponding to the historical images.
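The weighted average described in this claim can be sketched as follows. The keyword weights are taken as given (for example, from a TF-IDF style keyword extractor, which is an assumption), and the embedding table `embeddings` and the function name `copy_feature` are hypothetical.

```python
from typing import Dict, List, Sequence


def copy_feature(
    keyword_weights: Dict[str, float],
    embeddings: Dict[str, Sequence[float]],
    dim: int,
) -> List[float]:
    """Weighted average of keyword vectors, using the keyword weights."""
    total = [0.0] * dim
    weight_sum = 0.0
    for keyword, weight in keyword_weights.items():
        vector = embeddings.get(keyword)
        if vector is None:
            continue  # keyword has no vector in this sketch's table; skip it
        for i in range(dim):
            total[i] += weight * vector[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no usable keywords: return the zero vector
    return [value / weight_sum for value in total]


# Hypothetical two-dimensional keyword vectors.
embeddings = {"sale": [1.0, 0.0], "summer": [0.0, 1.0]}
feature = copy_feature(
    {"sale": 3.0, "summer": 1.0, "unknown": 5.0}, embeddings, dim=2
)
# feature is [0.75, 0.25]: "sale" contributes three times as much as "summer".
```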
The image retrieval method according to claim 1, wherein calculating the retrieval vector corresponding to the historical image according to the semantic feature, the copy feature and the style feature comprises:

obtaining a weight corresponding to the semantic feature, a weight corresponding to the copy feature, and a weight corresponding to the style feature, and performing feature fusion on the semantic feature, the copy feature and the style feature according to these weights to obtain the retrieval vector.
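One possible form of this weighted feature fusion is sketched below: each feature is L2-normalized, scaled by its weight, and the results are concatenated. The concatenation scheme, the normalization step, and the names `l2_normalize` and `fuse_features` are assumptions made for illustration; the claim only requires that the fusion be performed according to the three weights.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector stays zero."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return [0.0] * len(vector)
    return [v / norm for v in vector]


def fuse_features(
    semantic: Sequence[float],
    copy_vec: Sequence[float],
    style: Sequence[float],
    w_semantic: float,
    w_copy: float,
    w_style: float,
) -> List[float]:
    """Concatenate the weighted, normalized features into one retrieval vector."""
    fused: List[float] = []
    for weight, feature in (
        (w_semantic, semantic),
        (w_copy, copy_vec),
        (w_style, style),
    ):
        fused.extend(weight * v for v in l2_normalize(feature))
    return fused


# Hypothetical low-dimensional features and weights.
retrieval_vector = fuse_features(
    [3.0, 4.0], [0.0, 2.0], [5.0], w_semantic=0.5, w_copy=0.3, w_style=0.2
)
```

Normalizing before weighting keeps a feature with a large raw magnitude from dominating the retrieval vector regardless of its assigned weight.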
The image retrieval method according to claim 4, further comprising:

Acquiring the user's click data on the retrieval result, and updating the weight corresponding to the semantic feature, the weight corresponding to the copywriting feature, and the weight corresponding to the style feature according to the click data.
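The claim does not state how the click data is turned into a weight update. A minimal sketch under one assumed rule is given below: for each clicked result, the per-feature similarity between the query and the clicked image is recorded, each weight is increased in proportion to the mean similarity of its feature, and the weights are renormalized to sum to one. The update rule, the `learning_rate` parameter and the name `update_weights` are all hypothetical.

```python
from typing import Dict, List

FEATURES = ("semantic", "copy", "style")


def update_weights(
    weights: Dict[str, float],
    clicked_similarities: List[Dict[str, float]],
    learning_rate: float = 0.1,
) -> Dict[str, float]:
    """Shift weight toward features that agree with the clicked results."""
    if not clicked_similarities:
        return dict(weights)  # no clicks: keep the weights unchanged
    updated = {}
    for name in FEATURES:
        mean_similarity = sum(
            click[name] for click in clicked_similarities
        ) / len(clicked_similarities)
        updated[name] = max(weights[name] + learning_rate * mean_similarity, 0.0)
    total = sum(updated.values())
    if total == 0.0:
        return dict(weights)
    # Renormalize so that the three weights sum to one.
    return {name: value / total for name, value in updated.items()}


weights = {"semantic": 0.5, "copy": 0.3, "style": 0.2}
# Hypothetical per-feature similarities for two clicked results.
clicks = [
    {"semantic": 0.2, "copy": 0.1, "style": 0.9},
    {"semantic": 0.4, "copy": 0.1, "style": 0.7},
]
new_weights = update_weights(weights, clicks)
```

In this example the clicked results resemble the query mostly in style, so the style weight grows relative to the other two.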
A computer-readable storage medium, characterized in that an image retrieval program is stored thereon, and when the image retrieval program is executed by a processor, the image retrieval method according to any one of claims 1-5 is implemented.
A computer device, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, characterized in that, when the processor executes the program, the image retrieval method according to any one of claims 1-5 is implemented.
An image retrieval device, characterized in that it comprises:

a semantic feature module, wherein the semantic feature module is used to acquire historical images, perform saliency detection on the historical images through a pre-trained saliency detection network, and perform semantic extraction on the historical images according to the saliency detection results, to obtain the semantic features of the historical image;

A copy feature module, the copy feature module is used to extract the copy of the historical image, and calculate the copy feature corresponding to the historical image according to the copy extraction result;

a style feature module, the style feature module is used to input the historical image into a style recognition model to obtain the style feature of the historical image;

a database module, wherein the database module is used to calculate the retrieval vectors corresponding to the historical images according to the semantic features, the copy features and the style features, and to generate a retrieval database according to a plurality of the historical images and the retrieval vector corresponding to each historical image;

a retrieval module, wherein the retrieval module is used to acquire the image to be retrieved, calculate the vector to be retrieved corresponding to the image to be retrieved, and calculate the similarity value between any historical image in the retrieval database and the image to be retrieved according to the vector to be retrieved and the retrieval vector;

A feedback module, the feedback module is used to return the retrieval result corresponding to the image to be retrieved according to the similarity values corresponding to all the historical images.
The image retrieval device according to claim 8, wherein the training of the saliency detection network comprises:

Obtaining an open source dataset and a subject-free image, extracting subject information of the image in the open source dataset, and fusing the subject information with the subject-free image;

generating a training set according to the open source dataset and the result of fusing the subject information with the subject-free image, so as to train the saliency detection network according to the training set.
The image retrieval device according to claim 8, wherein the calculation of the copy features corresponding to the historical images according to the copy extraction results includes:

performing word segmentation and keyword extraction on the copy extraction result to generate keywords corresponding to the copy extraction result and weights corresponding to the keywords;

The keywords are mapped to keyword vectors, and weighted average is carried out according to the keyword vectors and corresponding weights to obtain the copy features corresponding to the historical images.