CN115687670A - Image searching method and device, computer readable storage medium and electronic equipment - Google Patents


Info

Publication number
CN115687670A
CN115687670A
Authority
CN
China
Prior art keywords
image
target
sample
feature extraction
feature information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310000631.2A
Other languages
Chinese (zh)
Inventor
王锘然
金沛然
徐健
蔡莹莹
田宁
刘冠辰
韩国民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Wenshubao Intelligent Technology Research Institute Co ltd
Tianjin Hengda Wenbo Science & Technology Co ltd
Original Assignee
Henan Wenshubao Intelligent Technology Research Institute Co ltd
Tianjin Hengda Wenbo Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Wenshubao Intelligent Technology Research Institute Co ltd and Tianjin Hengda Wenbo Science & Technology Co ltd
Priority to CN202310000631.2A
Publication of CN115687670A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image searching method and device, a computer-readable storage medium, and electronic equipment. The method comprises the following steps: acquiring an image to be searched; performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information; calculating the similarity between the first target feature information and the second target feature information corresponding to each image to be matched to obtain a plurality of similarity values; and determining a target image from the plurality of images to be matched based on the plurality of similarity values, and determining the target image as the search result corresponding to the image to be searched. The invention solves the technical problem in the related art of poor search accuracy when images are searched based on hand-drawn images.

Description

Image searching method and device, computer readable storage medium and electronic equipment
Technical Field
The invention relates to the field of artificial intelligence, in particular to an image searching method, an image searching device, a computer-readable storage medium and electronic equipment.
Background
With the continuous development of science and technology, image searching is no longer limited to text descriptions: images themselves can be used to search for images. When a user provides a query image, the system can automatically search a database for pictures similar to the content of the query image and feed them back to the user.
The traditional image-to-image search task requires the query image and the target image to share obvious similar features, but because of the randomness of the query image (such as a hand drawing), its features cannot be completely consistent with those of the target image. At present, the models applied to image-to-image search are generally traditional algorithms or supervised learning, so they adapt poorly to images in specialized fields, resulting in poor search accuracy.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide an image searching method and device, a computer-readable storage medium and electronic equipment, which at least solve the technical problem in the related art of poor search accuracy when images are searched based on hand-drawn images.
According to an aspect of an embodiment of the present invention, there is provided an image search method including: acquiring an image to be searched; performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information, wherein the target feature extraction model is trained in an unsupervised contrastive learning manner using a target training sample set, and the target training sample set is obtained by performing multiple image enhancements on each image in a plurality of images; calculating the similarity between the first target feature information and the second target feature information corresponding to each image to be matched to obtain a plurality of similarity values, wherein the second target feature information is obtained by performing feature extraction on the image to be matched based on the target feature extraction model; and determining a target image from the plurality of images to be matched based on the plurality of similarity values, and determining the target image as the search result corresponding to the image to be searched, wherein the similarity between the image content of the target image and the image content of the image to be searched is greater than a preset similarity.
Further, the image search method further includes: performing feature extraction processing on an image to be searched to obtain initial feature information; performing local feature extraction processing on the initial feature information to obtain local feature information; performing global feature extraction processing on the initial feature information to obtain global feature information; and performing feature fusion on the local feature information and the global feature information to obtain first target feature information.
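The local/global extraction and fusion steps above can be sketched as follows in numpy. The specific operators (global average pooling for the global branch, per-quadrant max pooling for the local branch, concatenation for fusion) are illustrative assumptions, not operators specified by the patent.

```python
import numpy as np

def extract_fused_features(feature_map: np.ndarray) -> np.ndarray:
    """Fuse local and global features from initial feature information.

    feature_map: (C, H, W) array, e.g. the output of a CNN backbone.
    Returns a 1-D fused feature vector of length 2*C.
    """
    c, h, w = feature_map.shape
    # Global branch: average-pool over the whole spatial extent.
    global_feat = feature_map.reshape(c, -1).mean(axis=1)          # (C,)
    # Local branch: max-pool within each of 4 spatial quadrants,
    # then average the quadrant descriptors.
    quads = [feature_map[:, :h // 2, :w // 2], feature_map[:, :h // 2, w // 2:],
             feature_map[:, h // 2:, :w // 2], feature_map[:, h // 2:, w // 2:]]
    local_feat = np.stack([q.reshape(c, -1).max(axis=1) for q in quads]).mean(axis=0)
    # Fusion: concatenate the two branches into the first target feature information.
    return np.concatenate([global_feat, local_feat])               # (2C,)

fm = np.random.rand(8, 4, 4)
fused = extract_fused_features(fm)
print(fused.shape)  # (16,)
```

In practice the two branches would typically be separate network heads; the pooling above only illustrates the local/global split and the fusion by concatenation.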
Further, the target training sample set is constructed based on the following method: acquiring a plurality of first sample images; performing first image enhancement processing on each first sample image to obtain a second sample image corresponding to each first sample image; performing second image enhancement processing on each first sample image to obtain a third sample image corresponding to each first sample image, wherein the first image enhancement processing is different from the second image enhancement processing; and constructing a target training sample set based on the second sample image and the third sample image.
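The positive-pair construction above can be sketched as follows, assuming a horizontal flip as the first image enhancement and additive Gaussian noise as the second; the patent does not fix the particular enhancements, so these choices are illustrative.

```python
import numpy as np

def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """First image enhancement processing (illustrative choice)."""
    return img[:, ::-1].copy()

def add_gaussian_noise(img: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Second image enhancement processing (illustrative choice)."""
    rng = np.random.default_rng(0)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def build_training_set(first_samples):
    """For each first sample image, produce a (second, third) sample-image
    positive pair via two different enhancements, then pool all of them
    into the target training sample set."""
    pairs = [(horizontal_flip(img), add_gaussian_noise(img)) for img in first_samples]
    training_set = [img for pair in pairs for img in pair]  # 2N images for N inputs
    return training_set, pairs

imgs = [np.random.rand(32, 32) for _ in range(4)]
train, pairs = build_training_set(imgs)
print(len(train))  # 8
```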
Further, the image search method further includes: acquiring a plurality of initial sample images; performing multi-scale cutting processing on each initial sample image to obtain a plurality of fourth sample images corresponding to each initial sample image, wherein the fourth sample images have different sizes; and carrying out normalization processing on the plurality of fourth sample images to obtain a first sample image corresponding to each fourth sample image.
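A minimal sketch of the multi-scale cropping and normalization steps, assuming random crops at three illustrative scales and nearest-neighbor resizing to a uniform size; the scales, output size, and resampling method are assumptions, not values from the patent.

```python
import numpy as np

def nn_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of a 2-D image to (out_h, out_w)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def multi_scale_crops(img, scales=(0.5, 0.75, 1.0), out_size=32, seed=0):
    """Multi-scale random cropping of one initial sample image: each crop is a
    'fourth sample image' of a different size, then normalization to a uniform
    size yields the corresponding first sample images."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    first_samples = []
    for s in scales:
        ch, cw = max(1, int(h * s)), max(1, int(w * s))
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
        fourth = img[top:top + ch, left:left + cw]            # fourth sample image
        first_samples.append(nn_resize(fourth, out_size, out_size))  # uniform size
    return first_samples

img = np.random.rand(40, 40)
firsts = multi_scale_crops(img)
print([f.shape for f in firsts])  # [(32, 32), (32, 32), (32, 32)]
```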
Further, the target feature extraction model is obtained by training based on the following method: constructing an initial feature extraction model; based on the initial feature extraction model, performing feature extraction on each target sample image in the target training sample set to obtain target sample feature information corresponding to each target sample image, wherein the target sample image is a second sample image or a third sample image; and training an initial feature extraction model based on the target sample feature information corresponding to each target sample image to obtain a target feature extraction model.
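The contrastive training objective is not named in the text; a common choice for unsupervised contrastive learning over pairs of enhanced views is the NT-Xent (normalized temperature-scaled cross-entropy) loss, sketched below in numpy as an assumption. Rows 2k and 2k+1 of the embedding matrix are assumed to hold the target sample feature information of the two enhanced views (second and third sample images) of the same image.

```python
import numpy as np

def nt_xent_loss(z: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent contrastive loss over 2N embeddings, where rows 2k and 2k+1
    are the two enhanced views of the same first sample image."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n2 = z.shape[0]
    pos = np.arange(n2) ^ 1                            # positive partner: 0<->1, 2<->3, ...
    log_prob = sim[np.arange(n2), pos] - np.log(np.exp(sim).sum(axis=1))
    return float(-log_prob.mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(round(nt_xent_loss(z), 3))
```

Minimizing this loss pulls the two views of each image together and pushes apart views of different images, which is the property the description relies on for distinguishing variants of similar images.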
Further, the image search method further includes: performing feature extraction processing on each target sample image to obtain sample initial feature information corresponding to each target sample image; performing local feature extraction processing on the initial feature information of each sample to obtain sample local feature information corresponding to each target sample image; carrying out global feature extraction processing on the initial feature information of each sample to obtain sample global feature information corresponding to each target sample image; and carrying out feature fusion on the sample local feature information and the sample global feature information corresponding to each target sample image to obtain the target sample feature information corresponding to each target sample image.
Further, the image search method further includes: and performing normalization processing on the image to be searched and the image to be matched before performing feature extraction on the image to be searched based on the target feature extraction model to obtain first target feature information.
According to another aspect of the embodiments of the present invention, there is also provided an image search apparatus including: a first acquisition module for acquiring an image to be searched; a first feature extraction module for performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information, wherein the target feature extraction model is trained in an unsupervised contrastive learning manner using a target training sample set, and the target training sample set is obtained by performing multiple image enhancements on each image in a plurality of images; a calculating module for calculating the similarity between the first target feature information and the second target feature information corresponding to each image to be matched to obtain a plurality of similarity values, wherein the second target feature information is obtained by performing feature extraction on the image to be matched based on the target feature extraction model; and a determining module for determining a target image from the plurality of images to be matched based on the plurality of similarity values, and determining the target image as the search result corresponding to the image to be searched, wherein the similarity between the image content of the target image and the image content of the image to be searched is greater than a preset similarity.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-mentioned image search method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including one or more processors and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image search method described above.
In the embodiment of the invention, image search is performed using a model trained in an unsupervised contrastive learning manner: an image to be searched is obtained, feature extraction is performed on it based on a target feature extraction model to obtain first target feature information, and the similarity between the first target feature information and the second target feature information corresponding to each image to be matched is calculated to obtain a plurality of similarity values, so that a target image is determined from the images to be matched based on the similarity values and is determined as the search result corresponding to the image to be searched.
In the above process, the target training sample set is obtained by performing multiple kinds of image enhancement on each image in the plurality of images, so that the construction of positive sample pairs for contrastive learning is realized without labeling the image information of the images. Further, the target feature extraction model is trained in an unsupervised contrastive learning manner: on the one hand, unsupervised learning yields a model that is robust to different training data, and on the other hand, contrastive learning gives the network better ability to distinguish different variants of similar images, so that the features the target feature extraction model learns have better adaptability and richness. Furthermore, feature extraction is performed on the image to be searched and the images to be matched through the target feature extraction model, realizing a more accurate feature representation of both, so that the similarity calculated from the extracted features is more accurate, further improving the accuracy of image search.
Therefore, the scheme provided by the application achieves the purpose of realizing image search based on the model obtained by the unsupervised contrast learning mode training, thereby realizing the technical effect of improving the image search accuracy, and further solving the technical problem of poor search accuracy when the image search is carried out based on the hand-drawn image in the related technology.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative image search method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the relationship between an alternative image search system, a terminal device, and an image feature database according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative initial feature extraction model training scheme in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of an alternative image search method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative image search apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of an image search method, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
The image searching method provided by the application is applied to image-to-image search scenarios in which the features of the query image and the target image are difficult to make completely consistent, for example, scenarios in which a target image is queried through a hand drawing. Further, the image searching method provided by the application can be applied to matching cultural relic images through hand drawings in the museum (cultural heritage) field, matching commodity images through hand drawings in shopping software, and the like.
Fig. 1 is a schematic diagram of an alternative image searching method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S101, acquiring an image to be searched.
In step S101, an image to be searched may be acquired by an electronic device, a server, a workstation, or the like, and in the present application, the image to be searched is acquired by an image search system. Optionally, the image search system may provide an image search service in the form of an HTTP service, which may provide a Restful standard protocol interface to the outside, and configure the image search service on an associated server, so as to enable the client to complete image search by accessing the server. Optionally, as shown in fig. 2, the user may select an image to be searched through a terminal device held by the user, and upload the image to application software or an applet corresponding to the image search system, so that the image search system can obtain the image. The image to be searched is a picture used by the user to query an image that the user desires to acquire, and the device types of the terminal device include, but are not limited to, a smart phone, a computer, a tablet, a smart wearable device, and the like.
Optionally, in this embodiment, the image search method provided by the present application is explained by taking as an example a scenario in the museum field in which a cultural relic image is matched through a hand drawing. For example, the user draws a hand drawing (i.e., the aforementioned image to be searched) on the terminal device held by the user, and then uploads the hand drawing to the museum application software for the image search system to obtain.
Step S102, performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information, wherein the target feature extraction model is trained in an unsupervised contrastive learning manner using a target training sample set, and the target training sample set is obtained by performing multiple image enhancements on each image in a plurality of images.
In step S102, the target training sample set adopted by the target feature extraction model is obtained by performing multiple image enhancements on each image in a plurality of images. That is, the images included in the target training sample set may be divided into multiple groups, where each group contains multiple images obtained by applying different image enhancements to the same image. In the image-to-image search scenario in which hand drawings are matched with cultural relic images, all the images are black-and-white line drawings, so the images before image enhancement are preferably black-and-white line drawings.
Further, feature extraction may be performed on the image to be searched based on the target feature extraction model to obtain first target feature information corresponding to the image to be searched, where the extraction manner of feature extraction performed on the image to be searched includes, but is not limited to, local feature extraction, global feature extraction, and the like.
It should be noted that, because the target training sample set is obtained by performing multiple kinds of image enhancement on each image in the plurality of images, the construction of positive sample pairs for contrastive learning is realized, thereby avoiding labeling the image information of the images. Furthermore, the target feature extraction model is trained in an unsupervised contrastive learning manner: on the one hand, unsupervised learning yields a model that is robust to different training data, and on the other hand, contrastive learning gives the model better ability to distinguish different variants of similar images, so that the features the target feature extraction model learns have better adaptability and richness, further improving search accuracy.
Step S103, calculating the similarity between the first target feature information and the second target feature information corresponding to each image to be matched to obtain a plurality of similarity values, wherein the second target feature information is obtained by performing feature extraction on the image to be matched based on the target feature extraction model.
In step S103, the image search system may calculate a similarity between the first target feature information and the second target feature information corresponding to each of the multiple images to be matched according to a similarity algorithm in the related art, so as to obtain a similarity value between the first target feature information and each of the second target feature information. The image search system may obtain, from the image feature database shown in fig. 2, second target feature information corresponding to each image to be matched in the multiple images to be matched, where the image to be matched is an image in an image set where an image that a user desires to obtain is located, for example, the image to be matched is an image corresponding to an existing cultural relic in a museum.
For example, after the image search system obtains a hand-drawn image input by a user, and performs feature extraction on the hand-drawn image to obtain feature information (i.e., the first target feature information) corresponding to the hand-drawn image, the image search system may obtain feature information (i.e., the second target feature information) corresponding to each cultural relic picture (i.e., the to-be-matched image) from an image feature database as shown in fig. 2, so as to calculate the similarity between the feature information corresponding to the hand-drawn image and the feature information corresponding to each cultural relic picture, and obtain a plurality of similarity values.
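The text leaves the similarity algorithm to "the related art"; cosine similarity is a common choice for comparing feature vectors and, as an assumption, the calculation of step S103 can be sketched as:

```python
import numpy as np

def similarity_values(query_feat: np.ndarray, gallery_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity between the first target feature information (query)
    and each row of second target feature information (images to be matched)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return g @ q  # one similarity value per image to be matched

q = np.array([1.0, 2.0, 3.0])
gallery = np.stack([q, -q, np.array([3.0, -1.0, 0.0])])
sims = similarity_values(q, gallery)
print(sims.round(3))
```

In a deployed system the rows of `gallery_feats` would be loaded from the image feature database shown in fig. 2 rather than computed per query.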
Step S104, determining a target image from the plurality of images to be matched based on the plurality of similarity values, and determining the target image as the search result corresponding to the image to be searched, wherein the similarity between the image content of the target image and the image content of the image to be searched is greater than a preset similarity.
In step S104, optionally, the image search system may determine an image to be matched, of which the similarity value is higher than a preset value among the similarity values, as a target image, then find the target image from the image feature database, and output the target image to a terminal device held by the user, so as to present the target image to the user as a search result corresponding to the image to be searched. Alternatively, the image search system may determine an image to be matched, which has the highest similarity value among the plurality of similarity values, as the target image.
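The two selection strategies described in this step (every image to be matched above a preset similarity value, or the single highest-similarity image) can be sketched as:

```python
import numpy as np

def select_targets(sim_values, threshold=None):
    """Determine target image indices from similarity values: either every
    image to be matched whose value exceeds a preset threshold, or the
    single image with the highest similarity value."""
    sims = np.asarray(sim_values)
    if threshold is not None:
        return np.flatnonzero(sims > threshold).tolist()
    return [int(sims.argmax())]

print(select_targets([0.2, 0.9, 0.5], threshold=0.4))  # [1, 2]
print(select_targets([0.2, 0.9, 0.5]))                 # [1]
```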
Based on the schemes defined in steps S101 to S104 above, it can be seen that the embodiment of the present invention performs image search using a model trained in an unsupervised contrastive learning manner: an image to be searched is obtained, feature extraction is performed on it based on a target feature extraction model to obtain first target feature information, the similarity between the first target feature information and the second target feature information corresponding to each image to be matched is calculated to obtain a plurality of similarity values, a target image is determined from the plurality of images to be matched based on the plurality of similarity values, and the target image is determined as the search result corresponding to the image to be searched. Here, the target feature extraction model is trained in the unsupervised contrastive learning manner using a target training sample set obtained by performing multiple image enhancements on each image in a plurality of images, the second target feature information is obtained by performing feature extraction on the image to be matched based on the target feature extraction model, and the similarity between the image content of the target image and the image content of the image to be searched is greater than a preset similarity.
It is easy to note that, in the above process, since the target training sample set is obtained by performing multiple kinds of image enhancement on each image in the plurality of images, the construction of positive sample pairs for contrastive learning is realized, so that labeling the image information of the images is avoided. Further, the target feature extraction model is trained in an unsupervised contrastive learning manner: on the one hand, unsupervised learning yields a model that is robust to different training data, and on the other hand, contrastive learning gives the network better ability to distinguish different variants of similar images, so that the features the target feature extraction model learns have better adaptability and richness. Furthermore, feature extraction is performed on the image to be searched and the images to be matched through the target feature extraction model, realizing a more accurate feature representation of both, so that the similarity calculated from the extracted features is more accurate, further improving the accuracy of image search.
Therefore, the scheme provided by the application achieves the purpose of realizing image search based on the model obtained by the unsupervised contrast learning mode training, thereby realizing the technical effect of improving the image search accuracy, and further solving the technical problem of poor search accuracy when searching images based on hand-drawn images in the related technology.
In an alternative embodiment, the target training sample set is constructed based on the following method: the image searching system obtains a plurality of first sample images, then performs first image enhancement processing on each first sample image to obtain a second sample image corresponding to each first sample image, and then performs second image enhancement processing on each first sample image to obtain a third sample image corresponding to each first sample image, so that a target training sample set is constructed based on the second sample image and the third sample image. Wherein the first image enhancement processing is different from the second image enhancement processing.
Optionally, the image search system may obtain a plurality of first sample images, where the first sample images may be directly crawled from a database or the internet, and the first sample images may also be obtained by preprocessing the crawled images. Further, the image search system may apply two different data enhancement processes to each first sample image, resulting in corresponding second and third sample images. The data enhancement processing may be at least one geometric transformation method applied to the image, or at least one pixel transformation method applied to the image, or at least one geometric transformation method and at least one pixel transformation method applied to the image, where the geometric transformation method may be inversion, rotation, scaling, translation, dithering, and the like, and the pixel transformation method may be salt and pepper noise, gaussian blur, histogram equalization, and the like.
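A sketch of composing one geometric transformation (a 90-degree rotation) with one pixel transformation (salt-and-pepper noise), as the paragraph above allows; the particular transformations and the noise probability are illustrative assumptions.

```python
import numpy as np

def rotate90(img: np.ndarray) -> np.ndarray:
    """Geometric transformation: rotate the image 90 degrees."""
    return np.rot90(img).copy()

def salt_and_pepper(img: np.ndarray, p: float = 0.05, seed: int = 0) -> np.ndarray:
    """Pixel transformation: flip a fraction p of pixels to black or white."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < p / 2] = 0.0       # pepper
    out[mask > 1 - p / 2] = 1.0   # salt
    return out

def enhance(img: np.ndarray) -> np.ndarray:
    """Apply one geometric and one pixel transformation, per the text."""
    return salt_and_pepper(rotate90(img))

img = np.random.rand(8, 6)
out = enhance(img)
print(out.shape)  # (6, 8)
```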
Further, the image search system may construct a target training sample set based on the second sample image and the third sample image corresponding to each first sample image. For example, when there are N first sample images, then 2N images are included in the target training sample set.
It should be noted that, under unsupervised learning, the kind of variation applied to the data determines what information the network can learn. Therefore, on the one hand, applying various data enhancements to the images realizes the construction of positive sample pairs for contrastive learning, and on the other hand, it enables the target feature extraction model to learn better.
In an optional embodiment, in the process of acquiring the plurality of first sample images, the image search system may acquire a plurality of initial sample images, then perform multi-scale cropping processing on each initial sample image to obtain a plurality of fourth sample images corresponding to each initial sample image, and then perform normalization processing on the plurality of fourth sample images to obtain the first sample image corresponding to each fourth sample image. The plurality of fourth sample images have different sizes.
Optionally, the image search system may first crawl a plurality of images from a database or the internet, then clean the plurality of collected images, and pick out complete and clear images beneficial for training as initial sample images. In order to enable the trained target feature extraction model to more accurately extract features of the image to be matched and the image to be searched, when the initial sample image is selected, an image having associated features with the image to be matched and the image to be searched can be preferentially selected, for example, if the image to be searched has associated features, a line drawing is used, and if the image to be matched has associated features, a cultural relic image is used, the initial sample image can be a line drawing of a relevant cultural relic, so that the model mainly learns and distinguishes shapes of different line drawings.
Further, as shown in fig. 3, the image search system may preprocess each initial sample image as follows. For one initial sample image, the system performs random cropping at several preset sizes, producing a plurality of fourth sample images of different sizes. It then normalizes these fourth sample images to a uniform size, yielding first sample images of identical dimensions and completing the preprocessing of that initial sample image. Optionally, the image search system applies the same preprocessing to every initial sample image, obtaining a plurality of first sample images for each one.
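The crop-then-normalize preprocessing above can be sketched with NumPy. This is a simplified stand-in under stated assumptions: real pipelines would use bilinear resizing and pixel-value normalization, whereas here nearest-neighbour sampling illustrates only the resize-to-uniform-size step.

```python
import numpy as np

def multiscale_crops(image, crop_sizes, rng):
    """Randomly crop the image at several scales: the fourth sample images."""
    h, w = image.shape[:2]
    crops = []
    for s in crop_sizes:
        top = rng.integers(0, h - s + 1)
        left = rng.integers(0, w - s + 1)
        crops.append(image[top:top + s, left:left + s])
    return crops

def normalize_size(image, out_size):
    """Resize a crop to a uniform size via nearest-neighbour sampling,
    producing a first sample image of fixed dimensions."""
    h, w = image.shape[:2]
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return image[rows][:, cols]
```

Each initial image thus yields several differently scaled crops that all end up the same size after normalization.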
It should be noted that multi-scale cropping of the initial sample images lets the target feature extraction model learn scale information, so that it can handle images of varying sizes at application time, further improving the accuracy of image search.
In an alternative embodiment, the target feature extraction model may be trained as follows: build an initial feature extraction model; use it to extract features from each target sample image in the target training sample set, obtaining target sample feature information for each target sample image; then train the initial feature extraction model on this feature information to obtain the target feature extraction model. Here, a target sample image is either a second sample image or a third sample image.
Optionally, as shown in fig. 3, the image search system may first construct an initial feature extraction model and then feed all images in the target training sample set (i.e., the aforementioned target sample images) into it, so that the model extracts features from each image and produces the corresponding target sample feature information.
Further, as shown in fig. 3, the image search system may treat the two differently augmented images derived from the same first sample image as positive examples (that is, the two images form a positive sample pair), and all other images in the target training sample set as negative examples (that is, either of the two images forms a negative sample pair with any other image). The initial feature extraction model is then trained on the target sample feature information of each target sample image through a contrastive loss function, so that the network learns deep feature representations of the images without supervision. Before the images in the target training sample set are input to the initial feature extraction model, the second and third sample images derived from the same first sample image are given the same mark, while those derived from different first sample images are given different marks, so that positive and negative sample pairs can be identified during contrastive learning.
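The patent does not specify its contrastive loss; a common choice consistent with the mark-based pairing above is the SimCLR-style NT-Xent loss, sketched here in NumPy as an illustration only. Two views sharing a mark form a positive pair; every other view is a negative.

```python
import numpy as np

def nt_xent_loss(features, marks, temperature=0.5):
    """Contrastive loss over the 2N target sample features.
    `marks` gives each view the index of its source first sample
    image; same-mark views are positives, all others negatives."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)          # a view is never its own pair
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    marks = np.asarray(marks)
    pos = (marks[:, None] == marks[None, :]) & ~np.eye(len(marks), dtype=bool)
    return -log_prob[pos].mean()
```

The loss is low when same-mark features are close and cross-mark features are far apart, which is exactly the behaviour the training described above rewards.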
Optionally, after the initial feature extraction model is constructed, the image search system may instead randomly select at least a partial subset of the target training sample set to input to the model, where the subset consists of the second and third sample images corresponding to some of the first sample images, ensuring that every second or third sample image in the subset still has a positive or negative counterpart. The initial feature extraction model then extracts features from each image in the subset to obtain the corresponding target sample feature information.
Further, the image search system may train the initial feature extraction model on this feature information through a contrastive loss function, randomly selecting a new subset of the target training sample set to input to the model at each step of the iterative training process.
It should be noted that, because the target feature extraction model is trained with an unsupervised contrastive learning method, it can distinguish different renderings of the same line drawing, further improving the accuracy of feature extraction.
In an optional embodiment, feature extraction for each target sample image in the target training sample set proceeds as follows. The initial feature extraction model first performs a feature extraction step on each target sample image to obtain its sample initial feature information. It then applies local feature extraction to each sample's initial feature information to obtain the sample local feature information corresponding to each target sample image, and global feature extraction to the same initial feature information to obtain the sample global feature information. Finally, it fuses the sample local and global feature information of each target sample image to produce the target sample feature information.
Optionally, the initial feature extraction model is a network structure that combines a CNN (Convolutional Neural Network) with a Transformer, where the Transformer structure comprises a self-attention network and a feedforward neural network. Specifically, as shown in fig. 3, the initial feature extraction model includes a basic module, a local feature extraction module, a global feature extraction module, and a feature fusion module.
As shown in fig. 3, after receiving a target sample image, the initial feature extraction model processes it with the basic module to obtain basic features (that is, the aforementioned sample initial feature information). These basic features give a preliminary representation of the image; they represent it less precisely than the subsequent sample local or global feature information. As shown in fig. 3, the basic module is the backbone network of the initial feature extraction model, a general CNN structure.
Further, as shown in fig. 3, the output of the basic module is fed to both the local and global feature extraction modules: the local feature extraction module extracts sample local feature information from each sample's initial feature information, and the global feature extraction module extracts sample global feature information from the same input. The local feature extraction module is a CNN built from at least several convolutional layers, whose deep convolutions make the network attend to local information in the image; the global feature extraction module applies a self-attention layer to the input feature information, and this global attention mechanism lets the network attend to global information in the image.
Furthermore, as shown in fig. 3, once the outputs of the local and global feature extraction modules are obtained, the feature fusion module concatenates the sample local feature information and the sample global feature information into spliced feature information, then applies a self-attention layer to the spliced feature information so that the two kinds of feature information are fused more thoroughly, yielding the target sample feature information and allowing the model to output more robust features.
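The concatenate-then-attend fusion step can be sketched as follows. This is a simplification under stated assumptions: the real model uses learned Q/K/V projections inside its self-attention layer, whereas this sketch uses identity projections to show only the mechanism (scaled dot-product scores, softmax weighting, attention across both feature sources).

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax rows
    return weights @ x

def fuse_features(local_feats, global_feats):
    """Concatenate local and global token features, then apply
    self-attention so fused tokens can attend across both sources."""
    stacked = np.concatenate([local_feats, global_feats], axis=0)
    return self_attention(stacked)
```

Because each softmax row sums to one, every fused token is a convex combination of the stacked local and global tokens, which is how the attention step mixes the two feature types.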
It should be noted that extracting features with an initial feature extraction model combining a Transformer and a CNN represents the image more effectively. After this model is trained in an unsupervised manner, the resulting target feature extraction model can distinguish two line drawings that are similar in shape but different in rendering style, and it produces consistent, highly uniform feature-vector representations of line drawings or hand drawings, ensuring the matching accuracy of the line-drawing search task.
In an optional embodiment, before performing feature extraction on the image to be searched based on the target feature extraction model to obtain the first target feature information, the image search system may perform normalization processing on the image to be searched and the image to be matched.
Optionally, as shown in fig. 4, during application the image search system may first normalize each image to be matched (i.e., the preprocessing in fig. 4) and then extract its image features with the target feature extraction model, obtaining the second target feature information corresponding to each image to be matched. Because the target feature extraction model is trained from the initial feature extraction model, the two share the same network structure, and the target model extracts features from the image to be matched in the same way the initial model does from sample images, so the details are not repeated here. The resulting second target feature information may be stored in an image feature database.
Further, when the image search system acquires an image to be searched input by a user, as shown in fig. 4, it likewise normalizes the image (i.e., the preprocessing in fig. 4) and then extracts its image features with the target feature extraction model to obtain the first target feature information. After normalization, the image to be searched has the same size as the images to be matched.
It should be noted that normalizing the image to be searched and the images to be matched before feature extraction makes them easier for the target feature extraction model to process.
In an optional embodiment, to extract the first target feature information, the image search system may perform feature extraction on the image to be searched to obtain initial feature information, apply local feature extraction to the initial feature information to obtain local feature information, apply global feature extraction to the same initial feature information to obtain global feature information, and then fuse the local and global feature information into the first target feature information.
Optionally, as shown in fig. 4, after acquiring the image to be searched, the image search system feeds it to the same target feature extraction model previously used to process the images to be matched, thereby obtaining the first target feature information. The model extracts features from the image to be searched in the same way the initial feature extraction model does from sample images, so the details are not repeated here.
Further, after obtaining the first target feature information, as shown in fig. 4, the image search system computes the similarity (i.e., the similarity measurement in fig. 4) between the first target feature information and each piece of second target feature information, yielding a similarity value for each image to be matched and enabling the target image to be determined.
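The similarity measurement and target-image selection can be sketched as a cosine-similarity ranking. The patent does not fix a similarity metric; cosine similarity is one common choice for comparing feature vectors, assumed here for illustration, with the preset similarity acting as the admission threshold.

```python
import numpy as np

def search(first_target_feature, second_target_features, preset_similarity):
    """Rank images to be matched by cosine similarity to the query
    feature and return the indices whose similarity exceeds the preset
    similarity, best match first."""
    q = first_target_feature / np.linalg.norm(first_target_feature)
    db = second_target_features / np.linalg.norm(
        second_target_features, axis=1, keepdims=True)
    sims = db @ q                       # cosine similarity per image
    order = np.argsort(-sims)           # descending similarity
    return [int(i) for i in order if sims[i] > preset_similarity]
```

The returned indices identify the target images; an empty list means no image to be matched cleared the preset similarity.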
The present application provides an unsupervised-learning-based search-by-image technique suited to matching cultural relic pictures with hand-drawn sketches in the cultural heritage and museum field. Compared with image search algorithms in the related art, the method adapts better to the requirements of different application scenarios: because it uses unsupervised learning, no image annotation is needed in the data preparation stage, and training only requires selecting images relevant to the specific task. Thanks to the training strategy and the effectiveness of the network structure, the method guarantees both the accuracy and the matching efficiency of search-by-image, achieving more reliable matching and suiting a variety of scenarios. In the museum field, it can serve as an option for improving the visitor exhibition experience in museum content construction.
Therefore, the scheme provided by the application performs image search with a model trained by unsupervised contrastive learning, achieving the technical effect of improved image search accuracy and thereby solving the technical problem in the related art of poor accuracy when searching images based on hand-drawn images.
Example 2
According to an embodiment of the present invention, an embodiment of an image search apparatus is provided, where fig. 5 is a schematic diagram of an alternative image search apparatus according to an embodiment of the present invention, as shown in fig. 5, the apparatus includes:
a first obtaining module 501, configured to obtain an image to be searched;
the first feature extraction module 502 is configured to perform feature extraction on an image to be searched based on a target feature extraction model to obtain first target feature information, where the target feature extraction model is obtained by training based on an unsupervised contrast learning manner, the target feature extraction model is trained by using a target training sample set, and the target training sample set is obtained by performing multiple image enhancements on each image of multiple images;
a calculating module 503, configured to calculate a similarity between the first target feature information and second target feature information corresponding to each image to be matched, so as to obtain multiple similarity values, where the second target feature information is obtained by performing feature extraction on the image to be matched based on a target feature extraction model;
a determining module 504, configured to determine a target image from the multiple images to be matched based on the multiple similarity values, and determine the target image as a search result corresponding to the image to be searched, where a similarity between an image content of the target image and an image content of the image to be searched is greater than a preset similarity.
It should be noted that the first obtaining module 501, the first feature extracting module 502, the calculating module 503 and the determining module 504 correspond to steps S101 to S104 in the foregoing embodiment, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1.
Optionally, the first feature extraction module further includes: the first feature extraction submodule is used for performing feature extraction processing on an image to be searched to obtain initial feature information; the second characteristic extraction submodule is used for carrying out local characteristic extraction processing on the initial characteristic information to obtain local characteristic information; the third feature extraction submodule is used for carrying out global feature extraction processing on the initial feature information to obtain global feature information; and the first feature fusion submodule is used for performing feature fusion on the local feature information and the global feature information to obtain first target feature information.
Optionally, the image searching apparatus further includes: the second acquisition module is used for acquiring a plurality of first sample images; the first image enhancement module is used for performing first image enhancement processing on each first sample image to obtain a second sample image corresponding to each first sample image; the second image enhancement module is used for performing second image enhancement processing on each first sample image to obtain a third sample image corresponding to each first sample image, wherein the first image enhancement processing is different from the second image enhancement processing; and the first construction module is used for constructing a target training sample set based on the second sample image and the third sample image.
Optionally, the second obtaining module further includes: the acquisition submodule is used for acquiring a plurality of initial sample images; the cutting submodule is used for carrying out multi-scale cutting processing on each initial sample image to obtain a plurality of fourth sample images corresponding to each initial sample image, wherein the fourth sample images have different sizes; and the processing submodule is used for carrying out normalization processing on the plurality of fourth sample images to obtain a first sample image corresponding to each fourth sample image.
Optionally, the image searching apparatus further includes: the second construction module is used for constructing an initial feature extraction model; the second feature extraction module is used for extracting features of each target sample image in the target training sample set based on the initial feature extraction model to obtain target sample feature information corresponding to each target sample image, wherein the target sample image is a second sample image or a third sample image; and the training module is used for training the initial characteristic extraction model based on the target sample characteristic information corresponding to each target sample image to obtain a target characteristic extraction model.
Optionally, the second feature extraction module further includes: the fourth feature extraction submodule is used for performing feature extraction processing on each target sample image to obtain sample initial feature information corresponding to each target sample image; the fifth feature extraction submodule is used for carrying out local feature extraction processing on the initial feature information of each sample to obtain sample local feature information corresponding to each target sample image; the sixth feature extraction submodule is used for carrying out global feature extraction processing on the initial feature information of each sample to obtain sample global feature information corresponding to each target sample image; and the second feature fusion submodule is used for performing feature fusion on the sample local feature information and the sample global feature information corresponding to each target sample image to obtain the target sample feature information corresponding to each target sample image.
Optionally, the image searching apparatus further includes: and the processing module is used for carrying out normalization processing on the image to be searched and the image to be matched.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-mentioned image search method when running.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, wherein fig. 6 is a schematic diagram of an alternative electronic device according to the embodiments of the present invention. As shown in fig. 6, the electronic device includes one or more processors and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to run a program arranged to perform the image search method described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units may be only a division by logical function, and an actual implementation may divide them differently: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims (10)

1. An image search method, comprising:
acquiring an image to be searched;
performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information, wherein the target feature extraction model is obtained by training based on an unsupervised contrast learning mode, the target feature extraction model is trained by adopting a target training sample set, and the target training sample set is obtained by performing multiple image enhancement on each image in a plurality of images;
calculating the similarity between the first target feature information and second target feature information corresponding to each image to be matched to obtain a plurality of similarity values, wherein the second target feature information is obtained by performing feature extraction on the image to be matched based on the target feature extraction model;
and determining a target image from a plurality of images to be matched based on the similarity values, and determining the target image as a search result corresponding to the images to be searched, wherein the similarity between the image content of the target image and the image content of the images to be searched is greater than a preset similarity.
2. The method according to claim 1, wherein the extracting features of the image to be searched to obtain first target feature information comprises:
performing feature extraction processing on the image to be searched to obtain initial feature information;
performing local feature extraction processing on the initial feature information to obtain local feature information;
carrying out global feature extraction processing on the initial feature information to obtain global feature information;
and performing feature fusion on the local feature information and the global feature information to obtain the first target feature information.
3. The method of claim 1, wherein the target training sample set is constructed based on the following method:
acquiring a plurality of first sample images;
performing first image enhancement processing on each first sample image to obtain a second sample image corresponding to each first sample image;
performing second image enhancement processing on each first sample image to obtain a third sample image corresponding to each first sample image, wherein the first image enhancement processing is different from the second image enhancement processing;
constructing the target training sample set based on the second sample image and the third sample image.
4. The method of claim 3, wherein acquiring a plurality of first sample images comprises:
acquiring a plurality of initial sample images;
performing multi-scale cutting processing on each initial sample image to obtain a plurality of fourth sample images corresponding to each initial sample image, wherein the fourth sample images have different sizes;
and carrying out normalization processing on the plurality of fourth sample images to obtain a first sample image corresponding to each fourth sample image.
5. The method of claim 3 or 4, wherein the target feature extraction model is trained based on the following method:
constructing an initial feature extraction model;
performing feature extraction on each target sample image in the target training sample set based on the initial feature extraction model to obtain target sample feature information corresponding to each target sample image, wherein the target sample image is the second sample image or the third sample image;
and training the initial feature extraction model based on the target sample feature information corresponding to each target sample image to obtain the target feature extraction model.
6. The method according to claim 5, wherein performing feature extraction on each target sample image in the target training sample set to obtain target sample feature information corresponding to each target sample image comprises:
performing feature extraction processing on each target sample image to obtain sample initial feature information corresponding to each target sample image;
performing local feature extraction processing on each sample initial feature information to obtain sample local feature information corresponding to each target sample image;
performing global feature extraction processing on the initial feature information of each sample to obtain sample global feature information corresponding to each target sample image;
and performing feature fusion on the sample local feature information and the sample global feature information corresponding to each target sample image to obtain the target sample feature information corresponding to each target sample image.
7. The method according to claim 1, before performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information, comprising:
and carrying out normalization processing on the image to be searched and the image to be matched.
8. An image search apparatus characterized by comprising:
the first acquisition module is used for acquiring an image to be searched;
the first feature extraction module is used for performing feature extraction on the image to be searched based on a target feature extraction model to obtain first target feature information, wherein the target feature extraction model is obtained by training based on an unsupervised contrast learning mode, the target feature extraction model is trained by adopting a target training sample set, and the target training sample set is obtained by performing multiple image enhancement on each image in a plurality of images;
the calculating module is used for calculating the similarity between the first target characteristic information and second target characteristic information corresponding to each image to be matched to obtain a plurality of similarity values, wherein the second target characteristic information is obtained by performing characteristic extraction on the image to be matched based on the target characteristic extraction model;
and the determining module is used for determining a target image from a plurality of images to be matched based on the similarity values and determining the target image as a search result corresponding to the images to be searched, wherein the similarity between the image content of the target image and the image content of the images to be searched is greater than the preset similarity.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to execute the image search method according to any one of claims 1 to 7 when executed.
10. An electronic device, characterized in that the electronic device comprises one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to run a program arranged to perform the image search method of any of claims 1 to 7.
CN202310000631.2A 2023-01-03 2023-01-03 Image searching method and device, computer readable storage medium and electronic equipment Pending CN115687670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310000631.2A CN115687670A (en) 2023-01-03 2023-01-03 Image searching method and device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310000631.2A CN115687670A (en) 2023-01-03 2023-01-03 Image searching method and device, computer readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115687670A true CN115687670A (en) 2023-02-03

Family

ID=85057339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310000631.2A Pending CN115687670A (en) 2023-01-03 2023-01-03 Image searching method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115687670A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012626A (en) * 2023-03-21 2023-04-25 腾讯科技(深圳)有限公司 Material matching method, device, equipment and storage medium for building elevation image
CN116051859A (en) * 2023-02-21 2023-05-02 阿里巴巴(中国)有限公司 Service providing method, apparatus and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110113A (en) * 2019-05-20 2019-08-09 重庆紫光华山智安科技有限公司 Image search method, system and electronic device
CN113989593A (en) * 2021-10-29 2022-01-28 北京百度网讯科技有限公司 Image processing method, search method, training method, device, equipment and medium
CN114139013A (en) * 2021-11-29 2022-03-04 深圳集智数字科技有限公司 Image searching method and device, electronic equipment and computer readable storage medium
CN114266988A (en) * 2020-09-16 2022-04-01 上海大学 Unsupervised visual target tracking method and system based on contrast learning
CN114282581A (en) * 2021-01-29 2022-04-05 北京有竹居网络技术有限公司 Training sample obtaining method and device based on data enhancement and electronic equipment
CN114691918A (en) * 2022-04-01 2022-07-01 北京百度网讯科技有限公司 Radar image retrieval method and device based on artificial intelligence and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051859A (en) * 2023-02-21 2023-05-02 阿里巴巴(中国)有限公司 Service providing method, apparatus and storage medium
CN116051859B (en) * 2023-02-21 2023-09-08 阿里巴巴(中国)有限公司 Service providing method, apparatus and storage medium
CN116012626A (en) * 2023-03-21 2023-04-25 腾讯科技(深圳)有限公司 Material matching method, device, equipment and storage medium for building elevation image

Similar Documents

Publication Publication Date Title
Walia et al. Digital image forgery detection: a systematic scrutiny
US10467504B1 (en) Systems, methods, and storage media for evaluating digital images
CN108446390B (en) Method and device for pushing information
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
US9036905B2 (en) Training classifiers for deblurring images
US9171013B2 (en) System and method for providing objectified image renderings using recognition information from images
US11914639B2 (en) Multimedia resource matching method and apparatus, storage medium, and electronic apparatus
Goh et al. Food-image Classification Using Neural Network Model
CN115687670A (en) Image searching method and device, computer readable storage medium and electronic equipment
CN109426831B (en) Image similarity matching and model training method and device and computer equipment
Lee et al. Tag refinement in an image folksonomy using visual similarity and tag co-occurrence statistics
US9665773B2 (en) Searching for events by attendants
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111062426A (en) Method, device, electronic equipment and medium for establishing training set
CN111182364A (en) Short video copyright detection method and system
CN110019867A (en) Image search method, system and index structuring method and medium
CN115131218A (en) Image processing method, image processing device, computer readable medium and electronic equipment
WO2024027347A9 (en) Content recognition method and apparatus, device, storage medium, and computer program product
WO2018120575A1 (en) Method and device for identifying main picture in web page
CN111178409B (en) Image matching and recognition system based on big data matrix stability analysis
CN113378064A (en) Method for determining content similarity and content recommendation method based on similarity
CN113705666A (en) Segmentation network training method, using method, device, equipment and storage medium
Novozámský et al. Extended IMD2020: a large‐scale annotated dataset tailored for detecting manipulated images
CN112749711A (en) Video acquisition method and device and storage medium
CN111553335A (en) Image generation method and apparatus, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230203