CN110399515B

CN110399515B - Picture retrieval method, device and system

Info

Publication number: CN110399515B
Application number: CN201910572798.XA
Authority: CN
Inventors: 孙茜; 李紫筝; 农革
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2022-05-17
Anticipated expiration: 2039-06-28
Also published as: CN110399515A

Abstract

The embodiments of the invention are applicable to the technical field of picture retrieval and provide a picture retrieval method, device and system. The method comprises the following steps: receiving a retrieval request for a picture, wherein the retrieval request carries information to be retrieved, and the information to be retrieved comprises text information or picture data to be retrieved; if the information to be retrieved is text information, taking a semantic text domain in a preset suffix index as the retrieval domain of the text information, and retrieving a target picture from the semantic text domain using the text information; if the information to be retrieved is picture data, taking a feature domain in the preset suffix index as the retrieval domain of the picture data, and retrieving a target picture from the feature domain using the picture data; wherein the preset suffix index stores semantic texts and feature information of a plurality of pictures. The embodiments thus support picture retrieval based on semantic information as well as retrieval based on picture features.

Description

Picture retrieval method, device and system

Technical Field

The invention belongs to the technical field of picture retrieval, and particularly relates to a picture retrieval method, a picture retrieval device, a server, a computer-readable storage medium and a picture retrieval system.

Background

With the development of internet application technology, pictures have become one of the main sources of information. Within the mass of information on the internet, the number of pictures is also growing explosively. How to quickly and accurately find a desired picture among a large number of pictures is a hot research topic in the field of internet applications.

The traditional picture search method is text-based. It borrows the keyword search technology commonly used in document search: keywords are extracted and analyzed from the text information related to the pictures, and a keyword index is established. However, because the text information related to a picture often differs considerably from the content of the picture, this method frequently yields search results of low relevance. To overcome these defects, methods based on the content features of pictures were later developed. Such methods establish a feature index by extracting the content features of each picture, including its color, texture, shape, spatial relationships and the like. When searching, a sample image is provided, its content features are extracted and analyzed and compared with the constructed feature index, and pictures with similar content features are returned. Although this approach can analyze and compare picture features, it lacks the semantic information expressed by the picture, can only retrieve results that are similar in appearance, and places higher computational demands on constructing the content feature index.

Both of the above two conventional picture retrieval methods have their respective limitations. From the aspect of retrieval form, the former method can only retrieve according to texts, and the latter method needs to provide sample pictures and analyze the content characteristics thereof for retrieval. Because the picture retrieval system in the prior art cannot simultaneously support the retrieval in the two forms, the retrieval efficiency and the retrieval accuracy are low.

Disclosure of Invention

In view of this, embodiments of the present invention provide a picture retrieval method, a picture retrieval device, and a picture retrieval system, so as to solve the problem that in the prior art, no picture retrieval mode based on text and content features is simultaneously supported, so that both the retrieval efficiency and the retrieval accuracy are low.

A first aspect of an embodiment of the present invention provides a picture retrieval method, including:

receiving a retrieval request aiming at a picture, wherein the retrieval request carries information to be retrieved, and the information to be retrieved comprises text information or picture data to be retrieved;

if the information to be retrieved is text information, taking a semantic text domain in a preset suffix index as a retrieval domain of the text information, and retrieving a target picture from the semantic text domain by adopting the text information;

if the information to be retrieved is picture data, taking a feature domain in the preset suffix index as a retrieval domain of the picture data, and retrieving a target picture from the feature domain by adopting the picture data;

the preset suffix index stores semantic texts and characteristic information of a plurality of pictures.

A second aspect of an embodiment of the present invention provides an image retrieval apparatus, including:

a receiving module, configured to receive a retrieval request for a picture, wherein the retrieval request carries information to be retrieved, and the information to be retrieved comprises text information or picture data to be retrieved;

the first retrieval domain determining module is used for taking a semantic text domain in a preset suffix index as a retrieval domain of the text information if the information to be retrieved is the text information;

the first retrieval module is used for retrieving a target picture from the semantic text domain by adopting the text information;

a second retrieval domain determining module, configured to, if the information to be retrieved is picture data, use a feature domain in the preset suffix index as a retrieval domain of the picture data;

the second retrieval module is used for retrieving a target picture from the feature domain by adopting the picture data.

A third aspect of embodiments of the present invention provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the picture retrieval method according to the first aspect when executing the computer program.

A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the picture retrieval method according to the first aspect.

A fifth aspect of an embodiment of the present invention provides a picture retrieval system, where the system includes a terminal device for interacting with a user, and a server in communication connection with the terminal device. In practice, a user inputs the information to be retrieved through the terminal device; following the user's instruction, the terminal device generates a retrieval request from that information and sends it to the server, and the server performs targeted retrieval. On receiving the retrieval request, the server performs picture retrieval according to the steps of the first aspect.

Compared with the prior art, the embodiment of the invention has the following advantages:

According to the embodiment of the invention, when a retrieval request for a picture is received, whether the current retrieval is based on text information or on picture data is determined from the information to be retrieved carried in the request, and retrieval is then carried out in a targeted manner. If the information is text information, the semantic text domain in the preset suffix index can be used as the retrieval domain, and the text information is used to retrieve a target picture from the semantic text domain; if it is picture data, the feature domain in the suffix index can be used as the retrieval domain, and the picture data is used to retrieve a target picture from the feature domain. In the embodiment, the suffix index is built by analyzing the semantic information of the pictures and extracting their content features, and it exploits the fact that a suffix index supports retrieval over both natural-language text and non-natural-language data such as binary data, so that both picture retrieval based on semantic information and retrieval based on picture features are supported.

Secondly, the embodiment also provides similarity sorting methods. For results retrieved with semantic text information, semantic similarity is calculated and the pictures are sorted from high to low by semantic relevance. For results retrieved with the feature information of picture data, the Hamming distance between feature vectors is calculated and the pictures are sorted from high to low by feature vector similarity. The top-ranked pictures therefore better match the user's actual needs and expectations of the retrieval results.

Thirdly, by using the suffix index, the present embodiment can provide multi-method, multi-condition, and multi-type picture retrieval, which greatly alleviates the problems of low index construction efficiency and low retrieval efficiency and ensures real-time, reliable, and efficient retrieval.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the embodiments or the description of the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a flowchart illustrating steps of a method for retrieving pictures according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating steps of a method for generating a suffix index according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a process of retrieving a picture based on text information according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating the steps of calculating semantic similarity between a target picture and text information to be retrieved according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a process of retrieving a picture based on picture data according to an embodiment of the present invention;

FIG. 6 is a diagram of an image retrieval apparatus according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

The technical solution of the present invention is explained below by specific examples.

Referring to fig. 1, a schematic flow chart illustrating steps of a picture retrieval method according to an embodiment of the present invention is shown, which may specifically include the following steps:

S101, receiving a retrieval request for a picture, wherein the retrieval request carries information to be retrieved, and the information to be retrieved comprises text information or picture data to be retrieved;

It should be noted that the method may be applied to a server. The server may expose an interface through a terminal device communicatively connected to it, and a user may input a retrieval request for a picture through that interface. After receiving the retrieval request, the terminal device can immediately send it to the server; the server completes the corresponding retrieval process, and the retrieved pictures are displayed to the user through the interface of the terminal device. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, or a desktop computer, which is not limited in this embodiment.

In the embodiment of the present invention, the retrieval request input by the user through the terminal device interface may include specific information to be retrieved. According to the characteristics of picture retrieval, the information to be retrieved can be text information, namely a section of specific characters, such as descriptive sentences like 'blue background pictures containing holiday blessings'; the information to be retrieved may also be picture data, i.e. a sample image.

After the user inputs the information to be retrieved, the user can send out a retrieval request by clicking a corresponding retrieval button, and the terminal equipment can forward the retrieval request to the server for processing in real time.

In the embodiment of the invention, the server carries out targeted retrieval in different modes according to different information to be retrieved input by the user.

For example, if the information to be retrieved input by the user is text information, the user may be considered to want the server to perform picture retrieval according to the specific semantics contained in the text. In this case, the server may execute step S102 to perform picture retrieval for the text information.

If the information to be retrieved input by the user is picture data, the user may be considered to want the server to retrieve other pictures similar to that picture. In this case, the server may execute step S104 to perform retrieval for the specific sample.

S102, taking a semantic text domain in a preset suffix index as the retrieval domain of the text information; the preset suffix index stores semantic texts and feature information of a plurality of pictures.

Generally, picture retrieval requires a corresponding index to be established in advance to support the subsequent retrieval process. In the embodiment of the present invention, the information used for retrieval by text information or by picture data may be stored uniformly by constructing a suffix index. The suffix index may include a plurality of different attribute domains, each storing different index information. For example, the metadata in the semantic text domain may be the semantic text describing each picture, and the metadata in the feature domains may be specific features of each picture, such as the local binary pattern (LBP) feature, the histogram of oriented gradients (HOG) feature, the scale-invariant feature transform (SIFT) feature, and so on.

For ease of understanding, the present embodiment first briefly describes the process of constructing the suffix index.

As shown in fig. 2, which is a schematic flow chart illustrating steps of a method for generating a suffix index according to an embodiment of the present invention, the method may specifically include the following steps:

S1021, acquiring picture data to be processed;

It should be noted that the picture data to be processed are the pictures whose information needs to be stored. In this embodiment, the various kinds of information about a picture can be stored by constructing the suffix index, which improves both index creation efficiency and query efficiency.

S1022, performing semantic analysis on the picture data to obtain a semantic text for describing the picture data;

In the embodiment of the present invention, the result of performing semantic analysis on a picture is usually a set of statements describing the content of the picture. The statements may be stored separately, in text form, in an attribute domain of the suffix index, and the name of that attribute domain may be semantics.

The semantic text describes the picture itself: beyond listing the things the picture contains, it also describes the relationships between them.

In a specific implementation, the semantic description sentence can be generated directly from the picture by a deep learning method. For example, with reference to the Image Caption Generator algorithm, a typical end-to-end model, the picture can first be converted into a vector representation using the strength of the convolutional neural network (CNN) in extracting high-level image features, and the image vector can then be converted into a semantic description sentence using a recurrent neural network (RNN).

S1023, extracting multiple kinds of feature information of the picture data, and converting the multiple kinds of feature information into binary feature vectors;

In the embodiment of the present invention, the multiple kinds of feature information of the picture may include an LBP feature, a HOG feature, a SIFT feature, and the like. Of course, other types of features may be included according to actual needs, and this embodiment does not limit this.

In a specific implementation, each feature may be obtained with its corresponding feature extraction algorithm, for example the LBP, HOG, and SIFT feature extraction algorithms.
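As an illustration of one such extractor, the basic 3x3 LBP operator can be sketched as follows. This is a minimal sketch, not the extractor used by the embodiment: the function name `lbp_codes`, the clockwise neighbour order starting at the top-left pixel, and the "greater than or equal to the centre" comparison are conventions assumed here.

```python
def lbp_codes(image):
    """Basic 3x3 LBP: one 8-bit code per interior pixel of a 2D grayscale grid."""
    # Neighbour order (clockwise from top-left) is a convention chosen for this sketch.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        row = []
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for dr, dc in offsets:
                # Append a 1 bit when the neighbour is at least as bright as the centre.
                bit = 1 if image[r + dr][c + dc] >= center else 0
                code = (code << 1) | bit
            row.append(code)
        codes.append(row)
    return codes


# Hypothetical 3x3 grayscale patch; only the centre pixel has a full neighbourhood.
image = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
print(format(lbp_codes(image)[0][0], "08b"))  # 01011000
```

The resulting codes are already bit patterns, which is why LBP features fit naturally into a binary feature vector.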

After the above feature information is obtained separately, each feature information may be processed into a binary feature vector for storage convenience.

If the feature vector corresponding to any kind of feature information is not a binary vector, it may be converted into one. For example, the feature vector [10, 08, 22] is converted to [01010, 01000, 10110].
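The conversion can be illustrated with a minimal sketch. It assumes each component is a non-negative integer written as a fixed-width bit string; the 5-bit width and the function name `to_binary_vector` are illustrative choices, not taken from the embodiment.

```python
def to_binary_vector(features, width=5):
    """Encode each non-negative integer component as a fixed-width bit string."""
    if any(v < 0 or v >= 2 ** width for v in features):
        raise ValueError("component does not fit in the chosen width")
    return [format(v, f"0{width}b") for v in features]


print(to_binary_vector([10, 8, 22]))  # ['01010', '01000', '10110']
```

A fixed width keeps every component the same length, so the concatenated bit string can be split back into components unambiguously.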

S1024, respectively storing the semantic text and the binary characteristic vectors corresponding to the various characteristic information as metadata into a plurality of attribute domains of a suffix index to be constructed;

In the embodiment of the present invention, the semantic text obtained by parsing and the binary feature vectors may be stored as metadata in attribute domains named semantics, LBP, HOG, and SIFT, respectively.

Of course, the plurality of attribute fields of the suffix index to be constructed may further include an attribute field for storing information such as a picture ID, a picture name, a picture creation date, a picture author, and the like, which is not limited in this embodiment.

Table one shows an example of the attribute domains of the present embodiment. This example includes the picture ID number PID, the picture name picturename, the picture creation date creationdate, the picture author author, the picture semantic text semantics, the picture content features LBP, HOG, and SIFT, and other attribute domains.

Table one. Attribute domain example

PID	picturename	creationdate	author	semantics	LBP	HOG	SIFT

In the above-illustrated attribute fields, corresponding metadata may be stored, respectively. As shown in table two, examples of metadata for each attribute field for two pictures are provided.

Table two. Metadata examples for the attribute domains

PID	picturename	creationdate	author	semantics
0001	animal	2019-5-20	Bob	A puppy is chasing a butterfly
0002	flower	2019-5-21	Alice	Blooming roses

Table two (continued)

LBP	HOG	SIFT
01001…00001	00010…00000	00000…11111
00000…01110	01101…01111	00001…10110

S1025, determining suffix arrays corresponding to the metadata in each attribute domain and a domain information structure of each attribute domain;

In the embodiment of the present invention, in addition to the metadata recording the specific content of the picture in that domain, each attribute domain includes a suffix array and a domain information structure.

Generally, metadata is the data object to be stored, mainly in the form of a character string. The suffix array records the starting positions of all suffixes of the string in lexicographic order. The domain information structure may be used to obtain the metadata of a specified file in the corresponding attribute domain, where the specified file is the condition information of the picture to be retrieved.

In the embodiment of the invention, after the metadata of a certain attribute domain is obtained, a preset suffix array construction algorithm can be adopted to construct a suffix array corresponding to the metadata in each attribute domain, and after the domain information structure of each attribute domain is determined, a suffix index is constructed according to the metadata, the suffix array and the domain information structure of each attribute domain.

Taking the picturename field in the attribute field shown in table two as an example, the constructed suffix array may be as shown in table three.

Table three. Suffix array corresponding to the picturename domain shown in table two

Metadata	animal flower
Suffix array	6,4,0,11,7,2,5,8,3,1,9,12,10
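The suffix array in table three can be reproduced with a minimal sketch. It sorts the suffixes directly, which is simple but not linear-time; the embodiment only refers to a preset suffix array construction algorithm, so the naive sort and the name `build_suffix_array` are assumptions made for illustration. The sketch also assumes the two picture names are joined by a single space.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


metadata = "animal flower"
print(build_suffix_array(metadata))
# [6, 4, 0, 11, 7, 2, 5, 8, 3, 1, 9, 12, 10]
```

Position 6 comes first because the suffix starting there begins with the space character, which sorts before every letter.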

In this embodiment of the present invention, the domain information structure of each attribute domain may record: the number of picture files stored in the attribute domain, the total metadata size currentSize of the attribute domain, and the file information structure FileInfo of each picture in the attribute domain. The FileInfo of a picture includes an index deletion marker delete (0 means not deleted, 1 means deleted); the metadata size size of that picture's content in the attribute domain (currentSize is the sum of the size values of all pictures in the attribute domain); the offset of the first byte of that picture's metadata within the metadata of the attribute domain; and the ID number PID of the picture.
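The domain information structure described above could be modelled roughly as follows. This is only a sketch: the class name `DomainInfo`, the Python spellings of the fields, and the sample values are assumptions for illustration. In particular, the sample values assume the two picture names are stored as "animal flower", with one separating space that is not counted in either size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileInfo:
    delete: int  # index deletion marker: 0 = not deleted, 1 = deleted
    size: int    # metadata size of this picture in the attribute domain
    offset: int  # offset of the picture's first metadata byte in the domain metadata
    pid: str     # ID number (PID) of the picture


@dataclass
class DomainInfo:
    files: List[FileInfo] = field(default_factory=list)

    @property
    def file_count(self):
        """Number of picture files recorded in the attribute domain."""
        return len(self.files)

    @property
    def current_size(self):
        """currentSize: the sum of the per-picture size values."""
        return sum(f.size for f in self.files)


# Hypothetical records for a picturename domain holding "animal flower".
picturename = DomainInfo([
    FileInfo(delete=0, size=6, offset=0, pid="0001"),
    FileInfo(delete=0, size=6, offset=7, pid="0002"),
])
print(picturename.file_count, picturename.current_size)  # 2 12
```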

For the two pictures shown in table two, the domain information structure corresponding to the attribute domain picturename can be as shown in table four.

TABLE IV Domain information Structure corresponding to picturename Domain shown in TABLE II

S1026, generating the suffix index based on each attribute domain and the suffix array and domain information structure corresponding to each attribute domain.

After the metadata of each attribute domain, and the suffix array and the domain information structure corresponding to the attribute domain are obtained, respectively, a suffix index may be generated based on the attribute domain, the suffix array, and the domain information structure.

As can be seen from table two, the suffix index can handle both text data and binary data. The time complexity of index construction is O(n) and the query time complexity is O(log n), where n is the length of the character string over which the suffix array is constructed. Since the index is constructed in linear time, the suffix index has obvious advantages over traditional indexing methods.

In the embodiment of the present invention, if the information to be retrieved is text information, the server may first determine that the retrieval domain used for the text information is the semantic text domain in the preset suffix index, that is, the semantics domain.

S103, retrieving a target picture from the semantic text domain by adopting the text information;

In the embodiment of the invention, when picture retrieval is carried out and a specific retrieval domain has been determined from the information to be retrieved, the metadata and suffix array of that retrieval domain are queried to obtain the offsets of the matching items in the domain's metadata. The file information structure FileInfo is then looked up by offset to obtain the PID of the corresponding picture file, the data of each attribute domain of that picture file is obtained according to the PID, and the target picture is output.
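The lookup chain from pattern to offset to PID can be sketched as follows. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the suffix array is built by naive sorting, the file information is reduced to (delete, size, offset, pid) tuples, the two semantic texts of table two are assumed to be joined by one space, and all function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_offsets(text, sa, pattern):
    """Binary-search the suffix array for all suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def pids_for_offsets(file_infos, offsets):
    """Map metadata offsets to PIDs, skipping records marked as deleted."""
    hits = []
    for delete, size, start, pid in file_infos:
        if delete == 0 and any(start <= o < start + size for o in offsets):
            hits.append(pid)
    return hits


# Semantic texts of the two pictures in table two, joined by one space (assumed).
semantics = "A puppy is chasing a butterfly Blooming roses"
file_infos = [(0, 30, 0, "0001"), (0, 14, 31, "0002")]
sa = build_suffix_array(semantics)

print(pids_for_offsets(file_infos, find_offsets(semantics, sa, "roses")))  # ['0002']
```

Each binary search makes O(log n) comparisons against the suffix array, which is where the logarithmic query cost comes from; each comparison additionally costs up to the pattern length.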

As shown in fig. 3, which is a schematic diagram of a picture retrieval process based on text information according to an embodiment of the present invention, specifically, the following sub-steps may be included:

S1031, parsing the text information to obtain a plurality of keywords contained in the text information;

Generally, semantic descriptions of the same picture vary, so searching directly with the text information input by the user cannot achieve an exact match; the text in the user's retrieval request therefore needs to be parsed before searching. By parsing the semantic description in the user's request, keywords W1, W2, W3, and so on can be extracted.

S1032, retrieving in the preset suffix index by adopting each keyword to respectively obtain the picture to be screened corresponding to each keyword;

In the embodiment of the present invention, after the plurality of keywords are obtained by parsing, each keyword may be searched separately in the semantic text domain of the suffix index, i.e., the semantics domain, to obtain a separate result for each keyword.

Table five shows the search result corresponding to each keyword, that is, the pictures to be screened obtained by searching with each extracted keyword.

Table five. Examples of keywords and corresponding search results (pictures to be screened)

Keyword	Search results
W1	PID_1,PID_3,PID_4,PID_12,PID_33…
W2	PID_2,PID_4,PID_12,PID_22…
W3	PID_4,PID_5,PID_12,PID_55…

S1033, taking the intersection of the pictures to be screened corresponding to the keywords, and outputting the target pictures matching the text information.

In the embodiment of the invention, after the retrieval results corresponding to the keywords are obtained, the intersection of the retrieval results of all the keywords can be taken as the final retrieval result.

In the above example, the final retrieval results are PID_4 and PID_12, and the pictures corresponding to these two PIDs are the target pictures obtained by the user's text-based picture retrieval.
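The intersection step can be sketched with plain sets. The sketch uses only the PIDs explicitly listed in table five and ignores the elided entries; the function name `intersect_results` is an assumption.

```python
def intersect_results(per_keyword):
    """Keep only the PIDs returned for every keyword."""
    sets = list(per_keyword.values())
    return set.intersection(*sets) if sets else set()


results = {
    "W1": {"PID_1", "PID_3", "PID_4", "PID_12", "PID_33"},
    "W2": {"PID_2", "PID_4", "PID_12", "PID_22"},
    "W3": {"PID_4", "PID_5", "PID_12", "PID_55"},
}

targets = intersect_results(results)
print(sorted(targets, key=lambda pid: int(pid.split("_")[1])))  # ['PID_4', 'PID_12']
```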

When the specified retrieval domain is the semantics domain, multiple matching results may be obtained because of word similarity and semantic similarity; that is, there may be multiple target pictures. Therefore, before the target pictures are output, a method is needed to measure the similarity between the semantic information and sort the retrieval results.

In the embodiment of the invention, the semantic similarity between the plurality of target pictures and the text information input by the user can be respectively calculated, and then the plurality of target pictures are sequenced according to the semantic similarity. Generally, it can be considered that the top ranked pictures based on semantic similarity are more matched with the text information.

As shown in fig. 4, which is a schematic flow chart illustrating a step of calculating semantic similarity between a target picture and text information to be retrieved according to an embodiment of the present invention, specifically, the step may include the following steps:

S331, respectively acquiring the semantic texts of the target pictures stored in the preset suffix index, wherein each semantic text comprises a plurality of target keywords;

In the embodiment of the invention, in order to compare the semantic similarity between a target picture and the text information to be retrieved input by the user, the semantic text of the picture can be extracted from the semantic text domain (semantics) corresponding to the target picture in the suffix index. The semantic text may be parsed into a plurality of target keywords.

S332, for any target picture, extracting the non-repeating words among the target keywords contained in its semantic text and the keywords contained in the text information to form a union word set;

In the embodiment of the present invention, the union word set may consist of the non-repeating words in the two groups of words.

For example, assume that the text information the user requests to retrieve is T₁, consisting of m₁ words (keywords), and the semantic text corresponding to the retrieved target picture is T₂, consisting of m₂ words (target keywords). The word set {w₁, w₂, w₃, …, w_m} composed of the non-repeating words in T₁ and T₂ is the union word set, where m ≤ m₁ + m₂.

S333, respectively calculating the word vector cosine similarity and the word order similarity between the semantic text corresponding to the target picture and the text information according to the union word set;

In the embodiment of the invention, in order to compare the semantic similarity between the semantic text corresponding to the target picture and the text information to be retrieved input by the user, the cosine similarity of the word vectors of the two texts can be compared first.

Because the union word set is derived from the sentences being compared, a word vector can be generated for each of the original sentences according to the union word set; the word vectors of T₁ and T₂ are denoted s₁ and s₂.

In a specific implementation, the word vector of the semantic text corresponding to the target picture and the word vector of the text information may each be determined according to whether the words in the union word set appear among the keywords of the text information or among the target keywords of the semantic text. Taking T₁ as an example:

for each word w_i in the union word set, if w_i appears in T₁, the i-th component of s₁ is set to 1; if w_i does not appear in T₁, the semantic similarity between w_i and each word in T₁ is calculated and the most similar word is taken. If that similarity exceeds a preset similarity threshold, the i-th component of s₁ is set to 1; otherwise it is set to 0.
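The construction of the union word set and of the 0/1 word vectors can be sketched as follows. The similarity function here is a placeholder that only recognises identical words; it stands in for a WordNet-based measure. The 0.5 threshold, the sample keyword lists, and all function names are assumptions made for illustration.

```python
def union_word_set(words1, words2):
    """Non-repeating words of both texts, in order of first appearance."""
    seen, union = set(), []
    for w in words1 + words2:
        if w not in seen:
            seen.add(w)
            union.append(w)
    return union


def word_vector(union, words, similarity, threshold=0.5):
    """1 if the union word occurs in words or is similar enough to one of them, else 0."""
    vec = []
    for w in union:
        if w in words:
            vec.append(1)
        else:
            best = max((similarity(w, t) for t in words), default=0.0)
            vec.append(1 if best > threshold else 0)
    return vec


def exact_only(a, b):
    # Placeholder similarity: a real system would query a lexical database here.
    return 1.0 if a == b else 0.0


t1 = ["puppy", "chasing", "butterfly"]              # keywords of the query text
t2 = ["a", "puppy", "is", "chasing", "a", "butterfly"]  # words of a semantic text
union = union_word_set(t1, t2)

print(union)                               # ['puppy', 'chasing', 'butterfly', 'a', 'is']
print(word_vector(union, t1, exact_only))  # [1, 1, 1, 0, 0]
print(word_vector(union, t2, exact_only))  # [1, 1, 1, 1, 1]
```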

Typically, in WordNet (a dictionary based on cognitive linguistics), individual words are organized into sets of synonyms (synsets) with semantic and relational pointers to other synsets, so that all words are classified and a semantic tree is formed. In the semantic tree, words at the upper levels of the hierarchy have more general semantics and are less similar to each other, while words at the lower levels have more specific semantics and are more similar. Therefore, according to the depth of the words in the hierarchical structure and the path length between the words, a semantic similarity formula for words is provided:

wherein l is the length of the shortest path between the two words, and h is the depth in the hierarchical structure. α ∈ [0,1] and β ∈ [0,1] are parameters that scale the contributions of the shortest path length and the depth, respectively.
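A measure with these properties can be sketched as follows. The form used below, e^(-αl) · tanh(βh), is a commonly used one in the sentence-similarity literature and is assumed here rather than taken from the embodiment; the default values of `alpha` and `beta` and the function name are likewise illustrative.

```python
import math


def word_similarity(path_length, depth, alpha=0.2, beta=0.45):
    """Assumed form: exp(-alpha * l) * tanh(beta * h).

    The score falls as the shortest path l grows and rises with the depth h.
    """
    return math.exp(-alpha * path_length) * math.tanh(beta * depth)


# Same path length, different depths: the deeper pair scores higher.
print(word_similarity(2, 1) < word_similarity(2, 8))  # True
# Same depth, different path lengths: the closer pair scores higher.
print(word_similarity(1, 5) > word_similarity(4, 5))  # True
```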

After the word vector s₁ of the text information T₁ and the word vector s₂ of the semantic text T₂ corresponding to the target picture are obtained by the above method, the word vectors s₁ and s₂ may be used to calculate the word-vector cosine similarity between the semantic text T₂ corresponding to the target picture and the text information T₁ input by the user.

In a specific implementation, the cosine similarity of the word vector between the two can be calculated by the following formula:
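The standard cosine formula, S_s = (s₁ · s₂) / (‖s₁‖ ‖s₂‖), fits this step. The sketch below applies it to two 0/1 word vectors; the function name and the sample vectors are illustrative assumptions, and returning 0 for an all-zero vector is a choice made here, not something the embodiment specifies.

```python
import math
from typing import Sequence

def cosine_similarity(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors of equal length."""
    if len(s1) != len(s2):
        raise ValueError("word vectors must have the same length")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumed convention for an empty vector
    return dot / (norm1 * norm2)

# Illustrative word vectors over a nine-word joint word set.
s1 = [1, 1, 1, 0, 0, 1, 1, 1, 1]
s2 = [1, 1, 0, 1, 1, 1, 1, 1, 1]
print(cosine_similarity(s1, s2))
```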

Sentences that contain the same words in a different order often differ greatly in meaning. Therefore, in addition to measuring the similarity of word meanings, this embodiment may also compare the word order similarity of the sentences.

In the embodiment of the present invention, the word order vector of the semantic text corresponding to the target picture and the word order vector of the text information may be respectively determined according to whether the order of the keyword or the target keyword is the same as the order of the corresponding keyword in the joint word set.

Suppose that:

T₁：A man sitting on a bench with a dog.

T₂：A man is lying on a bench with a dog.

Then the joint word set is T ＝ {a, man, sitting, is, lying, on, bench, with, dog}, and the index numbers corresponding to T₁ and T₂ are shown in Table 6.

Table 6

T₁	a	man	sitting	on	a	bench	with	a	dog
Index number	1	2	3	4	5	6	7	8	9
T₂	a	man	is	lying	on	a	bench	with	a	dog
Index number	1	2	3	4	5	6	7	8	9	10

The words in T are examined one by one. If the current word also appears in T₁, it is marked in r₁ with the index number of that word in T₁. If the current word does not appear in T₁, the word in T₁ that is semantically most similar to the current word is found and their similarity is calculated; if the similarity is greater than the preset similarity threshold, the current word is marked with the index number of that similar word. If neither condition is satisfied, the entry for the current word in r₁ is 0.

The index number marking operation is performed for T₁ and T₂ respectively, and the resulting word order vectors are:

r₁＝{1,2,3,0,0,4,6,7,9}

r₂＝{1,2,0,3,4,5,7,8,10}

After the word order vector r₁ of the text information T₁ and the word order vector r₂ of the semantic text T₂ corresponding to the target picture are obtained, the word order vectors r₁ and r₂ may be used to calculate the word order similarity between the semantic text corresponding to the target picture and the text information to be retrieved input by the user.

In a specific implementation, the following formula can be used to calculate the word order similarity between the two:
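The marking procedure and a word order similarity can be sketched as below. The formula is an assumption made for illustration, S_r = 1 − ‖r₁ − r₂‖ / ‖r₁ + r₂‖, which is a common normalized-difference form; the function names, the optional `similarity` callback and the threshold default are hypothetical. With exact matching only, the code reproduces the vectors r₁ and r₂ listed above.

```python
import math
from typing import Callable, List, Optional, Sequence

def word_order_vector(joint: Sequence[str], sentence: Sequence[str],
                      similarity: Optional[Callable[[str, str], float]] = None,
                      threshold: float = 0.5) -> List[int]:
    """Mark each joint-set word with its 1-based index in `sentence`, or 0."""
    vector = []
    for w in joint:
        if w in sentence:
            vector.append(sentence.index(w) + 1)  # first occurrence
            continue
        best_score, best_index = 0.0, 0
        if similarity is not None:
            for i, t in enumerate(sentence, start=1):
                score = similarity(w, t)
                if score > best_score:
                    best_score, best_index = score, i
        vector.append(best_index if best_score > threshold else 0)
    return vector

def word_order_similarity(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Assumed form: 1 - ||r1 - r2|| / ||r1 + r2||."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 if total == 0.0 else 1.0 - diff / total

joint = ["a", "man", "sitting", "is", "lying", "on", "bench", "with", "dog"]
t1 = "a man sitting on a bench with a dog".split()
t2 = "a man is lying on a bench with a dog".split()
r1 = word_order_vector(joint, t1)
r2 = word_order_vector(joint, t2)
print(r1, r2, word_order_similarity(r1, r2))
```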

S334, performing weighted summation on the word-vector cosine similarity and the word order similarity according to a preset weight to obtain the semantic similarity between the target picture and the text information.

In the embodiment of the invention, the word-vector cosine similarity and the word order similarity of the sentences are combined to obtain the overall similarity of the sentences, that is, the calculated semantic information relevance.

Since semantic information of sentences depends on vocabulary and word order, the measurement of semantic relevance between sentences needs to comprehensively consider the vocabulary similarity and the word order similarity. The calculation formula is as follows:

S(T₁,T₂)＝δS_s+(1-δ)S_r

The value of δ determines the relative contributions of lexical information and word order information to the overall similarity. Generally speaking, lexical similarity (the word-vector cosine similarity) plays the major role in semantic similarity, so δ generally takes a value greater than 0.5.
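As a small illustration of the weighted sum and of sorting target pictures by the result, consider the sketch below. The function names, the picture identifiers, the sample scores and the default δ = 0.85 are assumptions chosen only to satisfy δ > 0.5; the embodiment does not fix a value.

```python
from typing import Dict, List, Tuple

def overall_similarity(s_s: float, s_r: float, delta: float = 0.85) -> float:
    """S(T1, T2) = delta * S_s + (1 - delta) * S_r."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * s_s + (1.0 - delta) * s_r

def rank_targets(scores: Dict[str, Tuple[float, float]],
                 delta: float = 0.85) -> List[str]:
    """Sort picture ids by overall similarity, highest first."""
    return sorted(scores,
                  key=lambda pid: overall_similarity(*scores[pid], delta),
                  reverse=True)

# Hypothetical (S_s, S_r) pairs for three target pictures.
scores = {"pid_1": (0.80, 0.79), "pid_2": (0.95, 0.60), "pid_3": (0.40, 0.90)}
ranking = rank_targets(scores)
print(ranking)
```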

S104, taking a feature domain in the preset suffix index as a retrieval domain of the picture data;

In the embodiment of the present invention, if the information to be retrieved is picture data, the server may first determine, for the sample picture, that the retrieval domain used in the retrieval process is the feature domain in the preset suffix index, that is, multiple attribute domains such as the LBP domain, the HOG domain, and the SIFT domain.

S105, retrieving a target picture from the feature domain by using the picture data.

Similar to retrieval based on text information, after the specific retrieval domain is determined, the server may perform the retrieval for the sample picture in the corresponding retrieval domain.

Fig. 5 is a schematic diagram of a picture retrieval process based on picture data according to an embodiment of the present invention. The process may specifically include the following sub-steps:

S1051, extracting multiple kinds of feature information from the picture data;

It should be noted that the metadata in the attribute domains, such as the LBP domain, the HOG domain, and the SIFT domain, is obtained by extracting features from each picture and converting the extracted feature information into binary vectors. Therefore, when a sample picture is used for feature retrieval, the sample picture must first be processed to extract the corresponding kinds of features.

S1052, searching in the preset suffix index by using various feature information of the picture data to respectively obtain a plurality of pictures to be screened corresponding to the various feature information of the picture data;

Since a binary feature vector can be regarded as a kind of character string, retrieval in the corresponding LBP, HOG, and SIFT domains using the feature information in this step is similar to retrieval using keywords in step S102 above, and is not described again here.

In the embodiment of the invention, after sequentially searching in the suffix index according to the search domains LBP, HOG and SIFT, a plurality of matched PIDs, namely a plurality of pictures to be screened, can be obtained.
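To illustrate how a binary feature vector can be searched as a character string, the sketch below builds a naive suffix array over the bit strings of one attribute domain and looks up a pattern by binary search. It is a simplified stand-in, not the suffix index of the embodiment: the layout (bit strings joined with a "$" separator), the substring-match semantics, and the names `build_index` and `search` are all assumptions.

```python
from bisect import bisect_right
from typing import List, Set, Tuple

Index = Tuple[str, List[int], List[int], List[str]]

def build_index(entries: List[Tuple[str, str]]) -> Index:
    """entries: (pid, bit string) pairs for one attribute domain."""
    text, starts, pids = "", [], []
    for pid, bits in entries:
        starts.append(len(text))
        pids.append(pid)
        text += bits + "$"  # separator keeps matches inside one entry
    # Naive construction: sort suffix start positions lexicographically.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    return text, suffix_array, starts, pids

def search(index: Index, pattern: str) -> Set[str]:
    """Return the pids whose bit string contains `pattern`."""
    text, sa, starts, pids = index
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Map each matching text position back to the entry that contains it.
    return {pids[bisect_right(starts, sa[k]) - 1] for k in range(first, lo)}

index = build_index([("p1", "10110"), ("p2", "00110"), ("p3", "11100")])
print(search(index, "0110"))
```

Running the same lookup once per attribute domain (for example LBP, HOG, and SIFT) yields one candidate set of picture identifiers per domain.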

S1053, respectively calculating the feature similarity of the multiple pictures to be screened and the picture data;

In the embodiment of the invention, in order to further analyze how well the retrieved pictures to be screened match the original sample picture, the feature similarity between each picture to be screened and the sample picture may be calculated.

In a specific implementation, the Hamming distance between the feature vector of each picture to be screened in the retrieval result and the feature vector of the sample picture may be calculated to determine the feature similarity.

Because the feature vectors are all binary representations, the Hamming distance can be obtained only by carrying out XOR operation on the vectors.

For example, if feature vector 1 is (10110) and feature vector 2 is (00110), a bitwise XOR is performed (identical bits give 0, differing bits give 1), and the resulting bits are summed to obtain the Hamming distance, which is 1 in this example.
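The XOR-based calculation can be sketched in a few lines. The function name `hamming_distance` and the representation of feature vectors as non-empty strings of "0" and "1" are assumptions made for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Hamming distance between two equal-length, non-empty bit strings."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    # XOR leaves a 1 exactly where the two vectors differ; count those bits.
    return bin(int(a, 2) ^ int(b, 2)).count("1")

distance = hamming_distance("10110", "00110")
print(distance)
```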

The distance results of the feature vectors under the different feature attribute domains are then combined to obtain the final feature similarity.

S1054, extracting the pictures to be screened whose feature similarity exceeds a preset value as target pictures.

In the embodiment of the invention, after the feature similarity between each picture to be screened and the sample picture is calculated, the pictures to be screened may be sorted according to the feature similarity.

Because pictures ranked earlier have a higher feature similarity and therefore match the sample picture more closely, the pictures to be screened whose feature similarity exceeds the preset value can be extracted as the final output target pictures. The target pictures can be fed back to the terminal device by the server and displayed to the user through an interface of the terminal device.
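The scoring, sorting, and thresholding of candidates can be sketched as follows. How the per-domain distances are combined is not specified here, so the sketch assumes a simple choice: each domain contributes 1 − distance / length, and the contributions are averaged. The function names, domain keys, bit strings, and threshold are hypothetical.

```python
from typing import Dict, List, Tuple

def feature_similarity(query: Dict[str, str], candidate: Dict[str, str]) -> float:
    """Assumed combination: mean over domains of 1 - hamming / length."""
    scores = []
    for domain, q_bits in query.items():
        c_bits = candidate[domain]
        if len(q_bits) != len(c_bits):
            raise ValueError(f"length mismatch in domain {domain}")
        distance = sum(x != y for x, y in zip(q_bits, c_bits))
        scores.append(1.0 - distance / len(q_bits))
    return sum(scores) / len(scores)

def select_targets(query: Dict[str, str],
                   candidates: Dict[str, Dict[str, str]],
                   threshold: float) -> List[Tuple[str, float]]:
    """Keep candidates above the threshold, sorted by similarity descending."""
    scored = [(pid, feature_similarity(query, feats))
              for pid, feats in candidates.items()]
    kept = [item for item in scored if item[1] > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

query = {"LBP": "10110", "HOG": "1100"}
candidates = {
    "p1": {"LBP": "10110", "HOG": "1100"},
    "p2": {"LBP": "00110", "HOG": "1101"},
    "p3": {"LBP": "01001", "HOG": "0011"},
}
targets = select_targets(query, candidates, threshold=0.7)
print(targets)
```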

In the embodiment of the invention, when a retrieval request for a picture is received, whether the current retrieval mode is based on text information or on picture data is determined according to the information to be retrieved carried in the retrieval request, and retrieval is then performed accordingly. If the information to be retrieved is text information, the semantic text domain in the preset suffix index may be used as the retrieval domain, and the text information is used to retrieve the target picture from the semantic text domain; if it is picture data, the feature domain in the suffix index may be used as the retrieval domain, and the picture data is used to retrieve the target picture from the feature domain. In this embodiment, the suffix index is built by analyzing the semantic information of pictures and extracting their content features, and it exploits the fact that a suffix index supports retrieval of both natural-language text and non-natural-language data such as binary data, so that picture retrieval based on semantic information and retrieval according to picture features are both supported.

Compared with the prior art, by adopting a suffix index this embodiment can provide multi-method, multi-condition, and multi-type picture retrieval, greatly alleviating the problems of low index construction efficiency and low retrieval efficiency, and ensuring real-time, reliable, and efficient retrieval.

It should be noted that, the sequence numbers of the steps in the foregoing embodiments do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the internal logic of the process, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

Referring to fig. 6, a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:

a receiving module 601, configured to receive a retrieval request for a picture, where the retrieval request carries information to be retrieved, and the information to be retrieved includes text information or picture data to be retrieved;

a first retrieval domain determining module 602, configured to, if the information to be retrieved is text information, use a semantic text domain in a preset suffix index as a retrieval domain of the text information;

a first retrieving module 603, configured to retrieve a target picture from the semantic text domain by using the text information;

a second retrieval domain determining module 604, configured to, if the information to be retrieved is picture data, use a feature domain in the preset suffix index as a retrieval domain of the picture data;

a second retrieving module 605, configured to retrieve a target picture from the feature domain by using the picture data;

In this embodiment of the present invention, the preset suffix index may be generated by invoking the following modules:

the image data acquisition module is used for acquiring image data to be processed;

the semantic analysis module is used for performing semantic analysis on the picture data to obtain a semantic text for describing the picture data;

the binary characteristic vector conversion module is used for extracting various characteristic information of the picture data and converting the various characteristic information into binary characteristic vectors;

the metadata storage module is used for respectively storing the semantic text and the binary characteristic vectors corresponding to the various kinds of characteristic information as metadata into a plurality of attribute domains of a suffix index to be constructed;

a suffix array and domain information structure determining module, configured to determine a suffix array corresponding to metadata in each attribute domain and a domain information structure of each attribute domain;

and the suffix index generating module is used for generating a suffix index based on the plurality of attribute domains and the suffix arrays and domain information structures corresponding to the attribute domains.

In this embodiment of the present invention, the first retrieving module 603 may specifically include the following sub-modules:

the text information analysis sub-module is used for analyzing the text information to obtain a plurality of keywords contained in the text information;

the keyword retrieval submodule is used for retrieving in the preset suffix index by adopting each keyword to respectively obtain the pictures to be screened corresponding to each keyword;

and the target picture output sub-module is used for taking intersection of the pictures to be screened corresponding to the keywords and outputting the target picture matched with the text information.
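The intersection performed by the target picture output sub-module can be illustrated with ordinary set operations. The function name, keywords, and picture identifiers below are hypothetical, and the per-keyword candidate sets are assumed to have been obtained already from the suffix index.

```python
from typing import Dict, Set

def intersect_candidates(per_keyword: Dict[str, Set[str]]) -> Set[str]:
    """Return the picture ids that were retrieved for every keyword."""
    candidate_sets = list(per_keyword.values())
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)

per_keyword = {
    "man": {"p1", "p2", "p3"},
    "bench": {"p2", "p3"},
    "dog": {"p3", "p4"},
}
matched = intersect_candidates(per_keyword)
print(matched)
```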

In this embodiment of the present invention, the target picture may include a plurality of pictures, and the apparatus may further include:

the semantic similarity calculation module is used for calculating semantic similarities between the plurality of target pictures and the text information respectively;

and the target picture ordering module is used for ordering the plurality of target pictures according to the semantic similarity.

In the embodiment of the present invention, the semantic similarity calculation module may specifically include the following sub-modules:

a semantic text acquisition submodule, configured to respectively acquire the semantic texts of the multiple target pictures stored in the preset suffix index, where each semantic text includes multiple target keywords;

a joint word set generation submodule, configured to, for any target picture, respectively extract non-repetitive words in target keywords included in the semantic text and keywords included in the text information, and form a joint word set;

the similarity calculation submodule is used for respectively calculating word vector cosine similarity and word order similarity between the semantic text corresponding to the target picture and the text information according to the joint word set;

and the similarity weighting submodule is used for performing weighted summation on the word vector cosine similarity and the word order similarity according to a preset weight to obtain the semantic similarity between the target picture and the text information.

In the embodiment of the present invention, the similarity calculation submodule may specifically include the following units:

a word vector determining unit, configured to determine, according to whether the keyword or the target keyword exists in the joint word set, a word vector of a semantic text corresponding to the target picture and a word vector of the text information, respectively;

the word vector cosine similarity calculation unit is used for calculating the word vector cosine similarity between the semantic text corresponding to the target picture and the text information by adopting the word vector of the semantic text corresponding to the target picture and the word vector of the text information;

a word sequence vector determining unit, configured to determine, according to whether the order of the keyword or the target keyword is the same as the order of the corresponding keyword in the joint word set, a word sequence vector of a semantic text corresponding to the target picture and a word sequence vector of the text information, respectively;

and the word order similarity calculation unit is used for calculating the word order similarity between the semantic text corresponding to the target picture and the text information by adopting the word order vector of the semantic text corresponding to the target picture and the word order vector of the text information.

In this embodiment of the present invention, the second retrieving module 605 may specifically include the following sub-modules:

the characteristic information extraction submodule is used for extracting various kinds of characteristic information of the picture data;

the characteristic information retrieval submodule is used for retrieving in the preset suffix index by adopting various characteristic information of the picture data to respectively obtain a plurality of pictures to be screened corresponding to the various characteristic information of the picture data;

the characteristic similarity calculation submodule is used for calculating the characteristic similarity of the plurality of pictures to be screened and the picture data respectively;

and the target picture extraction submodule is used for extracting the picture to be screened with the characteristic similarity exceeding a preset numerical value as a target picture.

For the apparatus embodiment, since it is substantially similar to the method embodiment, it is described relatively simply, and reference may be made to the description of the method embodiment section for relevant points.

Referring to fig. 7, a schematic diagram of a server of one embodiment of the invention is shown. As shown in fig. 7, the server 700 of the present embodiment includes: a processor 710, a memory 720, and a computer program 721 stored in said memory 720 and operable on said processor 710. The processor 710, when executing the computer program 721, implements the steps in the various embodiments of the picture retrieval method described above, such as the steps S101 to S105 shown in fig. 1. Alternatively, the processor 710, when executing the computer program 721, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the modules 601 to 605 shown in fig. 6.

For example, the computer program 721 may be divided into one or more modules/units, which are stored in the memory 720 and executed by the processor 710 to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 721 in the server 700. For example, the computer program 721 may be divided into a receiving module, a first retrieval domain determining module, a first retrieval module, a second retrieval domain determining module, and a second retrieval module, whose specific functions correspond to those of modules 601 to 605 described above.

The server 700 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The server 700 may include, but is not limited to, the processor 710 and the memory 720. Those skilled in the art will appreciate that fig. 7 is merely an example of the server 700 and does not constitute a limitation on the server 700, which may include more or fewer components than shown, combine some components, or use different components; for example, the server 700 may also include input-output devices, network access devices, buses, and the like.

The processor 710 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 720 may be an internal storage unit of the server 700, such as a hard disk or a memory of the server 700. The memory 720 may also be an external storage device of the server 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the server 700. Further, the memory 720 may include both an internal storage unit and an external storage device of the server 700. The memory 720 is used for storing the computer program 721 and other programs and data required by the server 700. The memory 720 may also be used to temporarily store data that has been output or is to be output.

This embodiment also provides a picture retrieval system, which comprises a terminal device for interacting with a user and a server in communication connection with the terminal device. In practice, a user can input the information to be retrieved through the terminal device; the terminal device generates a retrieval request from the information to be retrieved according to the user's instruction and sends the retrieval request to the server, and the server performs the corresponding retrieval.

The server in this embodiment may include a memory, a processor, and a computer program that is stored in the memory and is executable on the processor, and the processor may implement the steps of the image retrieval method in the foregoing embodiment when executing the computer program, which is not described in detail in this embodiment.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same. Although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims

1. An image retrieval method, comprising:

the preset suffix index stores semantic texts and characteristic information of a plurality of pictures;

the preset suffix index is generated by the following steps:

acquiring picture data to be processed;

semantic analysis is carried out on the picture data to obtain semantic texts used for describing objects contained in the pictures in the picture data and relations among the objects;

extracting various kinds of characteristic information of the picture data, and converting the various kinds of characteristic information into binary characteristic vectors;

respectively storing the semantic text and the binary characteristic vectors corresponding to the various characteristic information as metadata into a plurality of attribute fields of a suffix index to be constructed;

determining suffix arrays corresponding to metadata in each attribute domain and domain information structures of the attribute domains;

generating a suffix index based on the plurality of attribute fields and their corresponding suffix arrays and field information structures.

2. The method of claim 1, wherein the step of using the text information to retrieve the target picture from the semantic text field comprises:

analyzing the text information to obtain a plurality of keywords contained in the text information;

searching in the preset suffix index by adopting each keyword to respectively obtain the picture to be screened corresponding to each keyword;

and taking intersection of the pictures to be screened corresponding to the keywords, and outputting the target picture matched with the text information.

3. The method of claim 2, wherein the target picture comprises a plurality of pictures, the method further comprising:

respectively calculating semantic similarity between a plurality of target pictures and the text information;

and sequencing the plurality of target pictures according to the semantic similarity.

4. The method according to claim 3, wherein the step of calculating semantic similarities between the target pictures and the text information respectively comprises:

respectively acquiring semantic texts stored in the preset suffix indexes of the target pictures, wherein the semantic texts respectively comprise a plurality of target keywords;

aiming at any target picture, respectively extracting non-repeated words in target keywords contained in the semantic text and keywords contained in the text information to form a combined word set;

respectively calculating word vector cosine similarity and word sequence similarity between the semantic text corresponding to the target picture and the text information according to the combined word set;

and according to a preset weight, carrying out weighted summation on the cosine similarity and the word order similarity of the word vector to obtain the semantic similarity between the target picture and the text information.

5. The method according to claim 4, wherein the step of calculating the cosine similarity of word vectors and the similarity of word order between the semantic text and the text information corresponding to the target picture according to the joint word set comprises:

respectively determining word vectors of semantic texts corresponding to the target pictures and word vectors of the text information according to whether the keywords or the target keywords exist in the combined word set;

calculating the cosine similarity of the word vector between the semantic text corresponding to the target picture and the text information by adopting the word vector of the semantic text corresponding to the target picture and the word vector of the text information;

respectively determining word sequence vectors of semantic texts corresponding to the target pictures and word sequence vectors of the text information according to whether the sequence of the keywords or the target keywords is the same as the sequence of the corresponding keywords in the joint word set;

and calculating the word sequence similarity between the semantic text corresponding to the target picture and the text information by adopting the word sequence vector of the semantic text corresponding to the target picture and the word sequence vector of the text information.

6. The method of claim 1, wherein the step of using the picture data to retrieve the target picture from the feature field comprises:

extracting various characteristic information of the picture data;

searching in the preset suffix index by adopting various characteristic information of the picture data to respectively obtain a plurality of pictures to be screened corresponding to the various characteristic information of the picture data;

respectively calculating the feature similarity of the plurality of pictures to be screened and the picture data;

and extracting the picture to be screened with the characteristic similarity exceeding a preset numerical value as a target picture.

7. An image retrieval apparatus, comprising:

the preset suffix index may be generated by invoking the following modules:

the semantic analysis module is used for performing semantic analysis on the picture data to obtain semantic texts for describing objects contained in the pictures in the picture data and relations among the objects;

8. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the picture retrieval method according to any one of claims 1 to 6 when executing the computer program.

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the picture retrieval method according to any one of claims 1 to 6.