CN102147815A - Method and system for searching images

Info

Publication number: CN102147815A
Application number: CN2011101004858A
Authority: CN
Inventors: 段凌宇; 纪荣嵘; 陈杰; 李冰; 黄铁军; 姚鸿勋; 高文
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2011-04-21
Filing date: 2011-04-21
Publication date: 2011-08-10
Anticipated expiration: 2031-04-21
Also published as: CN102147815B

Abstract

The invention provides a method and system for searching images. The method comprises: a client receives content to be searched, which comprises a target image to be searched, or the target image together with its related information; the client obtains visual words of the target image, selects at least one target visual word dictionary corresponding to the content to be searched from more than one visual word dictionary according to preset rules, and obtains the target visual words from the visual words according to the target visual word dictionary; and the target visual words are encoded and then transmitted to a server to obtain result images matched with the searched content and/or related information of the result images. By reducing the data uploaded by the client, the method improves image search speed, shortens the waiting time of users, and improves the search accuracy of the search system.

Description

Picture searching method and picture searching system

Technical Field

The invention relates to the technical field of picture identification and search, in particular to a picture search method and a picture search system.

Background

With the rapid development of wireless networks and the continuous enhancement of mobile device functions, users frequently use mobile devices to query picture information. The earliest approach described the content of pictures with text and then performed retrieval/search on that text. However, text cannot accurately describe the content of a picture, and the results of text-based picture search are often not the information the user needs, so text-based search cannot satisfy users.

Content-based picture search is another approach: it uses a picture as the query to find similar pictures, and thereby avoids the inaccurate text descriptions of text-based picture search. However, the content-based picture search method transmits the image directly to a server, which generates a large amount of transmitted data. In particular, in a wireless network environment with limited and unstable bandwidth, a picture search often requires a long query response time.

Therefore, the industry describes a picture with visual descriptors, converting the picture into a one-dimensional vector consisting of a plurality of data values, so that the data vector rather than the picture is transmitted to the server. Describing pictures with visual descriptors can shorten the query response time, but owing to the quality of current mobile networks, the upload speed still cannot meet users' actual requirements. In view of this, how to provide a picture retrieval method that ensures retrieval performance and efficiency while reducing the bandwidth required is a technical problem that currently needs to be solved.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a picture searching method and a picture searching system, which improve the picture retrieval speed, shorten the waiting time of a user and improve the searching accuracy of the searching system by reducing the data volume uploaded by a client under the condition of not reducing the searching performance.

The picture searching method provided by the invention comprises the following steps:

the client receives query content, wherein the query content comprises a target picture to be queried or the target picture to be queried and related information;

the client acquires visual words of a target picture, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquires the target visual words of the visual words according to the target visual word dictionary;

and coding the target visual words and then sending the coded target visual words to a server so as to obtain result pictures matched with the query contents and/or relevant information of the result pictures.

According to another aspect of the present invention, the present invention also provides an image searching method, which includes:

the server receives the encoded target visual words and decodes the target visual words;

the server searches an index table corresponding to a visual word dictionary in the server on the basis of the target visual word to obtain a result picture and/or related information of the result picture, and sends the result picture and/or related information of the result picture to the client;

wherein the visual word dictionary is established by clustering the visual features of all the pictures in the server-side picture database.
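The server-side lookup described above can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the source does not specify the layout of the index table or the ranking rule, so the vote-counting score, the function names `build_index` and `search`, and the toy database below are all hypothetical.

```python
from collections import defaultdict

def build_index(database):
    """Map each visual word ID to the set of picture IDs containing it."""
    index = defaultdict(set)
    for picture_id, words in database.items():
        for word in words:
            index[word].add(picture_id)
    return index

def search(index, target_words, top_k=3):
    """Rank pictures by how many target visual words they share with the query.

    Assumption: a simple vote count stands in for whatever matching score
    the server actually uses.
    """
    votes = defaultdict(int)
    for word in set(target_words):
        for picture_id in index.get(word, ()):
            votes[picture_id] += 1
    # Sort by descending vote count, then by picture ID for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [picture_id for picture_id, _ in ranked[:top_k]]

# Hypothetical database: picture ID -> visual word IDs found in that picture.
database = {"pic_a": [1, 4, 7], "pic_b": [2, 4], "pic_c": [1, 4, 7, 9]}
index = build_index(database)
results = search(index, [1, 7, 9])
```

Because only the decoded word IDs are needed for the lookup, the client never has to upload the picture itself.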

According to another aspect of the present invention, the present invention also provides an image search system, which includes:

the client receives query contents comprising a target picture to be queried or the target picture to be queried and related information;

the target visual word acquisition module is used for acquiring the visual words of the target picture by the client, selecting at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquiring the target visual words of the visual words according to the target visual word dictionary;

the target visual word sending module is used for coding the target visual word and sending the coded target visual word to the server;

the receiving and searching module is used for receiving and decoding the coded target visual words by the server side, and searching the index table corresponding to the visual word dictionary of all pictures in the database based on the target visual words to obtain the result pictures and/or the related information of the result pictures;

and the server side sends the result picture and/or the related information of the result picture to the client side.

The picture searching method and the picture searching system provided by the invention mainly compress the target picture into the target visual word with the visual content description capacity at the client and transmit the target visual word to the server, so that low-bit data transmission between the client and the server is realized, the waiting time of a user in inquiring the target picture is shortened, the response time of the server in the system is improved, and the inquiring efficiency in the picture searching method is further improved.

Furthermore, the searching method can also improve the accuracy of the searching result. The method can be popularized and applied to retrieval/search of various pictures, and can acquire the extension information of the result picture, so that the method is wide in application range, applicable to various fields and convenient for a user to retrieve various information.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating steps of an embodiment of a method for searching pictures according to the present invention;

FIG. 2 is a flow chart of the steps for screening a valid visual dictionary in the present invention;

FIG. 3 is a flowchart illustrating steps of an embodiment of a method for searching pictures according to the present invention;

FIG. 4 is a schematic structural diagram of an embodiment of the image search system in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention mainly provides a picture searching method, which uses visual word dictionary libraries preset in the client to obtain, for a target picture, target visual words that require less transmitted data, then encodes the target visual words and sends them to a server, and obtains result pictures and/or related extended information from the server. The method effectively reduces the number of visual words describing the target picture and the data volume transmitted to the server, achieving low-bit data transmission between the client and the server. It can therefore effectively solve the problem of long data transmission time under current bandwidth limitations, reduce the response time of the server, and shorten the waiting time of the user.

The following terms are used in the description below:

Visual words: the visual feature space is divided discretely, and each division is one word; visual words describe picture content using picture features and are the most basic data features;

Visual word dictionary: a set of visual words of all or selected parts of the pictures in the picture database.

Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an embodiment of a picture searching method according to the present invention, where the steps include:

Step 101: the client receives query content, wherein the query content comprises a target picture (query picture) to be queried, or the target picture to be queried together with related information. The related information here is information other than the target picture. For example, the related information may be text information describing the target picture, geographical location information, a publisher barcode, a publisher logo, or an electronic tag.

Step 102: the client acquires visual words of the target picture, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquires the target visual words from the visual words according to the target visual word dictionary.

The visual words of the target picture can be generated by obtaining more than one visual feature of the target picture and converting the features into visual words of the original visual word dictionary according to the mapping rules between visual features and visual words. Preferably, the original visual word dictionary of the client is the same as that of the server; it can be preset in the client and updated from the server in real time.

The original visual word dictionary can be generated by obtaining more than one visual feature of the database pictures of the server and clustering those visual features into a plurality of classes. This and subsequent clustering can be performed by K-means clustering, hierarchical clustering, spectral clustering, etc.; for spectral clustering, see the reference "Ng A., Jordan M., and Weiss Y. On Spectral Clustering: NIPS, 849-". The class center of each class represents the class and is called a visual word; that is, each class is a visual word, and the set of visual words of the whole database forms the original visual word dictionary.
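As an illustration of building a dictionary by clustering, the following is a minimal K-means sketch in plain Python. It assumes features are equal-length numeric tuples and that there are at least as many features as words; the evenly spaced initialisation, the fixed iteration count, and the names `build_dictionary` and `nearest` are choices made for this example, not details from the source.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the class center closest to the point."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))

def build_dictionary(features, num_words, iterations=20):
    """Cluster feature vectors; each resulting class center is one visual word."""
    # Assumption: deterministic initialisation from evenly spaced samples.
    step = max(1, len(features) // num_words)
    centers = [list(features[i * step]) for i in range(num_words)]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for f in features:
            groups[nearest(f, centers)].append(f)
        for k, members in enumerate(groups):
            if members:  # keep the old center if a class received no features
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# Toy 2-D "visual features" forming two obvious groups.
features = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
dictionary = build_dictionary(features, num_words=2)
```

Here `dictionary` holds two class centers, one near (0.1, 0.13) and one near (5.0, 5.0); each plays the role of one visual word.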

Specifically, in this embodiment, the visual features such as a color histogram, a texture map, a scale invariant descriptor, a gradient position orientation histogram, or a directional gradient histogram of the target picture may be extracted;

Then, according to the mapping rule between visual features and visual words, the color histogram, texture map, scale invariant descriptor (SIFT), gradient position orientation histogram (GLOH), or directional gradient histogram (HOG) of the target picture is converted into the corresponding visual words of the original visual word dictionary of the server.
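The source does not spell out the mapping rule between visual features and visual words. A common choice, assumed here purely for illustration, is nearest-center assignment: each local feature is mapped to the closest visual word and the picture is summarised as a histogram of word counts. The function name `quantize` and the toy data are hypothetical.

```python
def quantize(features, dictionary):
    """Count how often each visual word is the nearest class center."""
    histogram = [0] * len(dictionary)
    for f in features:
        # Squared Euclidean distance from this feature to every visual word.
        distances = [sum((x - y) ** 2 for x, y in zip(f, word)) for word in dictionary]
        histogram[distances.index(min(distances))] += 1
    return histogram

# Hypothetical dictionary of three visual words (class centers).
dictionary = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
# Hypothetical local features extracted from one target picture.
picture_features = [(0.1, 0.2), (4.9, 5.2), (0.3, 0.1), (0.2, 4.7)]
bow = quantize(picture_features, dictionary)
```

The resulting histogram is the kind of visual word vector that the later distance formulas operate on.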

Substep 1021: according to the type of the query content, the visual word dictionary library and the prediction loss function matching that type are looked up among the one or more visual word dictionary libraries preset in the client. That is, mapping rules between query content types and visual word dictionary libraries are preset. For example, when the query content is a picture and text information describing the picture, the visual word dictionary library is the one corresponding to the text information.

Specifically, the one or more visual word dictionary libraries preset in the client are obtained by the client from the server in advance, and the client updates them regularly. Alternatively, when new pictures are added at the server, the client can be prompted to update its internal visual word dictionary libraries. Steps P1 to P3 below describe in detail how the server obtains a visual word dictionary library.

Substep 1022: the prediction loss function is used to calculate the prediction loss value of each visual word dictionary in the visual word dictionary library with respect to the visual words of the target picture, and one or more visual word dictionaries whose values fall within a threshold range are acquired.

The prediction loss function is adopted to calculate the prediction loss value of each visual word dictionary in the visual word dictionary base on the visual word of the target picture, and the specific calculation mode of the prediction loss value can be selected from any one of the following first calculation mode to the third calculation mode.

The first calculation method: the cosine distance between the visual words of the target picture and the class center of the picture class in which the target visual word dictionary is located; or

The second calculation method: the weighted sum of (a) the cosine distance between the visual words of the target picture and the class center of the picture class in which the target visual word dictionary is located and (b) the Euclidean distance between the related information and the same-kind information of the picture class in which the visual word dictionary is located; or

The third calculation method: the product of (a) the visual similarity distance between the target picture and the picture class in which the target visual word dictionary is located and (b) the Euclidean distance between the related information and the same-kind information of the picture class in which the visual word dictionary is located.

For example, the prediction loss function $f_{Prediction}(q_i, C_j)$ is given by:

$$f_{Prediction}(q_i, C_j) = \alpha \cdot Vd_{ij} + \beta \cdot Rd_{ij}$$

where $f_{Prediction}(q_i, C_j)$ represents the prediction loss value of target picture $q_i$ with respect to picture class $C_j$ of a visual word dictionary, $Vd_{ij}$ is the cosine distance between the visual words of the target picture and the class center of the picture class in which the target visual word dictionary is located, and $Rd_{ij}$ is the Euclidean distance between the related information and the same-kind information of the picture class in which the visual word dictionary is located. $\alpha$ and $\beta$ are real numbers that can be set according to experience or requirements.

The cosine distance $Vd_{ij}$ between the visual words of the target picture and the class center of the picture class in which the target visual word dictionary is located is calculated as follows:

$$Vd_{ij} = \left\| \overrightarrow{BOW_i}, \overrightarrow{BOW_j} \right\|_{Cosine} = \frac{\overrightarrow{BOW_i} \cdot \overrightarrow{BOW_j}}{\left\| \overrightarrow{BOW_i} \right\| \cdot \left\| \overrightarrow{BOW_j} \right\|};$$

where $\overrightarrow{BOW_i}$ is the visual word vector of the target picture (picture $i$), and $\overrightarrow{BOW_j}$ is the class center of picture class $C_j$ of the target visual word dictionary.

The Euclidean distance $Rd_{ij}$ between the related information and the same-kind information of the picture class in which the visual word dictionary is located is calculated as follows:

$$Rd_{ij} = \left\| R_i, R_j \right\|_{Euclidean} = \sqrt{(R_i - R_j)^2}$$

where $R_i$ is the related information of picture $i$ in the query content, and $R_j$ is the same-kind information value of picture class $C_j$ of the target visual word dictionary.
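The weighted-sum form of the prediction loss can be sketched as follows. The cosine term is computed exactly as the formula above is written (dot product divided by the product of norms). Treating the related information as a numeric vector, the example values of α and β, and the helper `select_dictionaries` with its `low`/`high` bounds are assumptions made for illustration; the source only says that dictionaries "within a threshold range" are kept.

```python
import math

def cosine_term(bow_i, bow_j):
    """Vd_ij as written in the formula: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(bow_i, bow_j))
    norm = math.sqrt(sum(a * a for a in bow_i)) * math.sqrt(sum(b * b for b in bow_j))
    return dot / norm if norm else 0.0

def euclidean_term(r_i, r_j):
    """Rd_ij: Euclidean distance between two related-information vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_i, r_j)))

def prediction_loss(bow_i, r_i, bow_j, r_j, alpha, beta):
    """f_Prediction(q_i, C_j) = alpha * Vd_ij + beta * Rd_ij."""
    return alpha * cosine_term(bow_i, bow_j) + beta * euclidean_term(r_i, r_j)

def select_dictionaries(losses, low, high):
    """Keep the dictionaries whose loss value lies in [low, high] (assumed range)."""
    return [name for name, value in losses.items() if low <= value <= high]

# Hypothetical query: visual word vector plus a 2-D related-information vector.
query_bow, query_info = [1, 0], (0.0, 0.0)
# Hypothetical picture classes: (class-center vector, same-kind information).
classes = {"C1": ([1, 0], (3.0, 4.0)), "C2": ([0, 1], (0.0, 0.0))}
losses = {
    name: prediction_loss(query_bow, query_info, center, info, alpha=1.0, beta=0.5)
    for name, (center, info) in classes.items()
}
chosen = select_dictionaries(losses, low=0.0, high=1.0)
```

Since α and β are arbitrary real numbers, their signs and magnitudes decide how the similarity-like cosine term and the distance-like Euclidean term trade off.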

In addition, the types of query content in substep 1021 may include: a target picture; a target picture and text; a target picture and a signal detected by a sensor; and a target picture and an object label recognized in the picture by object recognition software. The signal detected by the sensor may include geographical location information detected by a Global Positioning System (GPS) device, barcode information of a book or commodity scanned by a barcode scanner, electronic tag (RFID) information read by an electronic tag reader, and the like. The object labels recognized by object recognition software may include human faces recognized by face recognition software, text recognized by optical character recognition (OCR) software, and the like.

For example, when the type of the query content is the target picture class, the visual word dictionary library is a visual word dictionary library of visual similarity established according to picture similarity.

When the type of the query content is a target picture and a signal detected by a sensor, for example when the query content is a landmark picture, the signal detected by the sensor may be the geographical location information corresponding to a building in the landmark picture, or the geographical location information corresponding to a natural landscape in the landmark picture. In this case, the visual word dictionary library is the one corresponding to the geographical location information.

When the type of the query content is a target picture and an object label recognized in the picture by object recognition software, for example when the query content is a book picture, the recognized object label may be a publisher logo or the name of the book in the picture. In this case, the visual word dictionary library is the one corresponding to the publisher logo or name.

When the query content is a picture of a commodity, the object label recognized by the object recognition software may be the trademark of the commodity, or a barcode scanner may scan the barcode of the corresponding commodity (the physical object) in the picture; the visual word dictionary library is then the one corresponding to the trademark or the barcode.

When the query content is a guide sign picture of a museum exhibition room, the recognized object label is the barcode or electronic tag in the guide sign picture, and the visual word dictionary library is the one corresponding to the barcode or electronic tag. In this step, the picture set is divided into a plurality of classes so that the coupling of visual words within each divided picture set is maximized, thereby reducing the dimensionality of the visual word dictionary.

Step 103: the target visual words are encoded and then sent to the server, so as to obtain and display result pictures matched with the query content and/or related information of the result pictures.

In substep 1021 above, when the one or more visual word dictionary libraries preset in the client are obtained by the client from the server in advance, the server establishes the one or more visual word dictionary libraries in advance through the following steps:

First step P1: the pictures in the server database are divided into picture sets of various classes by a picture set division method.

In the first step P1, all pictures may be divided into multiple picture sets using the visual similarity between pictures. Alternatively, all pictures may be divided into multiple picture sets using picture-related information such as the shooting date, text labels, and electronic labels. The division may also use both the visual similarity between pictures and the picture-related information (shooting date, text labels, electronic labels, etc.).

Second step P2: a visual word dictionary corresponding to each picture set is established, and the visual word dictionary corresponding to each picture set is analyzed. Specifically, the visual word dictionary here can be the original visual word dictionary established by clustering the visual features of the picture set; or it can be obtained as follows: an original visual word dictionary is established by clustering the visual features of the picture set, an effective visual word dictionary representing the original visual word dictionary is determined based on the screening rule for effective visual word dictionaries, and the effective visual word dictionary is used as the visual word dictionary, which further reduces the dimension of the visual word dictionary (the dimension in an N-axis coordinate system).

Third step P3 (a first way of obtaining the visual word dictionary library): if the visual word dictionaries satisfy the visual word dictionary library establishment condition, the set of visual word dictionaries corresponding to the picture sets of each class forms a visual word dictionary library.

The visual word dictionary library establishment condition may be: the number of visual words in the visual word dictionary of each divided picture set is less than or equal to the total number of visual words in the visual word dictionary of the server database; and, when the probability distribution of the visual words of each divided picture set is counted and its entropy is calculated, the information entropy of the probability distribution is less than a set threshold.
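The establishment condition can be checked with a few lines of Python. This is a sketch only: the source does not state the logarithm base or the threshold value, so base-2 entropy (bits), the example threshold, and the names `word_entropy` and `meets_condition` are assumptions.

```python
import math
from collections import Counter

def word_entropy(word_occurrences):
    """Shannon entropy (in bits) of the visual word probability distribution."""
    counts = Counter(word_occurrences)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def meets_condition(word_occurrences, total_words, entropy_threshold):
    """Check both parts of the establishment condition for one picture set.

    total_words: number of visual words in the server database dictionary.
    entropy_threshold: hypothetical threshold chosen by the implementer.
    """
    distinct = len(set(word_occurrences))
    return distinct <= total_words and word_entropy(word_occurrences) < entropy_threshold

# Visual word IDs observed across the pictures of two hypothetical picture sets.
concentrated_set = [1, 1, 2, 2]   # two equally likely words -> 1 bit
scattered_set = [1, 2, 3, 4]      # four equally likely words -> 2 bits
```

A low entropy means the picture set concentrates on few visual words, which is what allows its dictionary to stay small.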

Finally, the server sends the established visual word dictionary to the client, which stores it for subsequent use. When new pictures are added at the server, the visual word dictionary of the server can be updated, and the visual word dictionary of the client can be updated at the same time.

In contrast to the prior art, the filtering rule of the valid visual word dictionary in the present embodiment may be (i.e., the filtering rule of the valid visual word dictionary used in the second step P2 may be):

step P41: selecting a certain number of pictures from a certain class of pictures as sample pictures, and converting the characteristics of the sample pictures into visual words in the original visual word dictionary;

step P42: inquiring in a visual word index table of the original visual word dictionary according to the visual words of the sample picture to obtain an original inquiry result;

step P43: combining any visual words belonging to an original visual word dictionary to form a screening visual word dictionary, converting the characteristics of the sample picture into first visual words corresponding to the screening visual word dictionary based on the screening visual word dictionary, and inquiring in a visual word index table of the original visual word dictionary by adopting the first visual words to obtain a first inquiry result corresponding to the screening visual word dictionary;

step P44: analyzing the original query results of all sample pictures and the first query result, and if the first query result is consistent with the original query result, adopting the current screening visual word dictionary as a visual word dictionary; otherwise, selecting a visual word from the original visual word dictionary, adding the visual word to the current screening visual word dictionary, and returning to the step of obtaining the first query result.
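Steps P41 to P44 can be sketched as a loop that grows the screening dictionary until the query results of every sample picture match the results obtained with the full original dictionary. The source does not say how queries are scored or which word is added next, so this example assumes a dot-product score over histogram vectors and tries words in index order; `top_results`, `screen_dictionary`, and the toy data are hypothetical.

```python
def top_results(query, database, allowed_words, top_r):
    """Rank database pictures by overlap with the query, using only allowed words."""
    def score(vector):
        return sum(query[w] * vector[w] for w in allowed_words)
    # Descending score, ties broken by picture name for a stable order.
    ranked = sorted(database, key=lambda name: (-score(database[name]), name))
    return ranked[:top_r]

def screen_dictionary(samples, database, top_r=1):
    """Return a reduced list of word indices that reproduces the original results."""
    all_words = list(range(len(next(iter(database.values())))))
    # P42: original query results with the full original dictionary.
    original = [top_results(q, database, all_words, top_r) for q in samples]
    screened = []
    for word in all_words:  # assumption: candidate words are tried in index order
        screened.append(word)
        # P43: first query results using only the screening dictionary.
        first = [top_results(q, database, screened, top_r) for q in samples]
        if first == original:  # P44: consistent, so stop growing the dictionary
            break
    return screened

# Hypothetical index table: picture name -> visual word histogram (4 words).
database = {"a": [3, 0, 1, 0], "b": [0, 3, 0, 1], "c": [0, 0, 1, 3]}
# Hypothetical sample pictures, already converted to visual word histograms.
samples = [[2, 0, 0, 1], [0, 2, 1, 0]]
screened = screen_dictionary(samples, database)
```

In this toy case two of the four words are enough to reproduce the original top result for both samples, which is the dimensionality reduction the screening aims for.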

It should be noted that: the visual word dictionary generation mode corresponding to each type of picture set is that a visual word dictionary of the picture is established by adopting a clustering mode for the visual features of the picture set.

Compared with the prior art, the searching method in the embodiment only needs to transmit dozens of bits of coded data volume to the server, so that the purpose of fast query of the client is achieved, meanwhile, the transmission efficiency of the client in the process of querying the target picture is improved, and the response query time of the server is shortened.

Particularly, the image search method of the embodiment is mainly applied to image query in mobile terminals, and the mobile terminals select a suitable visual word dictionary for query information in a self-adaptive manner and obtain target visual words with visual description capability, so that the data volume of a target image to be queried is effectively reduced, data transmission with low bit between a client and a server is further realized, the waiting time of a user in querying the target image is shortened, the response time of the server is improved, and the query efficiency of the image search method is further improved.

Furthermore, the searching method can also improve the accuracy of the retrieval result. The method can be popularized and applied to retrieval/search of various pictures, and the expansion information of the result picture can be acquired, so that the method is wide in application range, can be used in various fields, and is convenient for a user to retrieve various information.

Referring to FIG. 2, FIG. 2 is a flowchart of the specific steps for screening an effective visual dictionary in the present invention; that is, the specific calculation steps for screening the effective visual dictionary in the above-mentioned index construction method for distributed picture search include:

First step 201: select $N_{sample}$ sample pictures from the whole picture database, use them as query pictures to query the visual word index table, and retrieve the top $R$ results for each query picture. For the $i$-th query picture $I_i$, $A_j^i$ denotes the picture ranked at the $j$-th position in its query result, and $\overrightarrow{BOW_{A_j^i}}$ denotes the visual word vector of $A_j^i$.

Second step 202: calculate the term frequency-inverse document frequency (TF-IDF) of each result picture; the TF-IDF of result picture $A_j^i$ is denoted $TI_{A_j}$.

A valid visual word dictionary is screened from a subset of the original visual word dictionary.

Third step 203: set the iteration number $d = 1$; set the effective visual word dictionary $\mathit{min\_V}$ to empty; set the candidate visual word set $\mathit{cadi\_V} = V$ ($V$ is the original visual word dictionary), which has $N_{cv}$ elements; initialize the weight set of the $N_{sample}$ pictures, in which the weight $w_i$ of picture $i$ is 0; and set the test subset $\mathit{train\_V}$ to empty;

Fourth step 204: if the iteration number $d > \alpha$ or $lost_{Rank} < \beta$, the process ends.

Fifth step 205: otherwise, the $N_{cv}$ visual words in the candidate visual word set are each added to the test subset $\mathit{train\_V}$, producing $N_{cv}$ test subsets $\mathit{train\_V}_1, \ldots, \mathit{train\_V}_{N_{cv}}$, where

$$\mathit{train\_V}_t = \mathit{min\_V} \cup \{wd_t\}.$$

Sixth step 206: each test subset is used as a visual word dictionary, and the local feature vector $S_i$ of each query picture $i$ is converted into a visual word vector according to that dictionary, giving the visual word vector of picture $i$ that corresponds to test subset $\mathit{train\_V}_k$.

Seventh step 207: calculate the total error rate caused by describing each query picture with each test subset. For test subset $\mathit{train\_V}_k$ and picture $I_i$, the error rate $Lost(I_i)^k$ is calculated as in M1 to M4 below:

M1: map the visual word vector of picture $I_i$ obtained with $\mathit{train\_V}_k$ into a visual vector over the original visual word dictionary; $\overrightarrow{gBOW_{I_i}(k)}$ is the mapping vector;

M2: calculate the content similarity between the result picture $A_j^i$ and the query picture $I_i$ when the query picture is described with test subset $\mathit{train\_V}_k$. The calculation is:

$$\left\| \overrightarrow{gBOW_{I_i}(k)} \cdot \overrightarrow{BOW_{A_j^i}} \right\|_{Cosine} = \frac{\overrightarrow{BOW_{A_j^i}} \cdot \overrightarrow{gBOW_{I_i}(k)}}{\left\| \overrightarrow{BOW_{A_j^i}} \right\| \cdot \left\| \overrightarrow{gBOW_{I_i}(k)} \right\|};$$

m3, test subset train _ V for calculation_kDescribing the error rate Lost (I) caused by querying picture I_i)^k

<math><mrow><mi>Lost</mi><msup><mrow><mo>(</mo><msub><mi>I</mi><mi>i</mi></msub><mo>)</mo></mrow><mi>k</mi></msup><mo>=</mo><msubsup><mi>w</mi><mi>i</mi><mrow><mi>d</mi><mo>-</mo><mn>1</mn></mrow></msubsup><mo>×</mo><munderover><mi>Σ</mi><mrow><mi>r</mi><mo>=</mo><mn>1</mn></mrow><mi>R</mi></munderover><mi>R</mi><mrow><mo>(</mo><msubsup><mi>A</mi><mi>r</mi><mi>i</mi></msubsup><mo>)</mo></mrow><mo>·</mo><msub><mi>TI</mi><msub><mi>A</mi><mi>r</mi></msub></msub><mo>·</mo><msub><mrow><mo>|</mo><mo>|</mo><mover><mrow><mi>gBO</mi><msub><mi>W</mi><msub><mi>I</mi><mi>i</mi></msub></msub><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow></mrow><mo>&RightArrow;</mo></mover><mo>·</mo><mover><msub><mi>BOW</mi><msubsup><mi>A</mi><mi>j</mi><mi>i</mi></msubsup></msub><mo>&RightArrow;</mo></mover><mo>|</mo><mo>|</mo></mrow><mrow><mi>Co</mi><mi>sin</mi><mi>e</mi></mrow></msub><mo>;</mo></mrow></math>

where A_r^i is a result picture returned for the query picture I_i, and R(A_r^i) is a configurable function of the ascending sort position of A_r^i.

M4: calculate the total error rate incurred when the test subset train_V_k is used to describe all query pictures:

<math><mrow><msubsup><mi>lost</mi><mi>Rank</mi><mi>k</mi></msubsup><mo>=</mo><munderover><mi>Σ</mi><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><msub><mi>N</mi><mi>sample</mi></msub></munderover><mi>Lost</mi><msup><mrow><mo>(</mo><msub><mi>I</mi><mi>i</mi></msub><mo>)</mo></mrow><mrow><mi>d</mi><mo>-</mo><mn>1</mn></mrow></msup><mo>.</mo></mrow></math>

Eighth step 208: select the test subset that minimizes the total error rate lost_Rank, and update the effective visual word dictionary and the candidate visual word set accordingly. Specifically, if the minimizing test subset is train_V_MIN, the effective visual word dictionary becomes min_V = train_V_MIN and the candidate set becomes cadi_V = cadi_V - {wd_MIN}.

Ninth step 209: update the weight of each query picture. The weight of query picture i is updated as follows:

Tenth step 210: update the iteration count, d = d + 1, and return to the fourth step 204.
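As an illustration only, one pass of sub-steps M2 to M4 and the eighth step can be sketched in Python. This is not the patented implementation. It assumes that each test subset train_V_k is the current effective dictionary min_V plus one candidate word, that gBOW_{I_i}(k) is the query picture's bag-of-words vector with every component outside train_V_k set to zero, that the cosine term is evaluated against each result picture in the sum, and that the factors R(A_r^i) and TI_{A_r} are supplied as plain numbers. The weight update of the ninth step is not modelled. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

# One result picture A_r^i as (R factor, TI factor, bag-of-words vector). Assumed layout.
Result = Tuple[float, float, Sequence[float]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def restrict(bow: Sequence[float], subset: Set[int]) -> List[float]:
    """Assumed form of gBOW(k): zero every component whose word index is not in the subset."""
    return [x if i in subset else 0.0 for i, x in enumerate(bow)]


def picture_loss(weight: float, query_bow: Sequence[float],
                 results: Sequence[Result], subset: Set[int]) -> float:
    """M3: weight * sum over results of R * TI * cosine(gBOW(k), BOW of the result)."""
    g = restrict(query_bow, subset)
    return weight * sum(r * ti * cosine_similarity(g, bow) for r, ti, bow in results)


def total_loss(weights: Sequence[float], queries: Sequence[Sequence[float]],
               results: Sequence[Sequence[Result]], subset: Set[int]) -> float:
    """M4: sum of the per-picture error rates over all query pictures."""
    return sum(picture_loss(w, q, res, subset)
               for w, q, res in zip(weights, queries, results))


def select_step(min_v: Set[int], cadi_v: Set[int], weights: Sequence[float],
                queries: Sequence[Sequence[float]],
                results: Sequence[Sequence[Result]]) -> Tuple[Set[int], Set[int], float]:
    """Eighth step: keep the test subset with the smallest total error rate."""
    if not cadi_v:
        raise ValueError("candidate visual word set is empty")
    # Each candidate word w defines one test subset min_V plus {w} (assumption).
    best_word = min(sorted(cadi_v),
                    key=lambda w: total_loss(weights, queries, results, min_v | {w}))
    best_subset = min_v | {best_word}
    best_loss = total_loss(weights, queries, results, best_subset)
    return best_subset, cadi_v - {best_word}, best_loss
```

Calling `select_step` repeatedly, with the weights refreshed between calls, would correspond to the loop back to the fourth step; the stopping condition is left to the caller.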

Based on the above embodiment, the query procedure is described in detail below for the case where the query information contains only a picture. The query steps are as follows:

First, the client acquires the target picture to be searched.

Second, the client extracts one or more features of the target picture and converts them into visual words.

Specifically, in this embodiment, visual features of the target picture may be extracted, such as a color histogram, a texture map, a scale-invariant descriptor (SIFT), a gradient location-orientation histogram (GLOH), or a histogram of oriented gradients (HOG).

Then, according to the mapping rule between visual features and visual words, the extracted features of the target picture are converted into visual words in the visual word dictionary of the client.

Third, a target visual word dictionary matching the target picture is searched for in the one or more visual word dictionary libraries of the client. These libraries are downloaded from the server in advance; that is, the client is provided beforehand with visual word dictionary libraries corresponding to those of the server.

In particular, when the query content is only a target picture, the client selects the visual word dictionary library that was established according to the visual similarity between pictures. It then calculates the visual similarity distance between the target picture and the picture class of each visual word dictionary in that library, and selects the visual word dictionary with the minimum distance as the visual word dictionary matching the target picture, namely the target visual word dictionary. The visual similarity distance is the cosine distance between the visual words of the target picture and the class center of the picture class to which the visual word dictionary belongs.

Fourth, the visual words and the target visual word dictionary are analyzed to obtain the target visual words corresponding to the target picture. Specifically, the visual words of the target picture are screened against the target visual word dictionary, and those that belong to the dictionary are selected as the target visual words.

Fifth, the target visual words are compressed into a data packet by Huffman coding. Specifically, the probability of each target visual word is counted, a Huffman tree is built, and the target visual words are encoded with '0' and '1' so that the higher the probability, the fewer the coding bits. The visual words and their corresponding codes are stored in a Huffman coding table and sent to the server.

Sixth, the server decodes the data packet into the target visual words according to the Huffman coding table, searches the visual word index table of the original visual word dictionary on the server with the target visual words to obtain one or more result pictures corresponding to the target visual words and/or the extended information of the result pictures, and sends the result pictures and/or the extended information to the client for display.
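The client-side part of the steps above (dictionary selection, screening, and Huffman coding) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: visual words are represented as integer identifiers, each dictionary's picture class is summarized by a class-center histogram, the cosine distance is taken as one minus the cosine similarity, and the encoded packet is shown as a string of '0' and '1' characters rather than packed bits. All names (`select_dictionary`, `screen_words`, `build_huffman_table`, and so on) are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def select_dictionary(query_hist: Sequence[float],
                      class_centers: Dict[str, Sequence[float]]) -> str:
    """Third step: pick the dictionary whose class center is closest to the query."""
    return min(sorted(class_centers),
               key=lambda name: cosine_distance(query_hist, class_centers[name]))


def screen_words(words: Sequence[int], dictionary: Set[int]) -> List[int]:
    """Fourth step: keep only the visual words that belong to the target dictionary."""
    return [w for w in words if w in dictionary]


def build_huffman_table(words: Sequence[int]) -> Dict[int, str]:
    """Fifth step: map each word to a bit string; frequent words get shorter codes."""
    freq = Counter(words)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a single symbol still needs one bit
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, i, {word: ""}) for i, (word, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in left.items()}
        merged.update({w: "1" + code for w, code in right.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(words: Sequence[int], table: Dict[int, str]) -> str:
    """Concatenate the code of each word into one bit string."""
    return "".join(table[w] for w in words)


def decode(bits: str, table: Dict[int, str]) -> List[int]:
    """Sixth step (server side): walk the bit string using the shared coding table."""
    inverse = {code: word for word, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes make the first match correct
            out.append(inverse[current])
            current = ""
    return out
```

For example, `encode(words, build_huffman_table(words))` yields the packet, and the server recovers the words with `decode(packet, table)`, which requires that the same coding table is available on both sides.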

According to another aspect of the present invention, a picture searching method is further provided, as shown in fig. 3, comprising the following steps:

Step 301: the server receives the encoded target visual words and decodes them.

Step 302: the server searches an index table corresponding to a visual word dictionary in the server based on the target visual word to obtain the result picture and/or the related information of the result picture.

The visual word dictionary is a dictionary established by clustering the visual features of all or some of the pictures in the server-side picture database.

Step 303: the server sends the result picture and/or the related information of the result picture to the client for display.
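The index lookup of step 302 can be illustrated with a small inverted index in Python. This is a sketch under assumptions rather than the patented method: the index table is modelled as a mapping from each visual word to the identifiers of the pictures containing it, and result pictures are ranked by how many target visual words they share with the query, which is a ranking rule chosen here only for illustration. The names `build_index` and `search` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def build_index(picture_words: Dict[str, Iterable[int]]) -> Dict[int, Set[str]]:
    """Build the index table: visual word -> identifiers of pictures containing it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for picture_id, words in picture_words.items():
        for word in words:
            index[word].add(picture_id)
    return dict(index)


def search(index: Dict[int, Set[str]], target_words: Iterable[int],
           top_n: int = 10) -> List[str]:
    """Return up to top_n picture identifiers, best match first."""
    votes: Counter = Counter()
    for word in set(target_words):          # each distinct target word votes once
        for picture_id in index.get(word, ()):
            votes[picture_id] += 1
    # Sort by descending vote count, then by identifier for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [picture_id for picture_id, _ in ranked[:top_n]]
```

Because only the decoded target visual words are looked up, the cost of a query depends on the number of target words and the length of their index entries, not on the size of the whole picture database.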

In this embodiment, result pictures are queried with fewer target visual words, which improves the efficiency of target picture queries, shortens the waiting time of the user while preserving the original retrieval performance, and thereby enables picture queries under limited bandwidth.

According to another aspect of the present invention, the present invention further provides an image search system, as shown in fig. 4, including:

a receiving module 401, in which a client receives a target picture to be queried, or query contents including the target picture to be queried and related information;

a target visual word obtaining module 402, wherein the client obtains a visual word of a target picture, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and obtains the target visual word of the visual word according to the target visual word dictionary;

a target visual word sending module 403, which encodes the target visual word and sends it to the server;

a receiving and searching module 404, in which the server receives and decodes the encoded target visual words, and searches the index table corresponding to the visual word dictionary of all pictures in the database based on the target visual words to obtain the result picture and/or the related information of the result picture;

and a sending module 405, where the server sends the result picture and/or the related information of the result picture to the client.

The image query system or the image search system automatically selects the visual word dictionary suitable for the query information type according to the combination type of the query information, converts the image into the visual word according to the visual word dictionary, further compresses the visual word into a data packet of the target visual word with less data volume, and then quickly and accurately acquires the result image of the target image to be retrieved and the related expansion information thereof according to the data packet.

In the process of obtaining the target picture, the picture searching system divides the database picture set according to the picture division criteria, so that the number of visual word types in each divided picture class is far smaller than that of the original database picture set. This effectively reduces the number of visual words needed to describe a picture, so that the target picture is converted into a data packet of target visual words of only dozens of bits. The amount of data transmitted to the server is thereby reduced and low-bit transmission between the client and the server is achieved, which effectively addresses the long data transmission time under current bandwidth limitations and further reduces the waiting time of the user. The searching method is suitable for different types of queries and is highly extensible.

The client mentioned in this embodiment may be a mobile terminal, such as a mobile phone, an iPad, or another tablet computer.

Specifically, the client in this embodiment may include:

the receiving module is used for receiving query contents comprising a target picture to be queried or the target picture to be queried and related information;

the target visual word acquisition module is used for acquiring visual words of the target picture in the query content, selecting at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquiring the target visual words of the visual words according to the target visual word dictionary;

and the result picture receiving module is used for receiving and displaying the result picture and/or the related information of the result picture which is searched and sent by the server.

The modules shown for the picture search system only illustrate its internal structure schematically. In a given system, client, or other structure, the same module may be used repeatedly for sending or receiving, or a given module may be used intermittently. The above embodiment is illustrative only and is not limited to the structural arrangement and connection relationships shown in fig. 4. In addition, other modules capable of implementing some steps of the picture searching method of the present invention may be added to the picture search system and the client.

Finally, it should be noted that the steps of the above picture searching method may be performed in parallel or alternately; the above embodiment is only an illustrative example and does not limit the execution order of the steps. In addition, the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An image searching method, comprising:

2. The picture searching method according to claim 1, wherein:

selecting at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to preset rules, wherein the steps comprise:

according to the type of the query content, searching the one or more visual word dictionary libraries preset by the client for a visual word dictionary library and a prediction loss function that match the type of the query content; and

calculating, with the prediction loss function, the prediction loss value of each visual word dictionary in the visual word dictionary library with respect to the visual words of the target picture, and acquiring the one or more visual word dictionaries whose prediction loss values fall within a threshold range.

3. The picture searching method according to claim 2, wherein the type of the query content includes:

a target picture; a target picture and text; a target picture and a signal detected by a sensor; or a target picture and an object label identified in the picture by object recognition software;

the signal detected by the sensor comprises geographic position information detected by a global positioning system device, a bar code of a book or commodity scanned by a bar code scanner, and an electronic tag read by an electronic tag reader;

the object label identified by the object recognition software comprises a human face identified by face recognition software and characters identified by character recognition software.

4. The picture searching method according to claim 2, wherein:

the one or more visual word dictionary libraries preset by the client are obtained by the client from the server in advance, and the client updates the one or more visual word dictionary libraries periodically;

the step that the server side establishes one or more visual word dictionary libraries comprises the following steps:

dividing the pictures in the server database into picture sets of various types by a picture set dividing method, establishing a visual word dictionary corresponding to each picture set, analyzing the visual word dictionary corresponding to each picture set, and, if the visual word dictionaries meet the conditions for establishing a visual word dictionary library, forming a visual word dictionary library from the set of visual word dictionaries corresponding to the picture sets of the various types;

wherein the conditions for establishing the visual word dictionary library are:

the number of visual words in the visual word dictionary of each divided picture set is less than or equal to the total number of visual words in the visual word dictionary of the server database;

and counting the probability distribution of the visual words of the picture set, and calculating the entropy of the probability distribution of the visual words, wherein the information entropy of the probability distribution is less than a set threshold value.

5. The picture searching method according to claim 4, wherein:

the visual word dictionary is: an original visual word dictionary of the pictures, established by clustering the visual features of the picture set; or,

an effective visual word dictionary, obtained by establishing an original visual word dictionary of the pictures by clustering the visual features of the picture set and determining, based on the screening rule of the effective visual word dictionary, an effective visual word dictionary representing the original visual word dictionary.

6. The picture searching method according to claim 4, wherein:

the method for dividing the pictures in the server database into the picture sets of various types by adopting a picture set dividing mode comprises the following steps:

dividing all pictures into a plurality of picture sets by using visual similarity among the pictures; or,

dividing all pictures into a plurality of picture sets by using information related to the pictures, such as the photographing date, text labels, and electronic labels; or

dividing all pictures into a plurality of picture sets by using both the visual similarity among the pictures and information related to the pictures, such as the photographing date, text labels, and electronic labels.

7. The picture searching method according to claim 2, wherein:

in the step of calculating the prediction loss value of each visual word dictionary in the visual word dictionary base on the visual word of the target picture by adopting the prediction loss function, the calculation mode of the prediction loss value is as follows:

the cosine distance between the visual words of the target picture and the class center of the picture class where the target visual word dictionary is located; or

the weighted sum of the cosine distance between the visual words of the target picture and the class center of the picture class where the target visual word dictionary is located, and the Euclidean distance between the related information and the same-kind information of the picture class where the visual word dictionary is located; or

the product of the visual similarity distance between the target picture and the picture class where the target visual word dictionary is located, and the Euclidean distance between the related information and the same-kind information of the picture class where the visual word dictionary is located.

8. The picture searching method according to claim 5,

the step of determining an effective visual word dictionary representing the original visual word dictionary based on the screening rule of the effective visual word dictionary comprises:

selecting a certain number of pictures from a certain class of pictures as sample pictures, and converting the characteristics of the sample pictures into visual words in the original visual word dictionary;

inquiring in a visual word index table of the original visual word dictionary according to the visual words of the sample picture to obtain an original inquiry result;

combining any visual words belonging to an original visual word dictionary to form a screening visual word dictionary, converting the characteristics of the sample picture into first visual words corresponding to the screening visual word dictionary based on the screening visual word dictionary, and inquiring in a visual word index table of the original visual word dictionary by adopting the first visual words to obtain a first inquiry result corresponding to the screening visual word dictionary;

analyzing the original query results of all sample pictures and the first query result, and if the first query result is consistent with the original query result, adopting the current screening visual word dictionary as a visual word dictionary; otherwise, selecting a visual word from the original visual word dictionary, adding the visual word to the current screening visual word dictionary, and returning to the step of obtaining the first query result.

9. An image searching method, comprising:

10. An image search system, comprising:

and the server side sends the result picture and/or the related information of the result picture to the client side for displaying.