CN102147815B - Method and system for searching images - Google Patents

Method and system for searching images

Info

Publication number
CN102147815B
CN102147815B CN102147815A CN 201110100485
Authority
CN
China
Prior art keywords
picture
visual
word dictionary
visual word
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110100485
Other languages
Chinese (zh)
Other versions
CN102147815A (en)
Inventor
段凌宇
纪荣嵘
陈杰
李冰
黄铁军
姚鸿勋
高文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN 201110100485 priority Critical patent/CN102147815B/en
Publication of CN102147815A publication Critical patent/CN102147815A/en
Application granted granted Critical
Publication of CN102147815B publication Critical patent/CN102147815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a method and a system for searching images. In the method, a client receives query content, which comprises a target image to be searched, or the target image to be searched together with its related information; the client obtains visual words of the target image, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary according to preset rules, and obtains the target visual words of those visual words according to the target visual word dictionary; the target visual words are encoded and then transmitted to a server to obtain result images matching the query content and/or related information of the result images. The method improves image search speed by reducing the data uploaded by the client, shortens the user's waiting time, and improves the search accuracy of the search system.

Description

Picture searching method and picture searching system
Technical Field
The invention relates to the technical field of picture identification and search, in particular to a picture search method and a picture search system.
Background
With the rapid development of wireless networks and the continuous enhancement of mobile device capabilities, users frequently query picture information with mobile devices. The earliest approach was to describe picture content with text and then retrieve/search according to that text. However, text cannot accurately describe picture content, and the results of text-based picture search are often not the information the user needs, so text-based search cannot satisfy users.
Another approach, content-based image search, uses an image as the query to find similar images, which avoids the inaccurate-description problem of text-based image search. However, content-based picture search transmits the image directly to a server, which produces a large amount of transmitted data. In particular, in a wireless network environment with limited and unstable bandwidth, a picture search often requires a long query response time.
Therefore, the industry describes a picture with visual descriptors, converting the picture into a one-dimensional vector of data, so that the data vector rather than the picture itself is transmitted to the server. Describing a picture with visual descriptors can shorten the query response time, but, limited by the quality of current mobile networks, the upload speed still cannot meet users' actual needs. In view of this, how to provide a picture retrieval method that maintains retrieval performance and efficiency while reducing the bandwidth required for picture retrieval is a technical problem that currently needs to be solved.
Disclosure of Invention
To address the defects of the prior art, the invention provides a picture searching method and a picture searching system that, without reducing search performance, improve the picture retrieval speed, shorten the user's waiting time and improve the search accuracy of the search system by reducing the amount of data uploaded by the client.
The picture searching method provided by the invention comprises the following steps:
the client receives query content, wherein the query content comprises a target picture to be queried or the target picture to be queried and related information;
the client acquires visual words of a target picture, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquires the target visual words of the visual words according to the target visual word dictionary;
and coding the target visual words and then sending the coded target visual words to a server so as to obtain result pictures matched with the query contents and/or relevant information of the result pictures.
According to another aspect of the present invention, the present invention also provides an image searching method, which includes:
the server receives the encoded target visual words and decodes the target visual words;
the server searches an index table corresponding to a visual word dictionary in the server on the basis of the target visual word to obtain a result picture and/or related information of the result picture, and sends the result picture and/or related information of the result picture to the client;
the visual word dictionary is: and the visual word dictionary is established by adopting a clustering mode for the visual features of all the pictures in the server side picture database.
According to another aspect of the present invention, the present invention also provides an image search system, which includes:
the receiving module, by which the client receives query contents comprising a target picture to be queried or the target picture to be queried and related information;
the target visual word acquisition module is used for acquiring the visual words of the target picture by the client, selecting at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquiring the target visual words of the visual words according to the target visual word dictionary;
the target visual word sending module is used for coding the target visual word and sending the coded target visual word to the server,
the receiving and searching module is used for receiving and decoding the coded target visual words by the server side, and searching the index table corresponding to the visual word dictionary of all pictures in the database based on the target visual words to obtain the result pictures and/or the related information of the result pictures;
and the sending module, by which the server side sends the result picture and/or the related information of the result picture to the client side.
The picture searching method and picture searching system provided by the invention compress the target picture, at the client, into target visual words that retain visual content description capability and transmit those target visual words to the server. This realizes low-bit data transmission between the client and the server, shortens the user's waiting time when querying a target picture, reduces the server's response time within the system, and thus further improves the query efficiency of the picture searching method.
Furthermore, the searching method can also improve the accuracy of the searching result. The method can be popularized and applied to retrieval/search of various pictures, and can acquire the extension information of the result picture, so that the method is wide in application range, applicable to various fields and convenient for a user to retrieve various information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating steps of an embodiment of a method for searching pictures according to the present invention;
FIG. 2 is a flow chart of the steps for screening a valid visual dictionary in the present invention;
FIG. 3 is a flowchart illustrating steps of an embodiment of a method for searching pictures according to the present invention;
fig. 4 is a schematic structural diagram of an embodiment of the image search system in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention mainly provides a picture searching method that uses a visual word dictionary library preset at the client to obtain, for the target picture, target visual words with a small transmission data volume; the target visual words are then encoded and sent to the server, and a result picture and/or related extended information is obtained from the server. The method effectively reduces the number of visual words describing the target picture and the amount of data transmitted to the server, achieving low-bit data transmission between the client and the server; it can effectively alleviate the long data transmission time under current bandwidth limitations, reduce the server's response time, and thus better save the user's waiting time.
The following terms are used in the description below:
Visual word: the visual feature space is partitioned discretely, and each partition is a visual word; visual words describe picture content in terms of picture features and are the most basic data features;
Visual word dictionary: the set of visual words of all pictures, or of a selected part of the pictures, in the picture database.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an embodiment of a picture searching method according to the present invention, where the steps include:
Step 101: the client receives query content, where the query content comprises a target picture (query picture) to be queried, or the target picture to be queried together with related information. The related information here is information other than the target picture; for example, it may be text describing the target picture, geographical location information, a publisher barcode, a publisher logo or an electronic tag, and the like.
Step 102: the client acquires visual words of the target picture, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquires the target visual words of those visual words according to the target visual word dictionary.
the generation mode of the visual words of the target picture can be that more than one visual feature of the target picture is obtained, and the features are converted into the visual words in the original visual word dictionary according to the mapping rules of the visual features and the visual words. Preferably, the original visual word dictionary of the client is the same as that of the server, and the original visual word dictionary of the client can be preset in the client in advance and can be updated from the server in real time.
The original visual word dictionary can be generated by obtaining more than one visual feature of the pictures in the server-side database and clustering those visual features into a number of classes. This and the subsequent clustering steps may use K-means clustering, hierarchical clustering, spectral clustering, etc.; for spectral clustering, see Ng A., Jordan M., and Weiss Y., "On spectral clustering: analysis and an algorithm", NIPS, 849-856. The class center of each class represents that class and is called a visual word, i.e. each class corresponds to one visual word, and the set of visual words of the whole database forms the original visual word dictionary.
Specifically, in this embodiment, the visual features such as a color histogram, a texture map, a scale invariant descriptor, a gradient position orientation histogram, or a directional gradient histogram of the target picture may be extracted;
then, according to the mapping rule of the visual features and the visual words, converting the color histogram, the texture map, the scale invariant descriptor (SIFT), the gradient position orientation histogram (GLOH) or the directional gradient Histogram (HOG) of the target picture into all the visual words corresponding to the original visual word dictionary of the service end.
Sub-step 1021: according to the type of the query content, find, among the one or more visual word dictionary libraries preset at the client, the visual word dictionary library and the prediction loss function that match that type. That is, mapping rules between query content types and visual word dictionaries are preset. For example, if the query content is a picture plus text describing the picture, the visual word dictionary library is the one corresponding to text information.
Specifically, the one or more visual word dictionary libraries preset at the client are obtained by the client from the server in advance, and the client updates them regularly. Alternatively, when new pictures are added at the server, the client can be prompted to update its local visual word dictionary libraries. Steps P1 to P3 below describe in detail how the server obtains a visual word dictionary library.
Sub-step 1022: use the prediction loss function to calculate, for each visual word dictionary in the visual word dictionary library, the prediction loss value with respect to the visual words of the target picture, and select one or more visual word dictionaries whose loss values fall within a threshold range.
The prediction loss function is adopted to calculate the prediction loss value of each visual word dictionary in the visual word dictionary base on the visual word of the target picture, and the specific calculation mode of the prediction loss value can be selected from any one of the following first calculation mode to the third calculation mode.
The first calculation method: the cosine distance between the visual words of the target picture and the class center of the picture class where the target visual word dictionary is located; or
The second calculation method: the cosine distance between the visual words of the target picture and the class center of the picture class in which the target visual word dictionary is positioned, and the weighted sum of the Euclidean distance between the related information and the similar information of the picture class in which the visual word dictionary is positioned;
The third calculation method: the product of the visual similarity distance between the target picture and the picture class of the target visual word dictionary and the Euclidean distance between the related information and the corresponding class information of that picture class.
For example, the prediction loss function f_prediction(q_i, C_j) is computed as:
f_prediction(q_i, C_j) = α·Vd_ij + β·Rd_ij
where f_prediction(q_i, C_j) denotes the prediction loss value of target picture q_i with respect to picture class C_j of the visual word dictionary, Vd_ij is the cosine distance between the visual words of the target picture and the class center of the picture class of the target visual word dictionary, and Rd_ij is the Euclidean distance between the related information and the corresponding class information of the picture class of the visual word dictionary. α and β are real numbers that can be set empirically or according to requirements.
The cosine distance Vd_ij between the visual words of the target picture and the class center of the picture class of the target visual word dictionary is calculated as:
Vd_ij = ||BOW_i, BOW_j||_cosine = (BOW_i · BOW_j) / (||BOW_i|| · ||BOW_j||)
where BOW_i is the visual word vector of the target picture i and BOW_j is the class center of picture class C_j of the target visual word dictionary.
The Euclidean distance Rd_ij between the related information and the corresponding class information of the picture class of the visual word dictionary is calculated as:
Rd_ij = ||R_i, R_j|| = sqrt((R_i - R_j)^2)
where R_i is the related information of picture i in the query content and R_j is the corresponding class information value of picture class C_j of the target visual word dictionary.
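As a concrete illustration of the prediction loss above, the following sketch computes f_prediction(q_i, C_j) = α·Vd_ij + β·Rd_ij under the second calculation method, following the formulas as written. The input vectors and the default α and β values are assumptions for illustration, not values prescribed by the invention.

```python
# Illustrative sketch of the prediction loss f_prediction(q_i, C_j) = a*Vd_ij + b*Rd_ij.
import numpy as np

def cosine_distance(bow_i: np.ndarray, bow_j: np.ndarray) -> float:
    """Vd_ij as defined above: (BOW_i . BOW_j) / (||BOW_i|| * ||BOW_j||)."""
    return float(bow_i @ bow_j) / (np.linalg.norm(bow_i) * np.linalg.norm(bow_j))

def euclidean_distance(r_i: np.ndarray, r_j: np.ndarray) -> float:
    """Rd_ij as defined above: sqrt((R_i - R_j)^2), summed over components."""
    return float(np.sqrt(np.sum((r_i - r_j) ** 2)))

def prediction_loss(bow_i, class_center, r_i, r_j, alpha=1.0, beta=1.0) -> float:
    """Prediction loss of target picture q_i against picture class C_j."""
    return alpha * cosine_distance(bow_i, class_center) + beta * euclidean_distance(r_i, r_j)

# Example use: score every candidate dictionary's picture class and keep those
# whose loss falls within the threshold range (names here are illustrative).
# losses = {name: prediction_loss(bow_q, centers[name], r_q, infos[name]) for name in centers}
```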
In addition, the types of query content referred to in sub-step 1021 may include: a target picture alone; a target picture plus text; a target picture plus a signal detected by a sensor; and a target picture plus an object tag recognized in the picture by object recognition software. The signal detected by the sensor may include geographical location information detected by a Global Positioning System (GPS) device, barcode information of a book or commodity scanned by a barcode scanner, electronic tag information (RFID) read by an electronic tag reader, and the like. The object tags recognized by object recognition software may include human faces recognized by face recognition software, text recognized by optical character recognition (OCR) software, and the like.
For example, when the type of the query content is the target picture class, the visual word dictionary library is a visual word dictionary library of visual similarity established according to picture similarity.
When the type of the query content is a target picture plus a signal detected by a sensor, for example a landmark picture, the signal detected by the sensor may be the geographical location information corresponding to a building in the landmark picture or to a natural landscape in the landmark picture. In this case, the visual word dictionary library is the visual word dictionary library corresponding to geographical location information.
When the type of the query content is a target picture plus an object tag recognized in the picture by object recognition software, for example a book picture, the object tag recognized in the picture may be the publisher logo or the name of the book. In this case, the visual word dictionary library is the visual word dictionary library corresponding to the publisher logo or name.
If the query content is a picture of a commodity, the object recognition software may recognize the trademark of the commodity as the object tag, or a barcode scanner may scan the barcode of the corresponding commodity (the physical object) in the picture; the visual word dictionary library is then the visual word dictionary library corresponding to the trademark or barcode.
If the query content is a guide sign picture of a museum exhibition room, the object recognition software recognizes a barcode or an electronic tag in the guide sign picture as the object tag, and the visual word dictionary library is the visual word dictionary library corresponding to the barcode or electronic tag. In this step, the picture set is divided into a plurality of classes so that the coupling of visual words within each divided picture class is maximized, thereby reducing the dimensionality of the visual word dictionaries.
Step 103: encode the target visual words and send them to the server, so as to obtain and display the result picture matching the query content and/or related information of the result picture.
As mentioned in sub-step 1021, the one or more visual word dictionary libraries preset at the client are obtained by the client from the server in advance. The steps by which the server establishes one or more visual word dictionary libraries include:
first step P1: and dividing the pictures in the server database into picture sets of various types by adopting a picture set dividing mode.
The sub-step of the first step P1 is to divide all pictures into multiple picture sets by using visual similarity between pictures. Alternatively, the sub-step of the first step P1 is to divide all pictures into a plurality of picture sets using picture related information such as the date of picture taking, text labels, electronic labels, etc. Of course, the sub-step of the first step P1 may also be the division of all pictures into sets using visual similarities between the pictures and the date of the picture taking, text labels, electronic labels, etc. of the information related to the pictures.
Second step P2: establish a visual word dictionary corresponding to each picture set and analyze the visual word dictionary corresponding to each picture set. Specifically, the visual word dictionary here can be the original visual word dictionary of the pictures, established by clustering the visual features of the picture set; or the visual word dictionary here is: a visual word dictionary of the pictures established by clustering the visual features of the picture set, from which an effective visual word dictionary representative of the original visual word dictionary is determined based on the screening rule for effective visual word dictionaries, the effective visual word dictionary then being used as the visual word dictionary, which further reduces the dimensionality of the visual word dictionary (its dimension in an N-axis coordinate system).
Third step P3: (first means for obtaining visual word dictionary library) if the visual word dictionary satisfies the visual word dictionary library establishment condition, the set of visual word dictionaries corresponding to each type of picture set forms a visual word dictionary library.
Wherein: the visual word dictionary base establishment condition may be: the number of visual words in the visual word dictionary of each divided picture set is less than or equal to the total number of visual words in the visual word dictionary of the server database; and counting the probability distribution of the visual words of each divided picture set, and calculating the entropy of the probability distribution of the visual words, wherein the information entropy of the probability distribution is less than a set threshold value.
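A minimal sketch of checking these two establishment conditions is given below. The entropy threshold value and the input format (a visual word count vector per picture set) are illustrative assumptions, not parameters fixed by the invention.

```python
# Illustrative check of the visual word dictionary library establishment conditions.
import numpy as np

def distribution_entropy(word_counts: np.ndarray) -> float:
    """Shannon entropy of the visual word probability distribution of one picture set."""
    p = word_counts / word_counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def satisfies_library_conditions(class_word_counts: np.ndarray,
                                 total_dictionary_size: int,
                                 entropy_threshold: float = 8.0) -> bool:
    """Condition 1: per-class vocabulary no larger than the database dictionary.
    Condition 2: entropy of the class's visual word distribution below the threshold."""
    vocab_size = int((class_word_counts > 0).sum())   # visual words used by this picture set
    return (vocab_size <= total_dictionary_size and
            distribution_entropy(class_word_counts) < entropy_threshold)
```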
And finally, the server side sends the established visual word dictionary to the client side and stores the visual word dictionary for subsequent use. When the server side has a new picture, the visual word dictionary of the server side can be updated, and the visual word dictionary of the client side can be updated at the same time.
In contrast to the prior art, the filtering rule of the valid visual word dictionary in the present embodiment may be (i.e., the filtering rule of the valid visual word dictionary used in the second step P2 may be):
step P41: selecting a certain number of pictures from a certain class of pictures as sample pictures, and converting the characteristics of the sample pictures into visual words in the original visual word dictionary;
step P42: inquiring in a visual word index table of the original visual word dictionary according to the visual words of the sample picture to obtain an original inquiry result;
step P43: combining any visual words belonging to an original visual word dictionary to form a screening visual word dictionary, converting the characteristics of the sample picture into first visual words corresponding to the screening visual word dictionary based on the screening visual word dictionary, and inquiring in a visual word index table of the original visual word dictionary by adopting the first visual words to obtain a first inquiry result corresponding to the screening visual word dictionary;
step P44: analyzing the original query results of all sample pictures and the first query result, and if the first query result is consistent with the original query result, adopting the current screening visual word dictionary as a visual word dictionary; otherwise, selecting a visual word from the original visual word dictionary, adding the visual word to the current screening visual word dictionary, and returning to the step of obtaining the first query result.
It should be noted that: the visual word dictionary generation mode corresponding to each type of picture set is that a visual word dictionary of the picture is established by adopting a clustering mode for the visual features of the picture set.
Compared with the prior art, the searching method in the embodiment only needs to transmit dozens of bits of coded data volume to the server, so that the purpose of fast query of the client is achieved, meanwhile, the transmission efficiency of the client in the process of querying the target picture is improved, and the response query time of the server is shortened.
Particularly, the image search method of the embodiment is mainly applied to image query in mobile terminals, and the mobile terminals select a suitable visual word dictionary for query information in a self-adaptive manner and obtain target visual words with visual description capability, so that the data volume of a target image to be queried is effectively reduced, data transmission with low bit between a client and a server is further realized, the waiting time of a user in querying the target image is shortened, the response time of the server is improved, and the query efficiency of the image search method is further improved.
Furthermore, the searching method can also improve the accuracy of the retrieval result. The method can be popularized and applied to retrieval/search of various pictures, and the expansion information of the result picture can be acquired, so that the method is wide in application range, can be used in various fields, and is convenient for a user to retrieve various information.
Referring to FIG. 2, FIG. 2 is a flow chart illustrating the specific steps for screening a valid visual word dictionary in the present invention; that is, the specific calculation steps for screening the effective visual word dictionary include:
First step 201: select N_sample sample pictures from the whole picture database as query pictures, query them in the visual word index table, and retrieve the top R query results. For the i-th picture, its query result A_j^i is the picture ranked at the j-th position in the query results, and BOW_{A_j^i} is the visual word vector of that result picture.
Second step 202: calculate the term frequency-inverse document frequency (TF-IDF) weight TI_{A_r} of each result picture. A valid visual word dictionary is then screened from subsets of the original visual word dictionary.
Third step 203: set the iteration count d = 1; set the effective visual word dictionary min_V to empty; set the candidate visual word set cadi_V = V, where V is the original visual word dictionary with N_cv elements; initialize the weight w_i of each of the N_sample query pictures to 0; set the test subset train_V to empty.
Fourth step 204: if the iteration count d > α or lost_Rank < β, the process ends.
Fifth step 205: otherwise, the N_cv visual words in the candidate visual word set are added to the test subset respectively, yielding N_cv test subsets train_V_1, ..., train_V_{N_cv}, where train_V_t = min_V ∪ {wd_t}.
Sixth step 206: taking each test subset as a visual word dictionary, convert the local feature vector S_i of each query picture i into a visual word vector according to that dictionary; for test subset train_V_k, the corresponding visual word vector of picture i is BOW_{I_i}^(k).
Seventh step 207: calculate the total error rate lost_Rank_k caused by describing the query pictures with each test subset. For test subset train_V_k and picture I_i, the error rate Lost(I_i)_k is computed as in the following M1-M4:
M1: map BOW_{I_i}^(k) into the visual vector space of the original visual word dictionary; gBOW_{I_i}^(k) is the resulting mapping vector.
M2: with the query picture described by test subset train_V_k, calculate the content similarity between each result picture A_j^i and query picture i:
||gBOW_{I_i}^(k), BOW_{A_j^i}||_cosine = (BOW_{A_j^i} · gBOW_{I_i}^(k)) / (||BOW_{A_j^i}|| · ||gBOW_{I_i}^(k)||)
M3: calculate the error rate Lost(I_i)_k caused by describing query picture I_i with test subset train_V_k:
Lost(I_i)_k = w_i^(d-1) × Σ_{r=1}^{R} R(A_r^i) · TI_{A_r} · ||gBOW_{I_i}^(k), BOW_{A_r^i}||_cosine
where R(A_r^i) is a function that increases with the rank position of result picture A_r^i and TI_{A_r} is its TF-IDF weight.
M4: calculate the total error rate of describing the query pictures with test subset train_V_k:
lost_Rank_k = Σ_{i=1}^{N_sample} Lost(I_i)_k
Eighth step 208: choose the test subset that minimizes the total error rate lost_Rank and use it to update the effective visual word dictionary and the candidate visual word set, as follows: if the minimizing test subset is train_V_MIN, then the effective visual word dictionary becomes min_V = train_V_MIN and cadi_V = cadi_V - {wd_MIN}.
Ninth step 209: update the weight of each query picture i.
Tenth step 210: update the iteration count d = d + 1 and return to the fourth step 204.
Based on the above embodiment, the following describes in detail an example in which the query information includes only a picture; the query steps are as follows:
firstly, a client acquires a target picture to be searched.
And secondly, the client acquires more than one characteristic of the target picture and converts the characteristics into visual words.
Specifically, in this embodiment, visual features such as a color histogram, a texture map, a scale invariant descriptor, a gradient position orientation histogram, or a directional gradient histogram of the target picture may be extracted.
Then, according to the mapping rule of the visual features and the visual words, converting the color histogram, the texture map, the scale invariant descriptor (SIFT), the gradient position orientation histogram (GLOH) or the direction gradient Histogram (HOG) of the target picture into the visual words in the visual word dictionary of the client.
And thirdly, searching a target visual word dictionary matching the target picture from one or more visual word dictionary libraries of the client. The visual word dictionary libraries of the client are downloaded from the server in advance; that is, the client is provided in advance with visual word dictionary libraries corresponding to those of the server.
Particularly, when the inquired content is only a target picture, the client selects a visual word dictionary library with visual similarity established according to picture similarity, calculates the visual similarity distance of a picture class where any visual word dictionary in the visual similarity visual word dictionary library where the target picture and the visual word dictionary library are located, and selects the visual word dictionary with the minimum similarity distance as the visual word dictionary matched with the target picture, namely the target visual word dictionary. The visual similarity distance is the cosine distance between the visual words of the target picture and the class center of the picture class where the visual word dictionary is located.
Fourthly, analyzing the visual words and the target visual word dictionary to obtain target visual words corresponding to the target pictures; specifically, according to the visual word dictionary, visual words of a target picture are screened, and the visual words belonging to the visual word dictionary are selected as the target visual words;
Fifthly, compressing the target visual words into a data packet according to a Huffman coding method. Specifically, the probability of each target visual word is counted, a Huffman tree is built, and the target visual words are coded with '0' and '1' so that words with higher probability receive shorter codes; the visual words and their corresponding codes are stored in a Huffman coding table shared between the client and the server side.
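The Huffman coding in this fifth step might look like the following sketch: count word frequencies, build the tree with a heap, and emit a code table in which more frequent visual words receive shorter codes. The implementation details (heap representation, tie-breaking) are illustrative; the invention only requires that the client and server share a Huffman coding table.

```python
# Illustrative Huffman coding of target visual words; not the patent's exact scheme.
import heapq
from collections import Counter
from typing import Dict, List

def build_huffman_table(visual_words: List[int]) -> Dict[int, str]:
    freq = Counter(visual_words)
    # Each heap entry: [frequency, tiebreaker, [(symbol, code_so_far), ...]]
    heap = [[f, i, [(sym, "")]] for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                                  # single-symbol edge case
        return {heap[0][2][0][0]: "0"}
    tie = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        lo[2] = [(s, "0" + c) for s, c in lo[2]]        # prepend a bit on each merge
        hi[2] = [(s, "1" + c) for s, c in hi[2]]
        heapq.heappush(heap, [lo[0] + hi[0], tie, lo[2] + hi[2]])
        tie += 1
    return dict(heap[0][2])

def encode_visual_words(visual_words: List[int], table: Dict[int, str]) -> str:
    """Concatenate the per-word codes into the bit string carried by the data packet."""
    return "".join(table[w] for w in visual_words)
```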
And sixthly, the server decodes the data packet into a target visual word according to the Huffman coding table, searches a visual word index table of an original visual word dictionary in the server according to the target visual word to obtain more than one result picture corresponding to the target visual word and/or obtain the expansion information of the result picture, and sends the result picture and/or the expansion information to the client for display.
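On the server side, the lookup in the sixth step can be pictured with the following minimal sketch: after decoding, the target visual words are matched against an inverted index that maps each visual word to the pictures containing it, and candidate result pictures are ranked by how many target words they share with the query. The data structures and the voting-based ranking are illustrative assumptions, not requirements of the invention.

```python
# Illustrative server-side lookup in a visual word index table (inverted index).
from collections import Counter
from typing import Dict, List, Set

def search_index(target_words: List[int],
                 inverted_index: Dict[int, Set[int]],
                 top_k: int = 10) -> List[int]:
    """Return ids of the result pictures sharing the most visual words with the query."""
    votes: Counter = Counter()
    for w in target_words:
        for pic_id in inverted_index.get(w, ()):
            votes[pic_id] += 1
    return [pic for pic, _ in votes.most_common(top_k)]
```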
According to another aspect of the present invention, the present invention further provides a picture searching method, as shown in fig. 3, the steps of which include:
step 301: and the server receives the encoded target visual words and decodes the target visual words.
Step 302: the server searches an index table corresponding to a visual word dictionary in the server based on the target visual word to obtain the result picture and/or the related information of the result picture.
The visual word dictionary is: and the visual word dictionary is established by adopting a clustering mode on the visual features of all or part of pictures in the server side picture database.
Step 303: and sending the result picture and/or the related information of the result picture to the client for displaying.
In this embodiment, fewer target visual words are used to query result pictures, which improves the efficiency of the target picture query, shortens the user's waiting time while preserving the original retrieval performance, and thus enables picture queries under low-bandwidth conditions.
According to another aspect of the present invention, the present invention further provides an image search system, as shown in fig. 4, including:
a receiving module 401, in which a client receives a target picture to be queried, or query contents including the target picture to be queried and related information;
a target visual word obtaining module 402, wherein the client obtains a visual word of a target picture, selects at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and obtains the target visual word of the visual word according to the target visual word dictionary;
a target visual word sending module 403, which codes the target visual word and sends it to the server,
a receiving and searching module 404, in which the server receives and decodes the encoded target visual words, and searches the index table corresponding to the visual word dictionary of all pictures in the database based on the target visual words to obtain the result picture and/or the related information of the result picture;
and a sending module 405, where the server sends the result picture and/or the related information of the result picture to the client.
The image query system or the image search system automatically selects the visual word dictionary suitable for the query information type according to the combination type of the query information, converts the image into the visual word according to the visual word dictionary, further compresses the visual word into a data packet of the target visual word with less data volume, and then quickly and accurately acquires the result image of the target image to be retrieved and the related expansion information thereof according to the data packet.
The picture searching system effectively divides the database picture set according to the picture division criteria in the process of obtaining the target picture, so that the types of the visual words of various divided pictures are far smaller than the visual types of the original database picture set, the number of the visual words describing the pictures is effectively reduced, the target picture is converted into a data packet of the target visual words with dozens of bits, the data amount transmitted to the service end is reduced, the low-bit transmission between the client and the service end is achieved, the problem of long data transmission time under the current bandwidth limitation can be effectively solved, and the waiting time of a user can be better saved. The searching method is suitable for different types of queries and has strong expandability.
The client mentioned in this embodiment may be a mobile terminal, such as a mobile phone, an iPad, a tablet computer, and the like.
Specifically, the client in this embodiment may include:
the receiving module is used for receiving query contents comprising a target picture to be queried or the target picture to be queried and related information;
the target visual word acquisition module is used for acquiring visual words of an internal target picture, selecting at least one target visual word dictionary corresponding to the query content from more than one visual word dictionary of the client according to a preset rule, and acquiring the target visual words of the visual words according to the target visual word dictionary;
the target visual word sending module is used for coding the target visual word and sending the coded target visual word to the server,
and the result picture receiving module is used for receiving and displaying the result picture and/or the related information of the result picture which is searched and sent by the server.
Each module shown in the picture search system only schematically illustrates the internal structural relationships; in an actual system, client or other structure, the same module may be used for transmission or reception multiple times, or a given module may be used intermittently. The above embodiment is only illustrative and does not limit the structural arrangement and connection relationships to those in fig. 4. In addition, other modules capable of implementing some steps of the picture searching method of the present invention may be added to the picture search system and the client.
Finally, it should be noted that: the order of each step in the above image searching method may be performed in parallel or in an alternative manner, and the above embodiment is only an illustrative example, and does not limit the execution order of the steps. In addition, the above embodiments are only used to illustrate the technical solution of the present invention and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An image searching method, comprising:
the client receives query content, wherein the query content comprises a target picture to be queried or the target picture to be queried and related information;
the method comprises the steps that a client side obtains visual words of a target picture, searches a visual word dictionary library and a prediction loss function which are matched with the type of query content from one or more visual word dictionary libraries preset in the client side in advance according to the type of the query content, calculates the prediction loss value of each visual word dictionary in the visual word dictionary library to the visual words of the target picture by adopting the prediction loss function, obtains one or more visual word dictionaries within a threshold range, and obtains the target visual words of the visual words according to the target visual word dictionary;
and coding the target visual words and then sending the coded target visual words to a server so as to obtain result pictures matched with the query contents and/or relevant information of the result pictures.
2. The picture searching method according to claim 1, wherein the type of the query content includes:
the method comprises the steps that target pictures, target pictures and texts, target pictures and signals detected by a sensor are detected, and target picture and object identification software identifies object labels in the pictures;
the signal detected by the sensor comprises geographic position information detected by a global positioning system device, a bar code scanner is used for scanning a bar code of a book or a commodity, and an electronic tag reader is used for reading an electronic tag;
the object label identified by the object identification software comprises a human face identified by the human face identification software, and characters identified by the character identification system software.
3. The picture searching method according to claim 1, wherein:
one or more visual word dictionary libraries preset in advance by a client are obtained by the client from a server in advance, and the client updates the one or more visual word dictionary libraries at regular time;
the step that the server side establishes one or more visual word dictionary libraries comprises the following steps:
dividing the pictures in the server database into picture sets of various types by adopting a picture set dividing mode, establishing a visual word dictionary corresponding to each picture set, analyzing the visual word dictionary corresponding to each picture, and forming a visual word dictionary library by the set of the visual word dictionaries corresponding to the picture sets of various types if the visual word dictionary meets the establishment condition of the visual word dictionary library;
wherein: the visual word dictionary base is established under the conditions that:
the number of visual words in the visual word dictionary of each divided picture set is less than or equal to the total number of visual words in the visual word dictionary of the server database;
and counting the probability distribution of the visual words of the picture set, and calculating the entropy of the probability distribution of the visual words, wherein the information entropy of the probability distribution is less than a set threshold value.
4. The picture searching method according to claim 3, wherein:
the visual word dictionary is: establishing an original visual word dictionary of the picture by clustering visual features of the picture set; or,
and establishing a visual word dictionary of the picture by adopting a clustering mode for the visual features of the picture set, determining an effective visual word dictionary representing the original visual word dictionary based on the screening rule of the effective visual word dictionary, and taking the effective visual word dictionary as the visual word dictionary.
5. The picture searching method according to claim 3, wherein:
the method for dividing the pictures in the server database into the picture sets of various types by adopting a picture set dividing mode comprises the following steps:
dividing all pictures into a plurality of picture sets by using visual similarity among the pictures; or,
dividing all pictures into a plurality of picture sets by using the information related to the pictures; or
All pictures are divided into a plurality of sets using visual similarity between pictures and information about the pictures.
6. The picture searching method according to claim 5, wherein the information related to the picture comprises a photographing date of the picture, a text tag, and an electronic tag.
7. The picture searching method according to claim 1, wherein:
in the step of calculating the prediction loss value of each visual word dictionary in the visual word dictionary base on the visual word of the target picture by adopting the prediction loss function, the calculation mode of the prediction loss value is as follows:
the cosine distance between the visual words of the target picture and the class center of the picture class where the target visual word dictionary is located; or
The cosine distance between the visual words of the target picture and the class center of the picture class in which the target visual word dictionary is positioned, and the weighted sum of the Euclidean distance between the related information and the similar information of the picture class in which the visual word dictionary is positioned; or
The visual similarity distance of the target picture and the picture class where the visual word dictionary of the target visual word dictionary is located, and the product of the Euclidean distance of the related information and the same kind information of the picture class where the visual word dictionary is located.
8. The picture searching method according to claim 4,
the step of determining a visual word dictionary representative of the original visual word dictionary based on the screening rules of the valid visual word dictionary comprises:
selecting a certain number of pictures from a certain class of pictures as sample pictures, and converting the characteristics of the sample pictures into visual words in the original visual word dictionary;
inquiring in a visual word index table of the original visual word dictionary according to the visual words of the sample picture to obtain an original inquiry result;
combining any visual words belonging to an original visual word dictionary to form a screening visual word dictionary, converting the characteristics of the sample picture into first visual words corresponding to the screening visual word dictionary based on the screening visual word dictionary, and inquiring in a visual word index table of the original visual word dictionary by adopting the first visual words to obtain a first inquiry result corresponding to the screening visual word dictionary;
analyzing the original query results of all sample pictures and the first query result, and if the first query result is consistent with the original query result, adopting the current screening visual word dictionary as a visual word dictionary; otherwise, selecting a visual word from the original visual word dictionary, adding the visual word to the current screening visual word dictionary, and returning to the step of obtaining the first query result.
9. The picture searching method according to claim 1, further comprising:
the server receives the encoded target visual words and decodes the target visual words;
the server searches an index table corresponding to a visual word dictionary in the server on the basis of the target visual word to obtain a result picture and/or related information of the result picture, and sends the result picture and/or related information of the result picture to the client;
the visual word dictionary is: and the visual word dictionary is established by adopting a clustering mode on the visual features of all or part of pictures in the server side picture database.
CN 201110100485 2011-04-21 2011-04-21 Method and system for searching images Active CN102147815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110100485 CN102147815B (en) 2011-04-21 2011-04-21 Method and system for searching images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110100485 CN102147815B (en) 2011-04-21 2011-04-21 Method and system for searching images

Publications (2)

Publication Number Publication Date
CN102147815A CN102147815A (en) 2011-08-10
CN102147815B true CN102147815B (en) 2013-04-17

Family

ID=44422080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110100485 Active CN102147815B (en) 2011-04-21 2011-04-21 Method and system for searching images

Country Status (1)

Country Link
CN (1) CN102147815B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065150A (en) * 2011-10-24 2013-04-24 康佳集团股份有限公司 Scene recognition method based on intelligent mobile terminal
KR101191223B1 (en) * 2011-11-16 2012-10-15 (주)올라웍스 Method, apparatus and computer-readable recording medium for retrieving image
CN102595138B (en) * 2012-02-29 2014-04-23 北京大学 Method, device and terminal for image compression
CN102799614B (en) * 2012-06-14 2015-01-07 北京大学 Image search method based on space symbiosis of visual words
CN102902771A (en) * 2012-09-27 2013-01-30 百度国际科技(深圳)有限公司 Method, device and server for searching pictures
CN103294779A (en) * 2013-05-13 2013-09-11 北京百度网讯科技有限公司 Method and device for acquiring object information
CN104143105A (en) * 2013-09-22 2014-11-12 腾讯科技(深圳)有限公司 Graph recognition method, device and system and terminal device
CN104714962B (en) * 2013-12-13 2018-11-06 阿里巴巴集团控股有限公司 A kind of generation method and system of image search engine
CN104731784B (en) * 2013-12-18 2019-03-26 中兴通讯股份有限公司 Visual search method, system and mobile terminal
CN104850537B (en) * 2014-02-17 2017-12-15 腾讯科技(深圳)有限公司 The method and device screened to content of text
CN103870597B (en) * 2014-04-01 2018-03-16 北京奇虎科技有限公司 A kind of searching method and device of no-watermark picture
CN104298707B (en) * 2014-09-01 2019-01-15 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN105989001B (en) * 2015-01-27 2019-09-06 北京大学 Image search method and device, image search system
CN106407483A (en) * 2016-12-07 2017-02-15 连惠城 Electronic photo album with text search function
CN106886933A (en) * 2016-12-30 2017-06-23 深圳天珑无线科技有限公司 The methods of exhibiting and system of exhibition high in the clouds numerical digit commodity catalog
CN108287833A (en) * 2017-01-09 2018-07-17 北京艺鉴通科技有限公司 It is a kind of for the art work identification to scheme to search drawing method
CN107861970A (en) * 2017-09-15 2018-03-30 广州唯品会研究院有限公司 A kind of commodity picture searching method and device
CN109241314A (en) * 2018-08-27 2019-01-18 维沃移动通信有限公司 A kind of selection method and device of similar image
CN110879849B (en) * 2019-11-09 2022-09-20 广东智媒云图科技股份有限公司 Similarity comparison method and device based on image-to-character conversion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008294973A (en) * 2007-05-28 2008-12-04 Oki Electric Ind Co Ltd Video image editing apparatus and method
CN101777064A (en) * 2009-01-12 2010-07-14 鸿富锦精密工业(深圳)有限公司 Image searching system and method
CN101944091A (en) * 2009-07-07 2011-01-12 夏普株式会社 Image retrieving device

Also Published As

Publication number Publication date
CN102147815A (en) 2011-08-10

Similar Documents

Publication Publication Date Title
CN102147815B (en) Method and system for searching images
US11886489B2 (en) System and method of identifying visual objects
CN105517679B (en) Determination of the geographic location of a user
KR101565265B1 (en) Coding of feature location information
US20090083275A1 (en) Method, Apparatus and Computer Program Product for Performing a Visual Search Using Grid-Based Feature Organization
CN102063472B (en) Image searching method and system, client side and server
CN110782284A (en) Information pushing method and device and readable storage medium
JP5563494B2 (en) Corresponding reference image search device and method, content superimposing device, system and method, and computer program
CN110083762B (en) Room source searching method, device and equipment and computer readable storage medium
CN111382620B (en) Video tag adding method, computer storage medium and electronic device
Chen et al. Memory-efficient image databases for mobile visual search
CN113657087B (en) Information matching method and device
CN111881777B (en) Video processing method and device
WO2013115203A1 (en) Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor
JP6042778B2 (en) Retrieval device, system, program and method using binary local feature vector based on image
CN117009599A (en) Data retrieval method and device, processor and electronic equipment
CN107870923B (en) Image retrieval method and device
Zhang et al. Interactive mobile visual search for social activities completion using query image contextual model
KR101910825B1 (en) Method, apparatus, system and computer program for providing aimage retrieval model
KR20160052316A (en) Apparatus and Method for Web Data based Identification System for Object Identification Performance Enhancement
KR102558086B1 (en) System for providing gps based plant exploration guidance service using multimedia contents
KR20150073409A (en) Apparatus and method for near duplicate video clip detection
CN117938951B (en) Information pushing method, device, computer equipment and storage medium
JP7181014B2 (en) Data extraction device, data extraction method, and program
US20240168992A1 (en) Image retrieval method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant