CN103345496A - Multimedia information searching method and system - Google Patents

Multimedia information searching method and system

Info

Publication number
CN103345496A
Authority
CN
China
Prior art keywords
bit vector
tag bit
vectorial
subvector
multimedia messages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102642253A
Other languages
Chinese (zh)
Other versions
CN103345496B (en)
Inventor
刘洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sina Technology China Co Ltd
Original Assignee
Sina Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sina Technology China Co Ltd filed Critical Sina Technology China Co Ltd
Priority to CN201310264225.3A priority Critical patent/CN103345496B/en
Publication of CN103345496A publication Critical patent/CN103345496A/en
Application granted granted Critical
Publication of CN103345496B publication Critical patent/CN103345496B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multimedia information searching method and system. The method includes: extracting feature data of current multimedia information; obtaining a feature bit vector of the current multimedia information from the extracted feature data; segmenting that feature bit vector into k sub-vectors; determining, for each sub-vector, a corresponding candidate set; looking up, in a multimedia feature database, the feature bit vector corresponding to each vector identifier in the candidate sets obtained; calculating the Hamming distance between the feature bit vector of the current multimedia information and each feature bit vector found; and outputting, as the search result, the multimedia information corresponding to the feature bit vectors whose Hamming distances satisfy a preset condition. Because a segmented index structure is built to search the feature bit vectors, the speed and efficiency of searching multimedia information can be greatly improved.

Description

Multimedia information retrieval method and system
Technical field
The present invention relates to the computer field, and in particular to a multimedia information retrieval method and system.
Background art
In recent years, with the rapid development of multimedia and computer technology, large-scale multimedia information has appeared in more and more research and applications. Traditional text-based retrieval can no longer meet users' growing need to access and use the information contained in this vast and varied data effectively, and content-based retrieval has emerged in response.
A content-based retrieval method first extracts feature data from the multimedia and builds a feature database, and then converts retrieval of multimedia information into nearest-neighbor retrieval over the feature data. For large-scale multimedia information, the feature data is also large-scale. A suitable indexing method matched to the feature data is therefore needed to organize it and speed up retrieval.
However, the feature data of multimedia information is usually high-dimensional vector data (high-dimensional vectors for short). Traditional indexing mechanisms suited to low-dimensional data are hard to adapt to content-based retrieval; this is the phenomenon commonly called the curse of dimensionality in indexing high-dimensional data. To reduce its impact, index high-dimensional data better, and so improve the retrieval performance for multimedia information, current research usually uses hashing methods to map high-dimensional vectors to discrete bit vectors, which greatly reduces the storage cost of the high-dimensional vectors and speeds up similarity search.
When bit vectors are used for multimedia information retrieval, a multimedia feature database must first be built. Specifically, for each multimedia item in the repository, the feature data of the item is extracted and converted by a hashing method into a discrete n-dimensional bit vector, which is stored in the multimedia feature database as the feature bit vector of that item.
In the prior art, before multimedia information can be retrieved, the feature bit vectors in the existing multimedia feature database must first be partitioned into sets and sorted to build ordered lists. The flow is shown in Fig. 1 and includes the following steps:
S101: Partition the feature bit vectors in the existing multimedia feature database into sets according to their first p (p<n) elements.
Specifically, feature bit vectors in the multimedia feature database whose first p elements are identical are placed in the same set.
S102: For each set, determine the ordered list of that set.
Specifically, for each set, the feature bit vectors in the set are sorted so that the binary value of each feature bit vector is not greater than that of the next one; the sorted feature bit vectors constitute the ordered list of the set.
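Steps S101 and S102 can be sketched roughly as follows. This is only an illustration, not the prior art's actual implementation: it assumes that feature bit vectors are held as Python strings of '0' and '1' characters, and the function name build_ordered_lists is hypothetical.

```python
from collections import defaultdict


def build_ordered_lists(vectors, p):
    """Group bit strings by their first p bits (S101) and sort each group (S102)."""
    groups = defaultdict(list)
    for vector in vectors:
        groups[vector[:p]].append(vector)
    # Sort every set in non-decreasing order of binary value.
    return {prefix: sorted(group, key=lambda v: int(v, 2))
            for prefix, group in groups.items()}


ordered = build_ordered_lists(["1101", "0110", "1100", "0001"], p=2)
print(ordered["11"])  # ['1100', '1101']
```

Each key of the returned dictionary is one p-element prefix, and its value is the ordered list of that set.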
Based on the existing multimedia feature database and the ordered lists, the prior-art multimedia retrieval method based on bit vectors proceeds as shown in Fig. 2 and includes the following steps:
S201: Extract the feature data of the current multimedia information and convert it by a hashing method into a discrete n-dimensional bit vector, obtaining the feature bit vector of the current multimedia information.
S202: Determine the set to which the feature bit vector of the current multimedia information belongs according to its first p elements and, within that set, use the ordered list to find the feature bit vectors whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to q.
Specifically, the feature bit vector of the current multimedia information is assigned to the set of feature bit vectors that share its first p elements, and within that set the ordered list is used to find the bit vectors that differ from it in at most q elements. If the elements of a bit vector are independent of one another, the similarity between bit vectors can generally be measured by the Hamming distance, which for two bit vectors of equal length is the number of positions at which their bit values differ.
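As a small illustration of the Hamming distance just defined, the sketch below counts differing positions between two equal-length bit vectors. The representation of a bit vector as a string of '0' and '1' characters and the function name hamming_distance are assumptions made for this example.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("10110100", "10011100"))  # 2
```

The two vectors differ only at the third and fifth positions, so the distance is 2.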
S203: Output the multimedia corresponding to the feature bit vectors found above as the final retrieval result.
However, the inventor has found that the speed of the above multimedia information retrieval method still cannot meet the retrieval-speed requirements of a system facing a large volume of retrieval requests. It is therefore necessary to provide a faster and more efficient multimedia information retrieval method.
Summary of the invention
In view of the above defects of the prior art, the present invention provides a multimedia information retrieval method and system to improve the speed and efficiency of multimedia information retrieval.
According to one aspect of the present invention, a multimedia information retrieval method is provided, comprising:
Extracting the feature data of current multimedia information, converting the extracted feature data into a feature bit vector, and evenly segmenting the feature bit vector into k sub-vectors, wherein the i-th sub-vector consists of the i-th group of elements of the evenly segmented feature bit vector, i being a natural number from 1 to k;
Determining, for each sub-vector of the current multimedia information, a corresponding candidate set, wherein for the i-th sub-vector the process comprises: finding, in the index set of a predetermined i-th index structure, the index identical to the i-th sub-vector, and taking the vector identifier set corresponding to the index found as the candidate set for the i-th sub-vector; wherein, in the i-th index structure, the vector identifiers of the feature bit vectors of multimedia information to be retrieved whose i-th sub-vectors are identical are stored in the same vector identifier set, and the index of that vector identifier set is that i-th sub-vector;
For each vector identifier in the candidate sets obtained, finding the corresponding feature bit vector in a multimedia feature database; calculating the Hamming distance between the feature bit vector of the current multimedia information and each feature bit vector found; and outputting, as the retrieval result, the multimedia information corresponding to the feature bit vectors whose Hamming distance satisfies a set condition.
Preferably, the i-th index structure is determined as follows:
For each item of multimedia information to be retrieved, the feature bit vector of the item is evenly segmented into k sub-vectors, wherein the i-th sub-vector of the item consists of the i-th group of elements of the segmented feature bit vector, i being a natural number from 1 to k;
The vector identifiers of the feature bit vectors of multimedia information to be retrieved whose i-th sub-vectors are identical are placed in the same vector identifier set; the identical i-th sub-vector shared by the feature bit vectors corresponding to the vector identifiers in that set is used as the index of the set and is stored in the index set of the i-th index structure.
Preferably, finding the corresponding feature bit vector in the multimedia feature database for each vector identifier in the candidate sets obtained specifically comprises:
Performing a union operation on the candidate sets obtained to produce a merged candidate set;
For each vector identifier in the merged candidate set, finding the feature bit vector corresponding to that vector identifier in the multimedia feature database.
Preferably, a feature bit vector whose Hamming distance satisfies the set condition is specifically a feature bit vector whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to q, wherein q is less than or equal to k.
Preferably, the i-th index structure is specifically a key/value (Key/Value) structure, wherein the identical i-th sub-vector serves as the Key and the vector identifier set corresponding to that identical i-th sub-vector serves as the Value for that Key.
According to another aspect of the present invention, a multimedia information retrieval system is also provided, comprising:
A feature bit vector determination module, configured to extract the feature data of current multimedia information and obtain the feature bit vector of the current multimedia information from the extracted feature data;
A feature bit vector segmentation module, configured to evenly segment the feature bit vector obtained by the feature bit vector determination module into k sub-vectors of the current multimedia information, wherein the i-th sub-vector consists of the i-th group of elements of the segmented feature bit vector, i being a natural number from 1 to k;
A candidate set determination module, configured to determine, for each sub-vector of the current multimedia information obtained by the feature bit vector segmentation module, a corresponding candidate set; wherein determining the candidate set for the i-th sub-vector comprises: finding, in the index set of a predetermined i-th index structure, the index identical to the i-th sub-vector, and taking the vector identifier set corresponding to the index found as the candidate set for the i-th sub-vector; wherein, in the i-th index structure, the vector identifiers of the feature bit vectors of multimedia information to be retrieved whose i-th sub-vectors are identical are stored in the same vector identifier set, and the index of that vector identifier set is that i-th sub-vector;
A feature bit vector lookup module, configured to find, in the multimedia feature database, the feature bit vector corresponding to each vector identifier in the candidate sets obtained by the candidate set determination module;
A Hamming distance calculation module, configured to calculate the Hamming distance between the feature bit vector of the current multimedia information and each feature bit vector found by the feature bit vector lookup module;
A retrieval result output module, configured to output as the retrieval result, according to the Hamming distances calculated by the Hamming distance calculation module, the multimedia information corresponding to the feature bit vectors whose Hamming distance satisfies the set condition.
Preferably, the multimedia information retrieval system further comprises:
An index structure construction module, configured to build k index structures, wherein the i-th index structure is built as follows: for each item of multimedia information to be retrieved, the feature bit vector of the item is evenly segmented into k sub-vectors, wherein the i-th sub-vector of the item consists of the i-th group of elements of the feature bit vector of the item, i being a natural number from 1 to k; the vector identifiers of the feature bit vectors of multimedia information to be retrieved whose i-th sub-vectors are identical are placed in the same vector identifier set; and the identical i-th sub-vector shared by the feature bit vectors corresponding to the vector identifiers in that set is used as the index of the set and is stored in the index set of the i-th index structure.
Preferably, the index structure construction module specifically comprises:
A feature bit vector segmentation unit, configured to segment, for each item of multimedia information to be retrieved, the feature bit vector of the item into k sub-vectors;
A vector identifier set partitioning unit, configured to place, when the i-th index structure is built, the vector identifiers of the feature bit vectors of multimedia information to be retrieved whose i-th sub-vectors are identical in the same vector identifier set;
An index creation unit, configured to take, when the i-th index structure is built and for each vector identifier set produced by the vector identifier set partitioning unit, the identical i-th sub-vector shared by the feature bit vectors corresponding to the vector identifiers in the set as the sub-vector corresponding to that set, and to store it as an index in the i-th index structure.
Preferably, the feature bit vector lookup module specifically comprises:
A candidate set merging unit, configured to perform a union operation on the candidate sets obtained by the candidate set determination module to produce a merged candidate set;
A vector lookup unit, configured to find, for each vector identifier in the merged candidate set, the feature bit vector corresponding to that vector identifier in the multimedia feature database.
Preferably, a feature bit vector whose Hamming distance satisfies the set condition is specifically:
A feature bit vector whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to q, wherein q is less than or equal to k.
In the technical solution of the present invention, k segmented index structures are built over the multimedia information to be retrieved, and only feature bit vectors that share m elements (m being the number of elements in each sub-vector) with the feature bit vector of the current multimedia information take part in the Hamming distance calculation. Compared with the prior art, in which a partitioned ordered list is used to find the feature bit vectors sharing the first p elements with the feature bit vector of the current multimedia information for the Hamming distance calculation, the solution of the present invention substantially reduces the number of feature bit vectors taking part in the Hamming distance calculation, and thus substantially reduces the computation in a single retrieval, improving retrieval speed and efficiency.
Further, in the retrieval method of the present invention, to keep retrieval sustainable it is only necessary to place the vector identifier of the feature bit vector of the current multimedia information into the corresponding vector identifier sets of the k index structures according to each of its sub-vectors. This requires far less computation than the prior art, in which the feature bit vector must be compared and sorted against the massive number of feature bit vectors in the ordered list, so retrieval speed and efficiency are greatly improved.
Brief description of the drawings
Fig. 1 is a flowchart of the prior-art method of building the ordered lists;
Fig. 2 is a flowchart of the prior-art multimedia information retrieval method;
Fig. 3 is a flowchart of the method of building the segmented index according to an embodiment of the present invention;
Fig. 4 is a flowchart of the multimedia information retrieval method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the multimedia information retrieval system according to an embodiment of the present invention;
Fig. 6 is a block diagram of the internal structure of the feature bit vector lookup module according to an embodiment of the present invention;
Fig. 7 is a block diagram of the internal structure of the index structure construction module according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The idea of the present invention is to build k segmented index structures in advance. When current multimedia information is retrieved, the k segmented index structures are used to find the feature bit vectors that share m (for example, 32) elements with the feature bit vector of the current multimedia information, and only these take part in the Hamming distance calculation. Compared with the prior art, in which the feature bit vectors sharing the first p (for example, 3) elements take part in the Hamming distance calculation, this greatly reduces the number of feature bit vectors taking part in the Hamming distance calculation, and thus substantially reduces the computation in a single retrieval, improving retrieval speed and efficiency.
The technical solution of the present invention is described in detail below with reference to the accompanying drawings. The specific embodiment takes as an example finding, among the n-dimensional feature bit vectors of the multimedia to be retrieved, those whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to k, and introduces a multimedia information retrieval method designed on the idea of segmented indexing. A multimedia feature database and a segmented index must first be built; the flow is shown in Fig. 3 and includes the following steps:
S301: For each item of multimedia information to be retrieved in the repository, extract the feature data of the item and convert it by a hashing method into a discrete n-dimensional bit vector, obtaining the feature bit vector of each item of multimedia information to be retrieved.
S302: Store the feature bit vector of each item of multimedia information to be retrieved, together with its vector identifier, in the multimedia feature database.
Specifically, a unique vector identifier is assigned to the feature bit vector of each item of multimedia information to be retrieved; the vector identifier is then used as the Key and the corresponding feature bit vector as the Value, and the pair is stored in the multimedia feature database in Key/Value form for later lookup and matching.
S303: For each item of multimedia information to be retrieved, evenly segment the feature bit vector of the item and build the segmented index, obtaining k index structures.
Specifically, the feature bit vector of each item of multimedia information to be retrieved is evenly segmented into k sub-vectors. The i-th sub-vector of the item consists of the i-th group of elements of the segmented feature bit vector, namely elements (i-1)×m+1 through i×m of the feature bit vector, where i is any natural number from 1 to k and m is the number of elements in each sub-vector (that is, in each group of elements).
In building the segmented index, the i-th of the k index structures is obtained as follows: in the i-th index structure, the vector identifiers of the feature bit vectors of multimedia information to be retrieved whose i-th sub-vectors are identical are stored in the same vector identifier set, and the index of that vector identifier set is that i-th sub-vector. Specifically, those vector identifiers are placed in the same vector identifier set; the identical i-th sub-vector shared by the corresponding feature bit vectors serves as the Key, and the vector identifier set serves as the Value for that Key, which yields the i-th index structure of the Key/Value segmented index. Here i can be any natural number from 1 to k.
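Steps S302 and S303 might be sketched as below. The sketch assumes that Python dictionaries stand in for the Key/Value stores, that feature bit vectors are strings of '0' and '1' characters whose length n is divisible by k, and that vector identifiers are small integers; the name build_segment_indexes is hypothetical.

```python
from collections import defaultdict


def build_segment_indexes(feature_db, k):
    """Return k dicts; the i-th maps an i-th sub-vector (Key) to a set of vector identifiers (Value)."""
    indexes = [defaultdict(set) for _ in range(k)]
    for vector_id, vector in feature_db.items():
        m = len(vector) // k  # number of elements in each sub-vector
        for i in range(k):
            sub_vector = vector[i * m:(i + 1) * m]
            indexes[i][sub_vector].add(vector_id)
    return [dict(index) for index in indexes]


# Multimedia feature database: vector identifier (Key) -> feature bit vector (Value).
feature_db = {1: "00001111", 2: "00000000", 3: "11111111"}
indexes = build_segment_indexes(feature_db, k=2)
print(indexes[0])
print(indexes[1])
```

With k = 2, vectors 1 and 2 share the first sub-vector "0000" and so land in the same vector identifier set of the first index structure, while vectors 1 and 3 share the second sub-vector "1111".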
After the k index structures of the segmented index have been built, the multimedia information retrieval method provided by the embodiment of the present invention operates on this segmented index as shown in Fig. 4 and includes the following steps:
S401: Extract the feature data of the current multimedia information and convert it by a hashing method into a discrete n-dimensional bit vector, obtaining the feature bit vector of the current multimedia information.
S402: Evenly segment the feature bit vector of the current multimedia information, obtaining k sub-vectors of the current multimedia information.
Specifically, the i-th sub-vector of the current multimedia information consists of the i-th group of elements of its evenly segmented feature bit vector, namely elements (i-1)×m+1 through i×m of the feature bit vector of the current multimedia information, where i is a natural number from 1 to k and m is the number of elements in each sub-vector (that is, in each group of elements).
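The even segmentation of step S402 can be illustrated as follows. The sketch assumes a bit vector stored as a string and a hypothetical function name split_evenly; it maps the 1-based element range (i-1)×m+1 through i×m onto Python's 0-based, end-exclusive slices.

```python
def split_evenly(vector, k):
    """Split a bit string of length n into k sub-vectors of m = n / k elements each."""
    n = len(vector)
    if n % k != 0:
        raise ValueError("vector length must be divisible by k")
    m = n // k
    # 1-based elements (i-1)*m+1 .. i*m correspond to the slice [(i-1)*m : i*m].
    return [vector[(i - 1) * m:i * m] for i in range(1, k + 1)]


print(split_evenly("110010100111", 3))  # ['1100', '1010', '0111']
```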
S403: For each sub-vector of the current multimedia information, determine the corresponding candidate set.
Specifically, a corresponding candidate set is determined for each sub-vector of the current multimedia information, giving k candidate sets. The candidate set for the i-th sub-vector of the current multimedia information is determined as follows: in the index set of the i-th index structure, find the index identical to the i-th sub-vector of the current multimedia information, and take the vector identifier set corresponding to the index found as the candidate set for that i-th sub-vector.
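Step S403 amounts to one exact-match lookup per index structure. The following sketch assumes the index structures are Python dictionaries keyed by sub-vector strings; the function name find_candidate_sets and the sample data are hypothetical, and returning an empty set when no index matches is a choice made here rather than something the text specifies.

```python
def find_candidate_sets(sub_vectors, indexes):
    """Look up the i-th sub-vector in the i-th index structure; no match gives an empty set."""
    return [indexes[i].get(sub_vector, set())
            for i, sub_vector in enumerate(sub_vectors)]


# Two hypothetical index structures (k = 2) over 8-bit feature bit vectors.
indexes = [
    {"0000": {1, 2}, "1111": {3}},
    {"1111": {1, 3}, "0000": {2}},
]
candidates = find_candidate_sets(["0000", "1010"], indexes)
print(candidates)
```

Here the first sub-vector "0000" matches an index and yields the identifiers 1 and 2, while the second sub-vector "1010" matches nothing and yields an empty candidate set.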
S404: For each vector identifier in the candidate sets obtained, find the corresponding feature bit vector in the multimedia feature database.
Specifically, for the candidate sets obtained in step S403 for the sub-vectors of the current multimedia information, that is, the k candidate sets, the feature bit vector corresponding to each vector identifier in the candidate sets is found in the multimedia feature database.
As a preferred implementation, since the candidate sets obtained in step S403 may contain some duplicate vector identifiers, the k candidate sets can first be combined by a union operation into a merged candidate set; then, for each vector identifier in the merged candidate set, the corresponding feature bit vector is found in the multimedia feature database. Specifically, each vector identifier in the merged candidate set is used as a Key to look up the corresponding Value in the multimedia feature database, which is the feature bit vector corresponding to that candidate vector identifier.
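The union-then-lookup variant of step S404 could look like the sketch below, again assuming a dictionary as the Key/Value feature database; fetch_candidate_vectors and the sample data are hypothetical.

```python
def fetch_candidate_vectors(candidate_sets, feature_db):
    """Merge the candidate sets by union, then look up each identifier once."""
    merged = set().union(*candidate_sets)  # duplicate identifiers collapse here
    return {vector_id: feature_db[vector_id] for vector_id in merged}


feature_db = {1: "00001111", 2: "00000000", 3: "11111111", 4: "10101010"}
found = fetch_candidate_vectors([{1, 2}, {1, 3}], feature_db)
print(len(found))  # 3
```

Identifier 1 appears in both candidate sets but is looked up only once, and identifier 4, which is in no candidate set, is never fetched.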
S405: Calculate the Hamming distance between the feature bit vector of the current multimedia information and each feature bit vector found.
S406: Output the multimedia information corresponding to the feature bit vectors whose Hamming distance satisfies the set condition as the retrieval result.
Specifically, a feature bit vector satisfying the set condition can be a feature bit vector whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to q. More preferably, k is greater than q (which also satisfies the condition that q is less than or equal to k), so that no match is missed: if two vectors differ in at most q positions and q is less than k, at least one of their k sub-vectors must be identical, so the vector identifier of every feature bit vector satisfying the set condition is contained in a candidate set.
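Putting steps S402 to S406 together, the following end-to-end sketch checks on a toy database that choosing k = q + 1 returns the same result as an exhaustive scan. All names and data are hypothetical, bit vectors are assumed to be strings, and the 8-bit vectors are far shorter than the vectors the text has in mind.

```python
def split(vector, k):
    m = len(vector) // k
    return [vector[i * m:(i + 1) * m] for i in range(k)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def build_indexes(feature_db, k):
    indexes = [{} for _ in range(k)]
    for vector_id, vector in feature_db.items():
        for i, sub_vector in enumerate(split(vector, k)):
            indexes[i].setdefault(sub_vector, set()).add(vector_id)
    return indexes


def search(query, feature_db, indexes, q):
    """Gather candidates from the k index structures, then keep those within distance q."""
    candidates = set()
    for i, sub_vector in enumerate(split(query, len(indexes))):
        candidates |= indexes[i].get(sub_vector, set())
    return sorted(vid for vid in candidates
                  if hamming(query, feature_db[vid]) <= q)


feature_db = {"a": "00000000", "b": "00000111", "c": "11111111",
              "d": "01010100", "e": "00111100"}
q = 3
indexes = build_indexes(feature_db, k=q + 1)
result = search("00000000", feature_db, indexes, q)
brute_force = sorted(v for v in feature_db
                     if hamming("00000000", feature_db[v]) <= q)
print(result, result == brute_force)  # ['a', 'b', 'd'] True
```

Vector "e" shares a sub-vector with the query and so becomes a candidate, but its distance of 4 exceeds q and it is filtered out; vector "c" shares no sub-vector and is never compared at all.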
Take as an example searching, among the 128-dimensional feature bit vectors of multimedia information to be retrieved, for those whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to 3 (q). The present invention builds a segmented index of 4 (k=q+1) Key/Value structures over the multimedia information to be retrieved, finds the feature bit vectors that share 32 (m) elements with the feature bit vector of the current multimedia information, and then calculates the Hamming distance between each feature bit vector found and the feature bit vector of the current multimedia information; the number of feature bit vectors taking part in the Hamming distance calculation is then $2^{128-32+2}=2^{98}$. The prior art instead uses the partitioned ordered list to find the feature bit vectors sharing the first p elements with the feature bit vector of the current multimedia information, and then calculates the Hamming distances in the same way. Because p cannot be large (a larger p would cause many matches to be missed), p is generally a small number, for example less than or equal to 3, and at least $2^{128-3-1}=2^{124}$ feature bit vectors then take part in the Hamming distance calculation. Although the ordered list can to some extent ease the bit comparisons in the Hamming distance calculation, the number of feature bit vectors taking part in the Hamming distance calculation in the prior art is $2^{26}$ times that in the present invention, an enormous difference.
In fact, m is determined by k: m=n/k, where n is the dimension of the feature bit vector. For high-dimensional feature bit vectors, n is generally 100 or more, and k is a number slightly larger than q. To meet retrieval requirements, those skilled in the art usually set the Hamming distance q to a small number, such as a number less than 3 or 4. m is therefore usually at least a two-digit number, or larger. In the prior art, by contrast, p cannot be set too large if excessive misses are to be avoided, and is usually less than or equal to q.
Therefore, the number of feature bit vectors sharing 32 (m) elements that take part in the Hamming distance calculation in the present invention is far smaller than the number sharing 3 (p) elements that take part in the prior art. The quantity of feature bit vectors taking part in the Hamming distance calculation is thus greatly reduced, which reduces the amount of computation and improves retrieval speed and efficiency.
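The counts in this example can be checked with a few lines of arithmetic. The sketch below simply reproduces the figures stated in the text for n = 128, q = 3 and p = 3; reading the first count as k index structures each contributing 2^(n-m) possible vectors is an interpretation made here, not something the text spells out.

```python
n, q = 128, 3
k = q + 1   # number of index structures
m = n // k  # elements per sub-vector
p = 3       # prior-art prefix length used in the example

proposed = 2 ** (n - m + 2)   # count stated for the segmented index
prior_art = 2 ** (n - p - 1)  # lower bound stated for the prior art

print(m)                                 # 32
print(proposed == k * 2 ** (n - m))      # True
print(prior_art // proposed == 2 ** 26)  # True
```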
In addition, to keep retrieval sustainable, the prior-art retrieval method must, after a retrieval, also insert the feature bit vector of the current multimedia information into the ordered list, so that it can serve as a feature bit vector of multimedia to be retrieved in subsequent retrievals. Inserting a feature bit vector into the ordered list is itself computationally expensive.
In the retrieval method of the present invention, by contrast, to keep retrieval sustainable it is only necessary to place the vector identifier of the feature bit vector of the current multimedia information into the corresponding vector identifier sets of the k index structures according to each sub-vector of the current multimedia information, and to insert the feature bit vector of the current multimedia information and its vector identifier into the multimedia feature database. This requires far less computation than comparing and sorting against the massive number of feature bit vectors in the prior-art ordered list, so retrieval speed and efficiency are greatly improved.
Placing the vector identifier of the feature bit vector of the current multimedia information into the corresponding vector identifier sets of the k index structures can specifically be done by inserting that vector identifier into the k candidate sets obtained above.
During retrieval, if the index of the i-th of the k index structures does not contain the i-th sub-vector of the current multimedia information, the i-th sub-vector of the current multimedia information is stored as an index of the i-th index structure, a vector identifier set containing the vector identifier of the feature bit vector of the current multimedia information is created, and the created vector identifier set is stored in the i-th index structure under that index.
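The insertion just described might be sketched as follows, assuming dictionaries for the feature database and the index structures; insert_vector and the sample data are hypothetical. The call to setdefault covers both cases: it reuses an existing vector identifier set, or creates a new one when the sub-vector is not yet an index.

```python
def insert_vector(vector_id, vector, feature_db, indexes):
    """Add a feature bit vector to the database and to each of the k index structures."""
    feature_db[vector_id] = vector
    k = len(indexes)
    m = len(vector) // k
    for i in range(k):
        sub_vector = vector[i * m:(i + 1) * m]
        # Creates the vector identifier set when this sub-vector is a new index.
        indexes[i].setdefault(sub_vector, set()).add(vector_id)


feature_db = {1: "00001111"}
indexes = [{"0000": {1}}, {"1111": {1}}]
insert_vector(2, "00001010", feature_db, indexes)
print(sorted(indexes[1]))  # ['1010', '1111']
```

No sorting or comparison against existing vectors is needed, which is the cost advantage the text attributes to the segmented index.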
If the data of the current multimedia information is deleted, the feature bit vector of the current multimedia information and its vector identifier are deleted from the multimedia feature database, and the vector identifier of the feature bit vector of the current multimedia information is deleted from the k candidate sets in the k index structures.
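Deletion can be sketched in the same style. The function delete_vector and the sample data are hypothetical, and removing an index whose vector identifier set has become empty is an assumption added here; the text only requires removing the identifier itself.

```python
def delete_vector(vector_id, feature_db, indexes):
    """Remove a feature bit vector from the database and from each index structure."""
    vector = feature_db.pop(vector_id)
    k = len(indexes)
    m = len(vector) // k
    for i in range(k):
        sub_vector = vector[i * m:(i + 1) * m]
        id_set = indexes[i].get(sub_vector)
        if id_set is not None:
            id_set.discard(vector_id)
            if not id_set:
                # Assumption: drop indexes that no longer reference any vector.
                del indexes[i][sub_vector]


feature_db = {1: "00001111", 2: "00001010"}
indexes = [{"0000": {1, 2}}, {"1111": {1}, "1010": {2}}]
delete_vector(2, feature_db, indexes)
print(indexes)  # [{'0000': {1}}, {'1111': {1}}]
```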
Based on the above retrieval method, an embodiment of the present invention provides a multimedia information retrieval system which, as shown in Fig. 5, comprises a retrieval device 501 and a database index structure construction device 502.
The retrieval device 501 comprises: a feature bit vector determination module 511, a feature bit vector segmentation module 512, a candidate set determination module 513, a feature bit vector search module 514, a Hamming distance calculation module 515 and a retrieval result output module 516.
The feature bit vector determination module 511 extracts feature data of the current multimedia information and obtains the feature bit vector of the current multimedia information from the extracted feature data.
The feature bit vector segmentation module 512 evenly segments the feature bit vector obtained by the feature bit vector determination module 511 to obtain k sub-vectors of the current multimedia information, where the i-th sub-vector consists of the i-th group of elements of the segmented feature bit vector and i is a natural number from 1 to k.
The candidate set determination module 513 determines, for each sub-vector of the current multimedia information obtained by the feature bit vector segmentation module 512, the candidate set corresponding to that sub-vector. For the i-th sub-vector, this comprises: finding, in the index set of the predetermined i-th index structure, the index identical to the i-th sub-vector, and taking the vector identifier set corresponding to the found index as the candidate set for the i-th sub-vector. In the i-th index structure, the vector identifiers of the feature bit vectors of to-be-retrieved multimedia information whose i-th sub-vectors are identical are stored in the same vector identifier set, and the index of that set is that i-th sub-vector.
The feature bit vector search module 514 finds, for each vector identifier in the candidate sets obtained by the candidate set determination module 513, the corresponding feature bit vector in the multimedia feature database.
The Hamming distance calculation module 515 calculates the Hamming distance between the feature bit vector of the current multimedia information and each feature bit vector found by the feature bit vector search module 514.
The retrieval result output module 516 outputs, as the retrieval result and according to the Hamming distances calculated by the Hamming distance calculation module 515, the multimedia information corresponding to the feature bit vectors whose Hamming distance meets the preset condition.
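The flow through modules 512 to 516 can be sketched end to end as below. This is an illustrative sketch under stated assumptions rather than the patented implementation: bit vectors are "0"/"1" strings, each of the k index structures is a dictionary from sub-vector to a set of integer vector identifiers, the feature database is a dictionary, and the preset condition is taken to be "Hamming distance at most q". The function names `split_evenly`, `hamming` and `search` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def split_evenly(bits: str, k: int) -> List[str]:
    """Cut a bit string into k equal-length sub-vectors."""
    if len(bits) % k != 0:
        raise ValueError("bit vector length must be a multiple of k")
    step = len(bits) // k
    return [bits[i * step:(i + 1) * step] for i in range(k)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query: str, indexes: List[Dict[str, Set[int]]],
           feature_db: Dict[int, str], q: int) -> List[int]:
    """Return identifiers whose stored vector is within distance q of the query."""
    candidates: Set[int] = set()
    for i, sub in enumerate(split_evenly(query, len(indexes))):
        # Candidate set for the i-th sub-vector: identifiers stored under an equal key.
        candidates |= indexes[i].get(sub, set())
    # Only the candidates are fetched from the database and compared in full.
    return sorted(vid for vid in candidates
                  if hamming(query, feature_db[vid]) <= q)


if __name__ == "__main__":
    # Toy data: three 8-bit vectors and k = 2 index structures.
    feature_db = {1: "10110010", 2: "10111111", 3: "00000000"}
    indexes = [
        {"1011": {1, 2}, "0000": {3}},            # keyed by the first sub-vector
        {"0010": {1}, "1111": {2}, "0000": {3}},  # keyed by the second sub-vector
    ]
    print(search("10110011", indexes, feature_db, q=1))  # prints [1]
```

In the toy run, vector 3 is never compared against the query because it shares no sub-vector with it, and vector 2 is compared but rejected at distance 2.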
The database index structure construction device 502 comprises a multimedia feature database establishment module 521 and an index structure construction module 522.
The multimedia feature database establishment module 521 stores the feature bit vectors of the to-be-retrieved multimedia information and their vector identifiers.
The index structure construction module 522 builds the k index structures, where the i-th index structure is built as follows: for each piece of to-be-retrieved multimedia information, its feature bit vector is evenly segmented to obtain k sub-vectors, where the i-th sub-vector consists of the i-th group of elements of that feature bit vector and i is a natural number from 1 to k; the vector identifiers of the feature bit vectors of to-be-retrieved multimedia information whose i-th sub-vectors are identical are placed in the same vector identifier set; and the identical i-th sub-vector shared by the feature bit vectors identified in that set is taken as the index of the set and stored in the index set of the i-th index structure.
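The construction of the k index structures can be sketched as follows. The sketch assumes "0"/"1" string bit vectors, integer vector identifiers, and an in-memory dictionary standing in for the multimedia feature database; `split_evenly` and `build_indexes` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def split_evenly(bits: str, k: int) -> List[str]:
    """Cut a bit string into k equal-length sub-vectors."""
    if len(bits) % k != 0:
        raise ValueError("bit vector length must be a multiple of k")
    step = len(bits) // k
    return [bits[i * step:(i + 1) * step] for i in range(k)]


def build_indexes(feature_db: Dict[int, str], k: int) -> List[Dict[str, Set[int]]]:
    """Build k index structures; the i-th groups identifiers by equal i-th sub-vector."""
    indexes: List[Dict[str, Set[int]]] = [defaultdict(set) for _ in range(k)]
    for vid, bits in feature_db.items():
        for i, sub in enumerate(split_evenly(bits, k)):
            # Vectors whose i-th sub-vectors are identical land in the same set,
            # and that shared sub-vector is the key (index) of the set.
            indexes[i][sub].add(vid)
    return [dict(ix) for ix in indexes]


if __name__ == "__main__":
    feature_db = {1: "10110010", 2: "10111111", 3: "00000000"}
    for i, ix in enumerate(build_indexes(feature_db, k=2), start=1):
        print(i, ix)
```

Each vector identifier is stored exactly k times, once per index structure, so the index size grows linearly with the number of stored vectors.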
The internal structure of the feature bit vector search module 514 is shown in Fig. 6 and comprises a candidate set merging unit 601 and a vector search unit 602.
The candidate set merging unit 601 performs a union operation on the candidate sets obtained by the candidate set determination module 513 to obtain a merged candidate set.
The vector search unit 602 finds, for each vector identifier in the merged candidate set, the feature bit vector corresponding to that vector identifier in the multimedia feature database.
The internal structure of the index structure construction module 522 is shown in Fig. 7 and comprises a feature bit vector segmentation unit 701, a vector identifier set division unit 702 and an index establishment unit 703.
The feature bit vector segmentation unit 701 evenly segments, for each piece of to-be-retrieved multimedia information, its feature bit vector to obtain k sub-vectors of that to-be-retrieved multimedia information.
The vector identifier set division unit 702, when the i-th index structure is built, takes the k sub-vectors of each piece of to-be-retrieved multimedia information obtained by the feature bit vector segmentation unit 701 and places the vector identifiers of the feature bit vectors whose i-th sub-vectors are identical in the same vector identifier set.
The index establishment unit 703, when the i-th index structure is built, takes, for each vector identifier set produced by the vector identifier set division unit 702, the identical i-th sub-vector shared by the feature bit vectors identified in that set as the sub-vector corresponding to the set, and stores it as an index in the i-th index structure.
In the technical solution of the present invention, k segmented index structures are built over the to-be-retrieved multimedia information, and only the feature bit vectors that share m identical elements with the feature bit vector of the current multimedia information take part in the Hamming distance calculation. Compared with the prior art, which uses a block-wise ordered list to find the feature bit vectors whose first p elements are identical to those of the current multimedia information, the present solution significantly reduces the number of feature bit vectors that take part in the Hamming distance calculation. This significantly reduces the amount of computation in a single retrieval and thus improves retrieval speed and efficiency.
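Why restricting the comparison to vectors that share a whole sub-vector does not lose close matches can be seen from a short counting argument, given here as an aid to the reader rather than as part of the original text. The symbols x (query feature bit vector), y (a stored feature bit vector), their i-th sub-vectors, and d_H (Hamming distance) are introduced for illustration only.

```latex
% x, y: feature bit vectors evenly segmented into k sub-vectors x^{(i)}, y^{(i)}.
% The Hamming distance is additive over the segments:
\[
d_H(x, y) \;=\; \sum_{i=1}^{k} d_H\!\left(x^{(i)}, y^{(i)}\right)
\]
% If every pair of sub-vectors differs, each term contributes at least 1:
\[
x^{(i)} \neq y^{(i)} \ \text{for all } i
\;\Longrightarrow\;
d_H(x, y) \;\ge\; k
\]
% Contrapositive, for a threshold q strictly below k:
\[
d_H(x, y) \;\le\; q \;<\; k
\;\Longrightarrow\;
\exists\, i \in \{1, \dots, k\}:\ x^{(i)} = y^{(i)}
\]
```

Any stored vector within Hamming distance q < k of the query therefore shares at least one whole sub-vector with it and appears in at least one of the k candidate sets. The argument as written covers thresholds strictly below k and says nothing about the boundary case q = k.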
Further, in the retrieval method of the present invention, keeping retrieval sustainable only requires that, according to each sub-vector of the current multimedia information, the vector identifier of its feature bit vector be added to the corresponding vector identifier sets in the k index structures. This costs far less than the prior art's comparison and sorting of the feature bit vector against the massive number of feature bit vectors in the ordered list, so retrieval speed and retrieval efficiency are greatly improved.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make a number of improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A multimedia information retrieval method, characterized by comprising:
Extracting feature data of current multimedia information, converting the extracted feature data into a feature bit vector, and evenly segmenting the feature bit vector to obtain k sub-vectors, where the i-th sub-vector consists of the i-th group of elements of the evenly segmented feature bit vector and i is a natural number from 1 to k;
Determining the candidate set corresponding to each sub-vector of the current multimedia information, where, for the i-th sub-vector, this comprises: finding, in the index set of a predetermined i-th index structure, the index identical to the i-th sub-vector, and taking the vector identifier set corresponding to the found index as the candidate set for the i-th sub-vector; where, in the i-th index structure, the vector identifiers of the feature bit vectors of to-be-retrieved multimedia information whose i-th sub-vectors are identical are stored in the same vector identifier set, and the index of that set is that i-th sub-vector;
For each vector identifier in the obtained candidate sets, finding the corresponding feature bit vector in a multimedia feature database; calculating the Hamming distance between the feature bit vector of the current multimedia information and each found feature bit vector; and outputting, as the retrieval result, the multimedia information corresponding to the feature bit vectors whose Hamming distance meets a preset condition.
2. the method for claim 1 is characterized in that, definite method of i index structure comprises:
At each multimedia messages to be retrieved, the tag bit vector of this multimedia messages to be retrieved is evenly cut apart, obtain k subvector of this multimedia messages to be retrieved; Wherein, the i group element after i subvector of this multimedia messages to be retrieved cut apart by described tag bit vector is formed; I is the natural number of 1~k;
The vectorial of the tag bit vector of the multimedia messages to be retrieved that i subvector is identical is divided in the same vectorial set; And with i the identical subvector in the corresponding tag bit vector of vectorial in this vectorial set, as the index of this vectorial set, and store in the indexed set of i index structure.
3. The method of claim 2, characterized in that finding, for each vector identifier in the obtained candidate sets, the corresponding feature bit vector in the multimedia feature database specifically comprises:
Performing a union operation on the obtained candidate sets to obtain a merged candidate set;
For each vector identifier in the merged candidate set, finding the feature bit vector corresponding to that vector identifier in the multimedia feature database.
4. The method of any one of claims 1-3, characterized in that the feature bit vector whose Hamming distance meets the preset condition is specifically: a feature bit vector whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to q, where q is less than or equal to k.
5. The method of claim 4, characterized in that the i-th index structure is specifically a Key/Value structure, in which the identical i-th sub-vector serves as the Key and the vector identifier set corresponding to that identical i-th sub-vector serves as the Value for that Key.
6. A multimedia information retrieval system, characterized by comprising:
A feature bit vector determination module, configured to extract feature data of current multimedia information and obtain the feature bit vector of the current multimedia information from the extracted feature data;
A feature bit vector segmentation module, configured to evenly segment the feature bit vector obtained by the feature bit vector determination module to obtain k sub-vectors of the current multimedia information, where the i-th sub-vector consists of the i-th group of elements of the segmented feature bit vector and i is a natural number from 1 to k;
A candidate set determination module, configured to determine, for each sub-vector of the current multimedia information obtained by the feature bit vector segmentation module, the candidate set corresponding to that sub-vector; where, for the i-th sub-vector, this comprises: finding, in the index set of a predetermined i-th index structure, the index identical to the i-th sub-vector, and taking the vector identifier set corresponding to the found index as the candidate set for the i-th sub-vector; where, in the i-th index structure, the vector identifiers of the feature bit vectors of to-be-retrieved multimedia information whose i-th sub-vectors are identical are stored in the same vector identifier set, and the index of that set is that i-th sub-vector;
A feature bit vector search module, configured to find, for each vector identifier in the candidate sets obtained by the candidate set determination module, the corresponding feature bit vector in a multimedia feature database;
A Hamming distance calculation module, configured to calculate the Hamming distance between the feature bit vector of the current multimedia information and each feature bit vector found by the feature bit vector search module;
A retrieval result output module, configured to output, as the retrieval result and according to the Hamming distances calculated by the Hamming distance calculation module, the multimedia information corresponding to the feature bit vectors whose Hamming distance meets a preset condition.
7. The system of claim 6, characterized by further comprising:
An index structure construction module, configured to build k index structures, where the i-th index structure is built as follows: for each piece of to-be-retrieved multimedia information, its feature bit vector is evenly segmented to obtain k sub-vectors, where the i-th sub-vector consists of the i-th group of elements of that feature bit vector and i is a natural number from 1 to k; the vector identifiers of the feature bit vectors of to-be-retrieved multimedia information whose i-th sub-vectors are identical are placed in the same vector identifier set; and the identical i-th sub-vector shared by the feature bit vectors identified in that set is taken as the index of the set and stored in the index set of the i-th index structure.
8. The system of claim 7, characterized in that the index structure construction module specifically comprises:
A feature bit vector segmentation unit, configured to segment, for each piece of to-be-retrieved multimedia information, its feature bit vector to obtain k sub-vectors of that to-be-retrieved multimedia information;
A vector identifier set division unit, configured to place, when the i-th index structure is built, the vector identifiers of the feature bit vectors of to-be-retrieved multimedia information whose i-th sub-vectors are identical in the same vector identifier set;
An index establishment unit, configured to take, when the i-th index structure is built and for each vector identifier set produced by the vector identifier set division unit, the identical i-th sub-vector shared by the feature bit vectors identified in that set as the sub-vector corresponding to the set, and to store it as an index in the i-th index structure.
9. The system of claim 7, characterized in that the feature bit vector search module specifically comprises:
A candidate set merging unit, configured to perform a union operation on the candidate sets obtained by the candidate set determination module to obtain a merged candidate set;
A vector search unit, configured to find, for each vector identifier in the merged candidate set, the feature bit vector corresponding to that vector identifier in the multimedia feature database.
10. The system of any one of claims 6-9, characterized in that the feature bit vector whose Hamming distance meets the preset condition is specifically:
A feature bit vector whose Hamming distance from the feature bit vector of the current multimedia information is less than or equal to q, where q is less than or equal to k.
CN201310264225.3A 2013-06-28 2013-06-28 multimedia information retrieval method and system Active CN103345496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310264225.3A CN103345496B (en) 2013-06-28 2013-06-28 multimedia information retrieval method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310264225.3A CN103345496B (en) 2013-06-28 2013-06-28 multimedia information retrieval method and system

Publications (2)

Publication Number Publication Date
CN103345496A true CN103345496A (en) 2013-10-09
CN103345496B CN103345496B (en) 2016-12-28

Family

ID=49280291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310264225.3A Active CN103345496B (en) 2013-06-28 2013-06-28 multimedia information retrieval method and system

Country Status (1)

Country Link
CN (1) CN103345496B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104991959A (en) * 2015-07-21 2015-10-21 北京京东尚科信息技术有限公司 Method and system for retrieving same or similar image based on content
WO2015165037A1 (en) * 2014-04-29 2015-11-05 中国科学院自动化研究所 Cascaded binary coding based image matching method
CN105095435A (en) * 2015-07-23 2015-11-25 北京京东尚科信息技术有限公司 Similarity comparison method and device for high-dimensional image features
CN105959224A (en) * 2016-06-24 2016-09-21 西安电子科技大学 Bit vector-based high-speed routing lookup apparatus and method
CN106980656A (en) * 2017-03-10 2017-07-25 北京大学 A kind of searching method based on two-value code dictionary tree
CN108628892A (en) * 2017-03-21 2018-10-09 北京京东尚科信息技术有限公司 Method, apparatus, electronic equipment and the readable storage medium storing program for executing of ordered data storage
CN109753576A (en) * 2018-12-25 2019-05-14 上海七印信息科技有限公司 A kind of method for retrieving similar images
CN110149529A (en) * 2018-11-01 2019-08-20 腾讯科技(深圳)有限公司 Processing method, server and the storage medium of media information
TWI703459B (en) * 2019-07-25 2020-09-01 中華電信股份有限公司 Searching system and searching method for addressable index
CN111738194A (en) * 2020-06-29 2020-10-02 深圳力维智联技术有限公司 Evaluation method and device for similarity of face images
CN112445934A (en) * 2021-02-01 2021-03-05 北京远鉴信息技术有限公司 Voice retrieval method, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5987456A (en) * 1997-10-28 1999-11-16 University Of Masschusetts Image retrieval by syntactic characterization of appearance
CN1477563A (en) * 2003-07-03 2004-02-25 复旦大学 High-dimensional vector data quick similar search method
CN102486800A (en) * 2010-12-01 2012-06-06 财团法人工业技术研究院 Video searching method, system and method for establishing video database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
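赵嵩 et al.: "High-dimensional image feature matching algorithm based on sub-vector distance index" (基于子向量距离索引的高维图像特征匹配算法), 《计算机工程与应用》 (Computer Engineering and Applications) *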

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015165037A1 (en) * 2014-04-29 2015-11-05 中国科学院自动化研究所 Cascaded binary coding based image matching method
CN104991959A (en) * 2015-07-21 2015-10-21 北京京东尚科信息技术有限公司 Method and system for retrieving same or similar image based on content
CN104991959B (en) * 2015-07-21 2019-11-05 北京京东尚科信息技术有限公司 A kind of method and system of the same or similar image of information retrieval based on contents
RU2686590C1 (en) * 2015-07-23 2019-04-29 Бэйцзин Цзиндун Шанкэ Информейшн Текнолоджи Ко, Лтд. Method and device for comparing similar elements of high-dimensional image features
CN105095435A (en) * 2015-07-23 2015-11-25 北京京东尚科信息技术有限公司 Similarity comparison method and device for high-dimensional image features
WO2017012491A1 (en) * 2015-07-23 2017-01-26 北京京东尚科信息技术有限公司 Similarity comparison method and apparatus for high-dimensional image features
US11048966B2 (en) 2015-07-23 2021-06-29 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for comparing similarities of high dimensional features of images
JP2018527656A (en) * 2015-07-23 2018-09-20 ベイジン ジンドン シャンケ インフォメーション テクノロジー カンパニー リミテッド Method and device for comparing similarity of high-dimensional features of images
CN105959224A (en) * 2016-06-24 2016-09-21 西安电子科技大学 Bit vector-based high-speed routing lookup apparatus and method
CN105959224B (en) * 2016-06-24 2019-01-15 西安电子科技大学 High speed route lookup device and method based on bit vectors
CN106980656A (en) * 2017-03-10 2017-07-25 北京大学 A kind of searching method based on two-value code dictionary tree
WO2018161548A1 (en) * 2017-03-10 2018-09-13 北京大学 Search method based on binary code trie
CN108628892A (en) * 2017-03-21 2018-10-09 北京京东尚科信息技术有限公司 Method, apparatus, electronic equipment and the readable storage medium storing program for executing of ordered data storage
CN108628892B (en) * 2017-03-21 2020-11-20 北京京东尚科信息技术有限公司 Method and device for storing ordered data, electronic equipment and readable storage medium
CN110149529A (en) * 2018-11-01 2019-08-20 腾讯科技(深圳)有限公司 Processing method, server and the storage medium of media information
CN109753576A (en) * 2018-12-25 2019-05-14 上海七印信息科技有限公司 A kind of method for retrieving similar images
TWI703459B (en) * 2019-07-25 2020-09-01 中華電信股份有限公司 Searching system and searching method for addressable index
CN111738194A (en) * 2020-06-29 2020-10-02 深圳力维智联技术有限公司 Evaluation method and device for similarity of face images
CN111738194B (en) * 2020-06-29 2024-02-02 深圳力维智联技术有限公司 Method and device for evaluating similarity of face images
CN112445934A (en) * 2021-02-01 2021-03-05 北京远鉴信息技术有限公司 Voice retrieval method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN103345496B (en) 2016-12-28

Similar Documents

Publication Publication Date Title
CN103345496A (en) Multimedia information searching method and system
CN102831222B (en) Differential compression method based on data de-duplication
CN101963982B (en) Method for managing metadata of redundancy deletion and storage system based on location sensitive Hash
CN103049568B (en) The method of the document classification to magnanimity document library
CN103577418B (en) Magnanimity Document distribution formula retrieval re-scheduling system and method
CN105468642A (en) Data storage method and apparatus
CN106611035A (en) Retrieval algorithm for deleting repetitive data in cloud storage
CN103914483B (en) File memory method, device and file reading, device
CN103488709A (en) Method and system for building indexes and method and system for retrieving indexes
CN102147795A (en) Method and device for searching points of interest as well as navigation system
CN104731896A (en) Data processing method and system
CN104166651A (en) Data searching method and device based on integration of data objects in same classes
CN103902702A (en) Data storage system and data storage method
CN102169491B (en) Dynamic detection method for multi-data concentrated and repeated records
CN104008119B (en) A kind of one-to-many mixed characters string fusion comparison method
CN103970842A (en) Water conservancy big data access system and method for field of flood control and disaster reduction
CN100476824C (en) Method and system for storing element and method and system for searching element
CN101963977A (en) A search method and mobile terminal without urban search
CN105488176A (en) Data processing method and device
CN105224624A (en) A kind of method and apparatus realizing down the quick merger of row chain
CN102375863A (en) Method and device for keyword extraction in geographic information field
CN109308311A (en) A kind of multi-source heterogeneous data fusion system
CN105183792A (en) Distributed fast text classification method based on locality sensitive hashing
CN103440292A (en) Method and system for retrieving multimedia information based on bit vector
CN102722557B (en) Self-adaption identification method for identical data blocks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230418

Address after: Room 501-502, 5/F, Sina Headquarters Scientific Research Building, Block N-1 and N-2, Zhongguancun Software Park, Dongbei Wangxi Road, Haidian District, Beijing, 100193

Patentee after: Sina Technology (China) Co.,Ltd.

Address before: 100080, International Building, No. 58 West Fourth Ring Road, Haidian District, Beijing, 20 floor

Patentee before: Sina.com Technology (China) Co.,Ltd.