CN106649440A - Approximate repeated video retrieval method incorporating global R features - Google Patents

Approximate repeated video retrieval method incorporating global R features

Info

Publication number
CN106649440A
CN106649440A (application CN201610820574.2A)
Authority
CN
China
Prior art keywords
features
video
feature
bof
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610820574.2A
Other languages
Chinese (zh)
Other versions
CN106649440B (en)
Inventor
廖开阳
王玮
郑元林
曹从军
赵凡
蔺广逢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN201610820574.2A
Publication of CN106649440A
Application granted
Publication of CN106649440B
Legal status: Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an approximate repeated (near-duplicate) video retrieval method incorporating global R features. The method first extracts local SIFT features from the videos in a database; builds a global R feature from the coordinate information of those local SIFT features; uses the descriptor information of the local SIFT features to build a BOF retrieval model; builds a voting retrieval model on top of the BOF model; and finally applies an information fusion strategy to fuse the global geometric distribution information into the BOF model, so that near-duplicate videos can be retrieved accurately from large-scale data. By fusing the global R feature, the method incorporates global geometric distribution information into the BOF model according to the information fusion strategy and achieves accurate retrieval of near-duplicate videos in large-scale data.

Description

Near-duplicate video retrieval method fusing global R features
Technical field
The invention belongs to the technical field of video analysis and retrieval, and in particular relates to a near-duplicate video retrieval method that fuses global R features.
Background technology
With the rapid development of communication technology, video capture devices and video editing software, the number of Internet videos is growing exponentially. At the same time, video-related services such as advertising, video sharing, recommendation and monitoring stimulate the interest of online users, who take part in video-related activities such as searching, uploading, downloading and commenting.
Today, a huge number of videos are uploaded and shared on the Internet every day, and a large number of near-duplicate videos exist online. The presence of so many near-duplicate videos has given rise to many new applications, such as re-ranking of video search results, copyright protection, monitoring of online video usage, video annotation and video database cleaning. For example, in one typical scenario a website user is looking for new videos, but many of the videos in the final ranking returned by the search engine are duplicates; in another scenario, a video producer wishes to protect the copyright of their videos and prevent them from being shared on the Internet. Both of these situations require near-duplicate video retrieval technology to achieve their respective goals.
In recent years, near-duplicate video retrieval has become a research focus, and many researchers are studying this technology. At present, most existing methods follow the near-duplicate video retrieval framework below (R. Fernandez-Beltran and F. Pla, "Latent topics-based relevance feedback for video retrieval," Pattern Recognition, vol. 51, pp. 72-84, Mar. 2016): first, a video is decomposed into a series of key frames by shot-boundary detection and sampling algorithms; second, visual features such as the scale-invariant feature transform (SIFT) and local binary patterns (LBP) are extracted from these key frames, and the whole video is represented by the sequence of key-frame visual features; finally, the system computes the similarity between each video in the data set and the query video according to the visual feature sequences, and returns the titles of the videos in the data set that are most similar to the query video. In general, either temporal or spatial information can be used to assess the similarity between two videos (M. Douze, H. Jegou, and C. Schmid, "An Image-Based Approach to Video Copy Detection With Spatio-Temporal Post-Filtering," IEEE Transactions on Multimedia, vol. 12, no. 4, pp. 257-266, Jun. 2010; C.-L. Chou, H.-T. Chen, and S.-Y. Lee, "Pattern-Based Near-Duplicate Video Retrieval and Localization on Web-Scale Videos," IEEE Transactions on Multimedia, vol. 17, no. 3, pp. 382-395, Mar. 2015). In addition, some existing methods extract a single global feature for the whole video to achieve real-time retrieval, but such methods generally cannot retrieve long videos effectively (X. Zhou and L. Chen, "Structure Tensor Series-Based Large Scale Near-Duplicate Video Retrieval," IEEE Transactions on Multimedia, vol. 14, no. 4, pp. 1220-1233, 2012).
In some recent publications, the correlation between pairs of frames in two videos is also used to measure video similarity (J. Liu, Z. Huang, H. T. Shen, and B. Cui, "Correlation-Based Retrieval for Heavily Changed Near-Duplicate Videos," ACM Transactions on Information Systems, vol. 29, no. 4, Dec. 2011). A recent survey of near-duplicate video retrieval techniques can be found in (J. Liu, Z. Huang, H. Cai, H. T. Shen, N. Chong Wah, and W. Wang, "Near-Duplicate Video Retrieval: Current Research and Future Trends," ACM Computing Surveys, vol. 45, no. 4, Aug. 2013).
At present, most near-duplicate video retrieval methods are based on local features and BOF retrieval models, but these methods use only local texture information and ignore the global information of the feature points, so the precision of video retrieval is not high.
Summary of the invention
The object of the present invention is to provide a near-duplicate video retrieval method that fuses global R features, which can fuse global geometric distribution information into a BOF model according to an information fusion strategy and accurately retrieve near-duplicate videos from large-scale data.
The technical solution adopted by the present invention is a near-duplicate video retrieval method fusing global R features, implemented according to the following steps:
Step 1: extract local SIFT features from the videos in the database;
Step 2: after step 1, build global R features from the coordinate information of the obtained local SIFT features;
Step 3: after step 2 is completed, build a BOF feature model using the descriptor information of the local SIFT features;
Step 4: build a voting retrieval model based on BOF from the BOF feature model obtained in step 3;
Step 5: apply an information fusion strategy to fuse the global geometric distribution information into the BOF-based voting retrieval model built in step 4, and accurately retrieve near-duplicate videos from large-scale data.
The present invention is further characterized in that:
Step 1 is implemented as follows:
Key frames are first extracted from all the videos in the reference video library, and SIFT features are then extracted from each key frame.
Key frames are extracted by uniform sampling, taking one frame every 6 seconds.
SIFT features are extracted with the method of D.G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004; the extracted information includes the position, scale and orientation of each feature point together with its local descriptor.
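For illustration only (this sketch is not part of the original patent text), step 1 can be realised in Python roughly as follows. OpenCV's SIFT implementation stands in for the cited method of Lowe; the 6-second uniform sampling follows the text, while the function name and the returned data layout are assumptions of this sketch.

import cv2

def extract_keyframe_sift(video_path, interval_s=6.0):
    """Per key frame, return SIFT keypoint positions, scales, angles and descriptors."""
    cap = cv2.VideoCapture(video_path)
    sift = cv2.SIFT_create()
    frames = []
    t_ms = 0.0
    while True:
        cap.set(cv2.CAP_PROP_POS_MSEC, t_ms)       # jump to the next sampling instant
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        kps, descs = sift.detectAndCompute(gray, None)
        frames.append({
            "time_s": t_ms / 1000.0,
            "coords": [kp.pt for kp in kps],       # (x, y) positions used for the global R feature
            "scales": [kp.size for kp in kps],
            "angles": [kp.angle for kp in kps],
            "descriptors": descs,                  # 128-D descriptors used for the BOF model
        })
        t_ms += interval_s * 1000.0
    cap.release()
    return frames

The keypoint coordinates collected here feed step 2 (global R features), while the descriptors feed the BOF model of step 3.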
Step 2 is implemented as follows:
Building global R features from the coordinate information of the local SIFT features obtained in step 1 means taking the position information of the extracted SIFT features and applying an improved Radon transform to extract the global R features.
The Radon transform of a function f is obtained by integrating f along straight lines of different directions in the plane; the resulting projections are the Radon transform of f. In this way every non-zero pixel of a discrete binary image can be projected into a Radon matrix.
For an image f(x, y), where x and y are pixel coordinates, the Radon transform of f(x, y) is expressed as
T_Rf(ρ, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(x cos θ + y sin θ − ρ) dx dy   (1);
in formula (1), δ(·) is the Dirac delta function, also called the unit impulse function, which is zero at every point except zero and whose integral over the whole domain equals 1; θ is the angle, θ ∈ [0, π); ρ is the polar radius, ρ ∈ (−∞, ∞).
The improved Radon transform, also called the R transform in the near-duplicate video retrieval method fusing global R features of the present invention, is formulated as
R_f(θ) = ∫_{−∞}^{∞} T_Rf²(ρ, θ) dρ   (2);
in formula (2), T_Rf(ρ, θ) is the Radon transform of f(x, y).
The improved Radon transform solves the problem that the original transform is not invariant to scale, rotation and translation.
The matrix obtained from the R transform is reduced by a principal component analysis transform with the (2D)²PCA algorithm, and the resulting low-dimensional matrix is taken as the final feature, called the R feature. (2D)²PCA is the two-directional two-dimensional principal component analysis method of the document "Two-directional two-dimensional PCA for efficient face representation and recognition"; it performs principal component analysis in both the row and the column directions, which yields features with higher recognition accuracy.
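As an illustration of this step (again not part of the patent text), the sketch below computes the R transform of a key frame's keypoint occupancy map in Python. The Radon transform comes from scikit-image, and the squaring and integration over ρ follow formula (2); the max-normalisation and FFT-magnitude step used here to damp scale and rotation effects are one common choice, not necessarily the patent's exact construction, and the subsequent (2D)²PCA reduction over the resulting matrices is omitted.

import numpy as np
from skimage.transform import radon

def r_feature(coords, shape, n_angles=180):
    """coords: (x, y) SIFT keypoint positions of one key frame; shape: (height, width)."""
    occ = np.zeros(shape, dtype=float)
    for x, y in coords:                                   # binary map of keypoint locations
        i = min(max(int(round(y)), 0), shape[0] - 1)
        j = min(max(int(round(x)), 0), shape[1] - 1)
        occ[i, j] = 1.0
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sino = radon(occ, theta=theta, circle=False)          # T_Rf(rho, theta): rows indexed by rho
    r = (sino ** 2).sum(axis=0)                           # R_f(theta), formula (2)
    r /= (r.max() + 1e-12)                                # amplitude normalisation (scale robustness)
    return np.abs(np.fft.rfft(r))                         # rotation appears as a circular shift of theta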
Step 3 is implemented according to the following sub-steps:
Step 3.1: the descriptors of the SIFT features in the image library are trained with a large-scale hierarchical clustering algorithm to generate clusters.
The large-scale hierarchical clustering algorithm is a clustering algorithm.
Step 3.2: after step 3.1, the features are quantized and the BOF feature of each image is generated, as follows:
Quantizing to generate the BOF feature of each image means deciding, for every feature point of an image, which cluster center is nearest and assigning the point to that nearest cluster center; this finally produces a frequency table, i.e. the preliminary unweighted BOF. The frequency table is then weighted with tf-idf to produce the final weighted BOF feature.
Wherein, quantization method is carried out to the feature of inquiry video as follows:
The q in formula (3):Represent and quantify, RdThe d dimension datas in real number space are represented, k represents the quantity at class center, xi,j,i =1 ..., m2For ith feature in jth frame in reference video storehouse;
The tf-idf weights methods calculated per frame are specific as follows:
Wi=tfi·idfi(6);
In formula (4)~formula (6):K represents the quantity at class center;fijIt is the visual vocabulary belonging to ith feature at j-th The frequency occurred in frame of video;niIt is the sum of the reference video frame comprising the visual vocabulary belonging to ith feature;N is total Reference video number;tfiRepresent word frequency factor;idfiRepresent inverse word frequency factor;
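A minimal sketch of this quantisation and weighting is given below (illustrative, not from the patent): plain mini-batch k-means from scikit-learn stands in for the patent's large-scale hierarchical clustering, and the vocabulary size k = 10000 is an assumption of the sketch; the quantisation and tf-idf weighting follow formulas (3)-(6).

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def train_vocabulary(all_descriptors, k=10000):
    """Cluster the pooled SIFT descriptors of the reference library into k visual words."""
    return MiniBatchKMeans(n_clusters=k, batch_size=10000, n_init=3).fit(all_descriptors)

def compute_idf(per_frame_words, k):
    """idf_i = log(N / n_i) over the N reference frames, formula (5)."""
    n_i = np.zeros(k)
    for words in per_frame_words:
        n_i[np.unique(words)] += 1
    return np.log(len(per_frame_words) / np.maximum(n_i, 1))

def frame_bof(descriptors, vocab, idf):
    """Quantise one frame's descriptors (formula 3) and return its tf-idf weighted BOF."""
    k = vocab.n_clusters
    words = vocab.predict(descriptors)            # q(x_ij): index of the nearest cluster centre
    counts = np.bincount(words, minlength=k)      # unweighted BOF (frequency table f_ij)
    tf = counts / max(counts.sum(), 1)            # formula (4)
    return tf * idf                               # W_i = tf_i * idf_i, formula (6)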
Step 3.3: an inverted index is built for the generated BOF features, as follows:
An inverted index generally consists of two parts, a vocabulary file and an inverted-list file.
The vocabulary file records all the visual words that occur in the document set (images or video frames).
The inverted-list file records, for each visual word, information such as the positions and frequencies of that word in the document files (images or video frames); this information for all words constitutes the inverted list. For one word w_i among the n words (features) w_1 ... w_n in the vocabulary file, the inverted list over the m document files (images or video frames) d_1 ... d_m can be expressed as
w_i → d_1[f_1]⟨p_{i1}, ..., p_{if_1}⟩ ... d_m[f_m]⟨p_{i1}, ..., p_{if_m}⟩   (7);
n such records constitute a complete inverted list.
In formula (7), f_i denotes the frequency, direction and scale information.
Formula (7) gives a complete inverted index structure for word queries.
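The sketch below (illustrative only) expresses the idea of formula (7) with a plain Python dictionary; storing only per-frame frequencies, rather than the positions, directions and scales mentioned in the text, is a simplifying assumption.

from collections import defaultdict

def build_inverted_index(frames_words):
    """frames_words: {frame_id: sequence of quantised word indices for that frame}."""
    index = defaultdict(list)                 # w_i -> [(d_j, occurrences of w_i in d_j), ...]
    for frame_id, words in frames_words.items():
        seen = {}
        for w in words:
            seen[w] = seen.get(w, 0) + 1
        for w, freq in seen.items():
            index[w].append((frame_id, freq))
    return index

def candidate_frames(index, query_words):
    """Collect the database frames touched by the query's visual words."""
    hits = defaultdict(int)
    for w in set(query_words):
        for frame_id, freq in index.get(w, []):
            hits[frame_id] += freq
    return hits

At query time, only the postings of the visual words present in the query frame are visited, which is what makes the BOF voting of step 4 efficient on large databases.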
Step 4 is implemented according to the following sub-steps:
A query frame is represented by its local features y, and all key frames in the video database are represented by their local features x_j, j = 1, ..., n. The BOF-based voting retrieval proceeds as follows:
Step 4.1: for the local features y_l, l = 1, ..., m1 of the query frame and the local features x_{i,j}, i = 1, ..., m2 of all key frames in the video database, the similarity score s_j between two video frames, j = 1, ..., n, is computed as
s_j = Σ_{l=1..m1} Σ_{i=1..m2} f(x_{i,j}, y_l)   (8);
in formula (8), f is a matching function that reflects the degree of similarity between two features x_{i,j} and y_l.
Step 4.2: after step 4.1, the features are quantized according to the visual vocabulary, and the quantized features of the database videos are stored in an inverted file; this quantization q uses formula (3).
After quantization, q(x_{i,j}) is the index of the cluster center (visual word) nearest to feature x_{i,j}. Therefore, if two features x_{i,j} and y_l satisfy q(x_{i,j}) = q(y_l) after quantization, the probability that the two features are very close in the high-dimensional feature space is very high. Following this principle and taking the tf-idf weighting described above into account, the matching function f is defined as
f_tf-idf(x_{i,j}, y_l) = (w_{q(y_l)} · w_{q(x_{i,j})}) δ_{q(x_{i,j}), q(y_l)}   (9);
two different features can then be compared efficiently from their quantization results.
Step 4.3: after step 4.2, the image similarity score s_f finally used for ranking is obtained by post-processing s_j, as
s_f = s_j / (Σ_{l=1..m1} w_{q(y_l)}² · Σ_{i=1..m2} w_{q(x_{i,j})}²)   (10).
Formulas (9) and (10) of steps 4.2 and 4.3 show that the tf-idf weights of the visual words of the query video frame and of the database key frames are considered simultaneously and are both incorporated into the BOF-based voting retrieval method; this weighting scheme normalizes the visual-word histograms.
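The following sketch (illustrative, not from the patent) evaluates formulas (8)-(10) directly from the quantised words of a query frame and one database key frame; the dense pairwise evaluation shown here is for clarity, whereas in practice the score is accumulated through the inverted file of step 3.3.

import numpy as np

def voting_score(query_words, db_words, weights):
    """weights[w] is the tf-idf weight W_w of visual word w (formula 6)."""
    wq = weights[query_words]                  # w_{q(y_l)} for every query feature
    wd = weights[db_words]                     # w_{q(x_{i,j})} for every database feature
    s_j = 0.0
    for l, yw in enumerate(query_words):       # formula (8) with the matching f of formula (9)
        same = (db_words == yw)                # delta_{q(x_{i,j}), q(y_l)}
        s_j += (wq[l] * wd[same]).sum()
    norm = (wq ** 2).sum() * (wd ** 2).sum()   # normalisation of formula (10)
    return s_j / norm if norm > 0 else 0.0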
Step 5 is implemented as follows:
If two features x and y are quantized to the same cluster center and the Euclidean distance d(x, y) that reflects the two feature descriptors is very small, then the distance between them in the Euclidean space described by the R features is also very small. Based on this observation, a descriptor is represented by q(x) and b(x), where q is a quantizer and b is the R feature. The R feature is then embedded into the BOF retrieval model and the matching function f is redefined as
f_RE(x, y) = (tf-idf(q(x)))²  if q(x) = q(y) and d(b(x), b(y)) ≤ h_t;  0 otherwise   (11);
in formula (11), d denotes the Euclidean distance and h_t denotes a threshold.
During quantization, within-cluster distances take small values so that close videos match as far as possible, and h_t is correspondingly set to a small value, here h_t = 0.005, so that mismatched videos can be removed according to the distance between R features.
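The fused matching function of formula (11) can be written as the small helper below (illustrative only); the names and the way the R feature b(·) is attached to each compared feature are assumptions of this sketch, while the threshold value h_t = 0.005 follows the text.

import numpy as np

def fused_match(word_x, word_y, w_word, b_x, b_y, h_t=0.005):
    """Return (tf-idf(q(x)))^2 when the visual words agree and the R features are close, else 0."""
    if word_x != word_y:
        return 0.0
    d = np.linalg.norm(np.asarray(b_x) - np.asarray(b_y))   # Euclidean distance d(b(x), b(y))
    return float(w_word ** 2) if d <= h_t else 0.0

Replacing f_tf-idf of formula (9) by this function inside the voting of step 4 is what fuses the global geometric distribution information into the BOF voting retrieval model.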
The beneficial effects of the present invention are:
(1) The near-duplicate video retrieval method fusing global R features of the present invention proposes an improved Radon transform, which solves the problem that the original transform is not invariant to scale, rotation and translation and improves the robustness of the global feature.
(2) In the near-duplicate video retrieval method fusing global R features of the present invention, the global geometric distribution information can be fused into the BOF model according to an information fusion strategy, which adds a global property to the BOF model and thus improves the stability of the system.
(3) When the near-duplicate video retrieval method fusing global R features of the present invention is used, the precision of near-duplicate video retrieval can be greatly improved, and the method can be widely used in the field of video retrieval.
(4) The retrieval method fusing global R features of the present invention is also suitable for image retrieval and can greatly improve the precision of image retrieval.
Description of the drawings
Fig. 1 is a framework diagram of the near-duplicate video retrieval method fusing global R features of the present invention.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The framework of the near-duplicate video retrieval method fusing global R features, shown in Fig. 1, can be divided into two main parts: an offline part and an online part. The offline part processes the target video library and produces the inverted index table required by the online part during querying; the online part performs the query of a query video against the target video library.
The offline part processes the reference video library: it performs key-frame extraction, SIFT feature extraction, R feature extraction and feature clustering analysis on the videos in the reference video library, quantizes the feature vectors to visual words, and generates the visual vocabulary and the feature inverted index table used by the online part for querying.
The online part performs the query of the query video against the reference video library: it performs key-frame extraction, SIFT feature extraction and R feature extraction on the query video, quantizes the features of all key frames of the online video to visual words according to the visual vocabulary generated from the reference video library, then applies the information fusion strategy to fuse the global geometric distribution information into the BOF model, searches the candidate videos, and produces the final retrieval result.
The near-duplicate video retrieval method fusing global R features of the present invention is implemented according to the following steps:
Step 1: extract local SIFT features from the videos in the database, as follows:
Key frames are first extracted from all the videos in the reference video library, and SIFT features are then extracted from each extracted key frame.
Key frames are extracted by uniform sampling, taking one frame every 6 seconds; SIFT features are extracted with the method of D.G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
The information extracted from each key frame includes the position, scale and orientation of each feature point together with its local descriptor.
Step 2: after step 1, build global R features from the coordinate information of the obtained local SIFT features, as follows:
Building global R features from the coordinate information of the local SIFT features obtained in step 1 means taking the position information of the extracted SIFT features and applying an improved Radon transform to extract the global R features.
The Radon transform of a function f is obtained by integrating f along straight lines of different directions in the plane; the resulting projections are the Radon transform of f. In this way every non-zero pixel of a discrete binary image can be projected into a Radon matrix.
For an image f(x, y), where x and y are pixel coordinates, the Radon transform of f(x, y) is expressed as
T_Rf(ρ, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(x cos θ + y sin θ − ρ) dx dy   (1);
in formula (1), δ(·) is the Dirac delta function, also called the unit impulse function, which is zero at every point except zero and whose integral over the whole domain equals 1; θ is the angle, θ ∈ [0, π); ρ is the polar radius, ρ ∈ (−∞, ∞).
The improved Radon transform, also called the R transform in the near-duplicate video retrieval method fusing global R features of the present invention, is formulated as
R_f(θ) = ∫_{−∞}^{∞} T_Rf²(ρ, θ) dρ   (2);
in formula (2), T_Rf(ρ, θ) is the Radon transform of f(x, y).
The improved Radon transform solves the problem that the original transform is not invariant to scale, rotation and translation.
To improve the robustness of the feature and reduce its dimensionality, the matrix obtained from the R transform is subjected to a principal component analysis transform with the (2D)²PCA algorithm, and the resulting low-dimensional matrix is taken as the final feature, called the R feature. (2D)²PCA uses the method of the document (Z. D. and Z. Z., "Letters: (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition," Neurocomputing, vol. 69, no. 1, pp. 224-231, 2005), i.e. two-directional two-dimensional principal component analysis, performing principal component analysis in both the row and the column directions, which yields features with higher recognition accuracy.
Step 3: after step 2 is completed, build a BOF feature model using the descriptor information of the local SIFT features, according to the following sub-steps:
Step 3.1: the descriptors of the SIFT features in the image library are trained with a large-scale hierarchical clustering algorithm to generate clusters.
The large-scale hierarchical clustering algorithm is a clustering algorithm; see K. Liao, G. Liu, L. Xiao, and C. Liu, "A sample-based hierarchical adaptive K-means clustering method for large-scale video retrieval," Knowledge-Based Systems, 2013.
Step 3.2: after step 3.1, the features are quantized and the BOF feature of each image is generated, as follows:
Quantizing to generate the BOF feature of each image means deciding, for every feature point of an image, which cluster center is nearest and assigning the point to that nearest cluster center; this finally produces a frequency table, i.e. the preliminary unweighted BOF. The frequency table is then weighted with tf-idf to produce the final weighted BOF feature.
The features of the query video are quantized with the quantizer of formula (3):
q : R^d → [1, k]   (3);
in formula (3), q denotes quantization, R^d denotes d-dimensional data in real space, k is the number of cluster centers, and x_{i,j}, i = 1, ..., m2 is the i-th feature of the j-th frame in the reference video library.
The tf-idf weight of each frame is computed according to formulas (4)-(6):
tf_i = f_{ij} / Σ_{t=1..k} f_{tj}   (4);
idf_i = log(N / n_i)   (5);
W_i = tf_i · idf_i   (6);
in formulas (4)-(6), k is the number of cluster centers, f_{ij} is the frequency with which the visual word of the i-th feature occurs in the j-th video frame, n_i is the number of reference video frames that contain the visual word of the i-th feature, N is the total number of reference videos, tf_i is the term-frequency factor, and idf_i is the inverse document-frequency factor.
Step 3.3: an inverted index is built for the generated BOF features, as follows:
An inverted index generally consists of two parts, a vocabulary file and an inverted-list file.
The vocabulary file records all the visual words that occur in the document set (images or video frames).
The inverted-list file records, for each visual word, information such as the positions and frequencies of that word in the document files (images or video frames); this information for all words constitutes the inverted list. For one word w_i among the n words (features) w_1 ... w_n in the vocabulary file, the inverted list over the m document files (images or video frames) d_1 ... d_m can be expressed as
w_i → d_1[f_1]⟨p_{i1}, ..., p_{if_1}⟩ ... d_m[f_m]⟨p_{i1}, ..., p_{if_m}⟩   (7);
n such records constitute a complete inverted list.
In formula (7), f_i denotes the frequency, direction and scale information.
Formula (7) gives a complete inverted index structure for word queries.
Step 4: build a voting retrieval model based on BOF from the BOF feature model obtained in step 3, according to the following sub-steps:
A query frame is represented by its local features y, and all key frames in the video database are represented by their local features x_j, j = 1, ..., n. The BOF-based voting retrieval proceeds as follows:
Step 4.1: for the local features y_l, l = 1, ..., m1 of the query frame and the local features x_{i,j}, i = 1, ..., m2 of all key frames in the video database, the similarity score s_j between two video frames, j = 1, ..., n, is computed as
s_j = Σ_{l=1..m1} Σ_{i=1..m2} f(x_{i,j}, y_l)   (8);
in formula (8), f is a matching function that reflects the degree of similarity between two features x_{i,j} and y_l.
Step 4.2: after step 4.1, in order to improve efficiency the features are generally quantized according to the visual vocabulary, and the quantized features of the database videos are stored in an inverted file; this quantization q uses formula (3).
After quantization, q(x_{i,j}) is the index of the cluster center (visual word) nearest to feature x_{i,j}. Therefore, if two features x_{i,j} and y_l satisfy q(x_{i,j}) = q(y_l) after quantization, the probability that the two features are very close in the high-dimensional feature space is very high. Following this principle and taking the tf-idf weighting described above into account, the matching function f is defined as
f_tf-idf(x_{i,j}, y_l) = (w_{q(y_l)} · w_{q(x_{i,j})}) δ_{q(x_{i,j}), q(y_l)}   (9);
two different features can then be compared efficiently from their quantization results.
Step 4.3: after step 4.2, the image similarity score s_f finally used for ranking is obtained by post-processing s_j, as
s_f = s_j / (Σ_{l=1..m1} w_{q(y_l)}² · Σ_{i=1..m2} w_{q(x_{i,j})}²)   (10).
Formulas (9) and (10) of steps 4.2 and 4.3 show that the tf-idf weights of the visual words of the query video frame and of the database key frames are considered simultaneously and are both incorporated into the BOF-based voting retrieval method; this weighting scheme normalizes the visual-word histograms.
Step 5: apply an information fusion strategy to fuse the global geometric distribution information into the BOF-based voting retrieval model built in step 4, and accurately retrieve near-duplicate videos from large-scale data, as follows:
Applying the information fusion strategy to fuse the global geometric distribution information into the BOF-based voting retrieval model built in step 4 and accurately retrieving near-duplicate videos from large-scale data specifically means embedding the global R features built in step 2 into the BOF retrieval model built in step 3 and performing near-duplicate video retrieval on large-scale data, as follows:
If two features x and y are quantized to the same cluster center and the Euclidean distance d(x, y) that reflects the two feature descriptors is very small, then the distance between them in the Euclidean space described by the R features should also be very small. Based on this observation, a descriptor is represented by q(x) and b(x), where q is a quantizer and b is the R feature. The R feature is then embedded into the BOF retrieval model and the matching function f is redefined as
f_RE(x, y) = (tf-idf(q(x)))²  if q(x) = q(y) and d(b(x), b(y)) ≤ h_t;  0 otherwise   (11);
in formula (11), d denotes the Euclidean distance and h_t denotes a threshold.
In general, during quantization the within-cluster distances take small values so that close videos match as far as possible, and h_t is correspondingly set to a small value, here h_t = 0.005, so that mismatched videos can be removed according to the distance between R features.
In terms of execution, the near-duplicate video retrieval method fusing global R features of the present invention first extracts local SIFT features from the videos in the database; it then builds global R features from the coordinate information of the local SIFT features; it then builds a BOF retrieval model using the descriptor information of the local SIFT features; next, based on the BOF model, it builds a voting retrieval model; finally, it applies an information fusion strategy to fuse the global geometric distribution information into the BOF model and accurately retrieves near-duplicate videos from large-scale data. The near-duplicate video retrieval method fusing global R features of the present invention makes full use of both local texture information and global geometric distribution information, and proposes a feature fusion method that fuses the global geometric distribution information into the BOF model according to the information fusion strategy, achieving accurate retrieval of near-duplicate videos in large-scale data.

Claims (7)

1. A near-duplicate video retrieval method fusing global R features, characterized in that it is implemented according to the following steps:
Step 1: extract local SIFT features from the videos in the database;
Step 2: after step 1, build global R features from the coordinate information of the obtained local SIFT features;
Step 3: after step 2 is completed, build a BOF feature model using the descriptor information of the local SIFT features;
Step 4: build a voting retrieval model based on BOF from the BOF feature model obtained in step 3;
Step 5: apply an information fusion strategy to fuse the global geometric distribution information into the BOF-based voting retrieval model built in step 4, and accurately retrieve near-duplicate videos from large-scale data.
2. The near-duplicate video retrieval method fusing global R features according to claim 1, characterized in that step 1 is implemented as follows:
Key frames are first extracted from all the videos in the reference video library, and SIFT features are then extracted from each key frame.
3. The near-duplicate video retrieval method fusing global R features according to claim 2, characterized in that the key frames are extracted by uniform sampling, taking one frame every 6 seconds;
SIFT features are extracted from the key frames with the method of the document "Distinctive image features from scale-invariant keypoints", and the extracted information includes the position, scale and orientation of each feature point together with its local descriptor.
4. The near-duplicate video retrieval method fusing global R features according to claim 1, characterized in that step 2 is implemented as follows:
building global R features from the coordinate information of the local SIFT features obtained in step 1 means taking the position information of the extracted SIFT features and applying an improved Radon transform to extract the global R features;
the Radon transform of a function f is obtained by integrating f along straight lines of different directions in the plane; the resulting projections are the Radon transform of f; in this way every non-zero pixel of a discrete binary image can be projected into a Radon matrix;
for an image f(x, y), where x and y are pixel coordinates, the Radon transform of f(x, y) is expressed as
T_Rf(ρ, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(x cos θ + y sin θ − ρ) dx dy   (1);
in formula (1), δ(·) is the Dirac delta function, also called the unit impulse function, which is zero at every point except zero and whose integral over the whole domain equals 1; θ is the angle, θ ∈ [0, π); ρ is the polar radius, ρ ∈ (−∞, ∞);
the improved Radon transform, also called the R transform in the near-duplicate video retrieval method fusing global R features of the present invention, is formulated as
R_f(θ) = ∫_{−∞}^{∞} T_Rf²(ρ, θ) dρ   (2);
in formula (2), T_Rf(ρ, θ) is the Radon transform of f(x, y);
the improved Radon transform solves the problem that the original transform is not invariant to scale, rotation and translation;
the matrix obtained from the R transform is reduced by a principal component analysis transform with the (2D)²PCA algorithm, and the resulting low-dimensional matrix is taken as the final feature, called the R feature; (2D)²PCA uses the two-directional two-dimensional principal component analysis method applied to efficient face representation and recognition, performing principal component analysis in both the row and the column directions so that features with higher recognition accuracy can be obtained.
5. The near-duplicate video retrieval method fusing global R features according to claim 1, characterized in that step 3 is implemented according to the following sub-steps:
Step 3.1: the descriptors of the SIFT features in the image library are trained with a large-scale hierarchical clustering algorithm to generate clusters;
the large-scale hierarchical clustering algorithm is a clustering algorithm;
Step 3.2: after step 3.1, the features are quantized and the BOF feature of each image is generated, as follows:
quantizing to generate the BOF feature of each image means deciding, for every feature point of an image, which cluster center is nearest and assigning the point to that nearest cluster center; this finally produces a frequency table, i.e. the preliminary unweighted BOF; the frequency table is then weighted with tf-idf to produce the final weighted BOF feature;
the features of the query video are quantized with the quantizer
q : R^d → [1, k]   (3);
in formula (3), q denotes quantization, R^d denotes d-dimensional data in real space, k is the number of cluster centers, and x_{i,j}, i = 1, ..., m2 is the i-th feature of the j-th frame in the reference video library;
the tf-idf weight of each frame is computed as
tf_i = f_{ij} / Σ_{t=1..k} f_{tj}   (4);
idf_i = log(N / n_i)   (5);
W_i = tf_i · idf_i   (6);
in formulas (4)-(6), k is the number of cluster centers, f_{ij} is the frequency with which the visual word of the i-th feature occurs in the j-th video frame, n_i is the number of reference video frames that contain the visual word of the i-th feature, N is the total number of reference videos, tf_i is the term-frequency factor, and idf_i is the inverse document-frequency factor;
Step 3.3: an inverted index is built for the generated BOF features, as follows:
an inverted index generally consists of two parts, a vocabulary file and an inverted-list file;
the vocabulary file records all the visual words that occur in the document set;
the inverted-list file records, for each visual word, information such as the positions and frequencies of that word in the document files; this information for all words constitutes the inverted list; for one word w_i among the n words w_1 ... w_n in the vocabulary file, the inverted list over the m document files d_1 ... d_m can be expressed as
w_i → d_1[f_1]⟨p_{i1}, ..., p_{if_1}⟩ ... d_m[f_m]⟨p_{i1}, ..., p_{if_m}⟩   (7);
n such records constitute a complete inverted list;
in formula (7), f_i denotes the frequency, direction and scale information;
formula (7) gives a complete inverted index structure for word queries.
6. The near-duplicate video retrieval method fusing global R features according to claim 1, characterized in that step 4 is implemented according to the following sub-steps:
a query frame is represented by its local features y, and all key frames in the video database are represented by their local features x_j, j = 1, ..., n; the BOF-based voting retrieval proceeds as follows:
Step 4.1: for the local features y_l, l = 1, ..., m1 of the query frame and the local features x_{i,j}, i = 1, ..., m2 of all key frames in the video database, the similarity score s_j between two video frames, j = 1, ..., n, is computed as
s_j = Σ_{l=1..m1} Σ_{i=1..m2} f(x_{i,j}, y_l)   (8);
in formula (8), f is a matching function that reflects the degree of similarity between two features x_{i,j} and y_l;
Step 4.2: after step 4.1, the features are quantized according to the visual vocabulary, and the quantized features of the database videos are stored in an inverted file; this quantization q uses formula (3);
after quantization, q(x_{i,j}) is the index of the cluster center (visual word) nearest to feature x_{i,j}; therefore, if two features x_{i,j} and y_l satisfy q(x_{i,j}) = q(y_l) after quantization, the probability that the two features are very close in the high-dimensional feature space is very high; following this principle and taking the aforementioned tf-idf weighting into account, the matching function f is defined as
f_tf-idf(x_{i,j}, y_l) = (w_{q(y_l)} · w_{q(x_{i,j})}) δ_{q(x_{i,j}), q(y_l)}   (9);
two different features can then be compared efficiently from their quantization results;
Step 4.3: after step 4.2, the image similarity score s_f finally used for ranking is obtained by post-processing s_j, as
s_f = s_j / (Σ_{l=1..m1} w_{q(y_l)}² · Σ_{i=1..m2} w_{q(x_{i,j})}²)   (10);
formulas (9) and (10) of steps 4.2 and 4.3 show that the tf-idf weights of the visual words of the query video frame and of the database key frames are considered simultaneously and are both incorporated into the BOF-based voting retrieval method; this weighting scheme normalizes the visual-word histograms.
7. The near-duplicate video retrieval method fusing global R features according to claim 1, characterized in that step 5 is implemented as follows:
if two features x and y are quantized to the same cluster center and the Euclidean distance d(x, y) that reflects the two feature descriptors is very small, then the distance between them in the Euclidean space described by the R features is also very small; based on this observation, a descriptor is represented by q(x) and b(x), where q is a quantizer and b is the R feature; the R feature is then embedded into the BOF retrieval model and the matching function f is redefined as
f_RE(x, y) = (tf-idf(q(x)))²  if q(x) = q(y) and d(b(x), b(y)) ≤ h_t;  0 otherwise   (11);
in formula (11), d denotes the Euclidean distance and h_t denotes a threshold;
during quantization the within-cluster distances take small values so that close videos match as far as possible, and h_t is correspondingly set to a small value, here h_t = 0.005, so that mismatched videos can be removed according to the distance between R features.
CN201610820574.2A 2016-09-13 2016-09-13 Approximate repeated video retrieval method incorporating global R features Active CN106649440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610820574.2A CN106649440B (en) 2016-09-13 2016-09-13 Approximate repeated video retrieval method incorporating global R features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610820574.2A CN106649440B (en) 2016-09-13 2016-09-13 Approximate repeated video retrieval method incorporating global R features

Publications (2)

Publication Number Publication Date
CN106649440A true CN106649440A (en) 2017-05-10
CN106649440B CN106649440B (en) 2019-10-25

Family

ID=58852760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610820574.2A Active CN106649440B (en) 2016-09-13 2016-09-13 Approximate repeated video retrieval method incorporating global R features

Country Status (1)

Country Link
CN (1) CN106649440B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391594A (en) * 2017-06-29 2017-11-24 安徽睿极智能科技有限公司 A kind of image search method based on the sequence of iteration vision
CN108566562A (en) * 2018-05-02 2018-09-21 中广热点云科技有限公司 Copyright video information structuring arranges the method for completing sample approved sample
CN108647307A (en) * 2018-05-09 2018-10-12 京东方科技集团股份有限公司 Image processing method, device, electronic equipment and storage medium
CN110895689A (en) * 2018-09-12 2020-03-20 苹果公司 Mixed mode lighting for face recognition authentication
CN113239159A (en) * 2021-04-26 2021-08-10 成都考拉悠然科技有限公司 Cross-modal retrieval method of videos and texts based on relational inference network
CN114298992A (en) * 2021-12-21 2022-04-08 北京百度网讯科技有限公司 Video frame duplication removing method and device, electronic equipment and storage medium
US11874869B2 (en) 2018-03-29 2024-01-16 Beijing Bytedance Network Technology Co., Ltd. Media retrieval method and apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833650A (en) * 2009-03-13 2010-09-15 清华大学 Video copy detection method based on contents
CN102693299A (en) * 2012-05-17 2012-09-26 西安交通大学 System and method for parallel video copy detection
CN103631932A (en) * 2013-12-06 2014-03-12 中国科学院自动化研究所 Method for detecting repeated video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833650A (en) * 2009-03-13 2010-09-15 清华大学 Video copy detection method based on contents
CN102693299A (en) * 2012-05-17 2012-09-26 西安交通大学 System and method for parallel video copy detection
CN103631932A (en) * 2013-12-06 2014-03-12 中国科学院自动化研究所 Method for detecting repeated video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAIYANG LIAO et al.: "IR Feature Embedded BOF Indexing Method for Near-Duplicate Video Retrieval", IEEE Transactions on Circuits and Systems for Video Technology *
XIANGMIN ZHOU et al.: "Structure Tensor Series-Based Large Scale Near-Duplicate Video Retrieval", IEEE Transactions on Multimedia *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391594A (en) * 2017-06-29 2017-11-24 安徽睿极智能科技有限公司 A kind of image search method based on the sequence of iteration vision
CN107391594B (en) * 2017-06-29 2020-07-10 安徽睿极智能科技有限公司 Image retrieval method based on iterative visual sorting
US11874869B2 (en) 2018-03-29 2024-01-16 Beijing Bytedance Network Technology Co., Ltd. Media retrieval method and apparatus
CN108566562A (en) * 2018-05-02 2018-09-21 中广热点云科技有限公司 Copyright video information structuring arranges the method for completing sample approved sample
CN108566562B (en) * 2018-05-02 2020-09-08 中广热点云科技有限公司 Method for finishing sample sealing by copyright video information structured arrangement
CN108647307A (en) * 2018-05-09 2018-10-12 京东方科技集团股份有限公司 Image processing method, device, electronic equipment and storage medium
CN110895689A (en) * 2018-09-12 2020-03-20 苹果公司 Mixed mode lighting for face recognition authentication
CN110895689B (en) * 2018-09-12 2023-07-11 苹果公司 Mixed mode illumination for facial recognition authentication
CN113239159A (en) * 2021-04-26 2021-08-10 成都考拉悠然科技有限公司 Cross-modal retrieval method of videos and texts based on relational inference network
CN113239159B (en) * 2021-04-26 2023-06-20 成都考拉悠然科技有限公司 Cross-modal retrieval method for video and text based on relational inference network
CN114298992A (en) * 2021-12-21 2022-04-08 北京百度网讯科技有限公司 Video frame duplication removing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106649440B (en) 2019-10-25

Similar Documents

Publication Publication Date Title
Zhang et al. Self-training with progressive augmentation for unsupervised cross-domain person re-identification
Yu et al. Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition
Cheng et al. A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing
CN106649440B (en) Approximate repeated video retrieval method incorporating global R features
CN110059198B (en) Discrete hash retrieval method of cross-modal data based on similarity maintenance
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
Lai et al. Instance-aware hashing for multi-label image retrieval
Deng et al. Discriminative dictionary learning with common label alignment for cross-modal retrieval
Zheng et al. A deep and autoregressive approach for topic modeling of multimodal data
CN106951551B (en) Multi-index image retrieval method combining GIST characteristics
Zheng et al. $\mathcal {L} _p $-Norm IDF for Scalable Image Retrieval
CN111581405A (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
Leng et al. 3D object retrieval with multitopic model combining relevance feedback and LDA model
Wu et al. Learning of multimodal representations with random walks on the click graph
Song et al. Deep memory network for cross-modal retrieval
Zhang et al. Social image tagging using graph-based reinforcement on multi-type interrelated objects
Li et al. Exploiting hierarchical activations of neural network for image retrieval
Abdul-Rashid et al. Shrec’18 track: 2d image-based 3d scene retrieval
Nie et al. Convolutional deep learning for 3D object retrieval
Xu et al. Iterative manifold embedding layer learned by incomplete data for large-scale image retrieval
Han et al. VRFP: On-the-fly video retrieval using web images and fast fisher vector products
CN105843925A (en) Similar image searching method based on improvement of BOW algorithm
Wang et al. Beauty product image retrieval based on multi-feature fusion and feature aggregation
Zhang et al. Video copy detection based on deep CNN features and graph-based sequence matching
Zhang et al. Dataset-driven unsupervised object discovery for region-based instance image retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant