Summary of the Invention
It is an object of the invention to provide a video retrieval method and system based on multi-image fusion, so as to improve the recall ratio of video retrieval.
To achieve the above object, the present invention adopts the following technical solution. In a first aspect, the present invention provides a video retrieval method based on multi-image fusion, the method comprising:
decoding a database video and segmenting it into shots, to obtain a plurality of video shots;
extracting key frames from each single video shot, and extracting local features from the key frames;
clustering a subset of the local features, and taking the resulting set of cluster centers as the codebook of the database-video local features;
quantization-encoding all local features of the database video according to the codebook of the database-video local features;
after quantization encoding, pooling the set of local features of all key frames of a single video shot, to obtain the quantized local-feature pooling set of the single video shot;
building an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots;
retrieving the target video online according to several query images of the target video to be retrieved and the inverted file index.
In a second aspect, the invention provides a video retrieval system based on multi-image fusion, the system comprising a video processing module, a distributed storage module and a retrieval module.
The video processing module comprises a processing unit, a first extraction unit, a first clustering unit, a first quantization-encoding unit and a first pooling unit.
The processing unit is connected to the database, decodes the videos in the database and segments them into shots, obtaining a plurality of video shots.
The first extraction unit is connected to the processing unit to extract key frames from each single video shot and to extract local features from the key frames.
The first clustering unit is connected to the extraction unit to cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features.
The first quantization-encoding unit is connected to the clustering unit to quantization-encode all local features of the database video according to the codebook of the database-video local features.
The first pooling unit is connected to the quantization-encoding unit to pool, after quantization encoding, the set of local features of all key frames of a single video shot, obtaining the quantized local-feature pooling set of the single video shot.
The distributed storage module is connected to the video processing module to build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots.
The retrieval module is connected to the distributed storage module to retrieve the target video online according to several query images of the target video to be retrieved and the inverted file index.
Compared with the prior art, the present invention has the following technical effects. First, because the invention uses several query images of the same target video to retrieve the target video, different viewing angles can be taken into account, so that the description of the target video is more accurate and the recall ratio for the target video is improved. Second, by building the inverted file index offline, with the video shot of the database video as the unit, the local features of all key frames of a single video shot are pooled into the quantized local-feature pooling set of that shot, which greatly reduces memory consumption and the number of records in the database; this not only speeds up retrieval but also reduces memory consumption to a few tenths or even a few thousandths of that of the prior art.
Embodiments
The present invention is described in further detail below with reference to Fig. 2 to Fig. 6.
As shown in Fig. 2, the present embodiment provides a video retrieval method based on multi-image fusion, comprising the following steps S1 to S7:
S1: decode the database video and segment it into shots, obtaining a plurality of video shots.
Specifically, "a plurality of video shots" here means that the video is divided into at least one video shot.
S2: extract key frames from each single video shot, and extract local features from the key frames.
Specifically, at least one key frame is extracted from each single video shot, and features are extracted from the key frames. Feature extraction here includes, but is not limited to, local feature extraction and global feature extraction; in this embodiment, local feature extraction from the key frames is the preferred scheme.
S3: cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features.
S4: quantization-encode all local features of the database video according to the codebook of the database-video local features.
S5: after quantization encoding, pool the set of local features of all key frames of a single video shot, obtaining the quantized local-feature pooling set of the single video shot.
It should be noted that the pooling methods in this embodiment include, but are not limited to, average pooling, max pooling, and the like.
It should be noted that the quantized local-feature pooling set here is the result of pooling the local features of all key frames of a single video shot; it is a different concept from the local features of a single key frame.
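As an illustrative sketch (not the patented implementation), pooling the quantized key-frame histograms of one shot into a single shot-level vector could look as follows; `pool_shot_features` and its arguments are hypothetical names:

```python
import numpy as np

def pool_shot_features(key_frame_histograms, mode="average"):
    # Stack the quantized histograms of all key frames in one shot:
    # resulting shape is (num_key_frames, codebook_size).
    stacked = np.stack(key_frame_histograms)
    if mode == "average":       # average pooling over key frames
        return stacked.mean(axis=0)
    if mode == "max":           # max pooling over key frames
        return stacked.max(axis=0)
    raise ValueError("unknown pooling mode: " + mode)
```

Either mode collapses any number of key-frame histograms into one vector of the codebook's dimension, which is what allows a whole shot to be stored as a single record.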
S6: build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots.
It should be noted that, during retrieval, the number of codewords corresponds to the dimension of the statistical histogram, and this number is relatively large, for example tens of thousands to a million. As a result, in a quantized local-feature pooling set most codewords are assigned a value of zero, so the set is very sparsely distributed. Exploiting this sparsity, an inverted file index can be built in the same way as the inverted index used in text retrieval.
S7: retrieve the target video online according to several query images of the target video to be retrieved and the inverted file index.
Here, "several query images" in this embodiment means at least two query images.
Specifically, as shown in Fig. 3, step S7 comprises the following steps S71 to S75:
S71: extract local features from all query images of the target video to be retrieved;
S72: quantization-encode all local features of all query images according to the codebook of the database-video local features;
S73: pool all quantization-encoded local features of all query images, obtaining the quantized local-feature pooling set of all query images;
S74: according to the inverted file index, compare the quantized local-feature pooling set of the target video to be retrieved with the quantized local-feature pooling sets of the single video shots in the database video for similarity;
S75: rank the retrieved video files according to the compared similarities, completing the online retrieval of the target video.
In this embodiment, when querying with multiple images, the local features of all query images are pooled, so that they are converted into a single quantized local-feature pooling set that accurately describes the target video. This set serves as the new feature of all query images, so that the retrieval efficiency for the target video remains essentially the same as that of an existing single-image search process.
Specifically, step S3, "cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features", comprises the following sub-steps:
from all local features extracted from the key frames of all video shots, select a subset of local features at intervals or at random;
cluster the selected subset of local features based on a preset unsupervised distance method, taking the resulting k representative features as the codebook.
It should be noted that the preset unsupervised distance methods in this embodiment include, but are not limited to, the unsupervised k-means method.
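A minimal k-means sketch of this codebook-building step, assuming Euclidean distance as the unsupervised distance method; `build_codebook` is a hypothetical name, and a real system would run a library implementation over a far larger sample of descriptors:

```python
import numpy as np

def build_codebook(features, k, iters=20, seed=0):
    # Initialize the k cluster centers from randomly selected descriptors.
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign every descriptor to its nearest center (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers  # the k representative features, i.e. the codebook
```

The k returned centers are the "representative features" of the text; each later serves as one codeword of the codebook.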
Correspondingly, step S4, "quantization-encode all local features of the database video according to the codebook of the database-video local features", specifically comprises:
according to the codebook of k features, vector-quantize all local features of the video shots, taking a single key frame as the unit, to obtain the local-feature statistical histogram of each key frame.
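Under the same assumptions, the per-key-frame vector quantization can be sketched as follows (`quantize_frame` is a hypothetical helper): each local descriptor is mapped to its nearest codeword, and the counts form the key frame's statistical histogram over the k codewords.

```python
import numpy as np

def quantize_frame(frame_features, codebook):
    # Nearest-codeword assignment for every local descriptor of one key frame.
    dists = np.linalg.norm(frame_features[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Histogram over the k codewords: entry i counts the descriptors
    # of this key frame that were quantized to codeword i.
    return np.bincount(words, minlength=len(codebook))
```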
Specifically, step S6, "build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots", comprises the following sub-steps:
taking each codeword ID in the codebook of the database-video local features in turn as a list head, create a linked list;
scan the videos in the database, and push the IDs and related information of all video shots containing the codeword into the linked list, obtaining the inverted file index.
It should be noted that the related information in this embodiment includes, but is not limited to, information such as word frequency, Hamming code and feature distance.
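In a sketch, the linked lists can be mimicked with Python lists keyed by codeword ID; only the nonzero codewords of each shot's pooled set generate postings, which is where the sparsity pays off. The names are illustrative, and a single weight stands in for the "related information" (word frequency and the like):

```python
from collections import defaultdict

def build_inverted_index(shot_pooling_sets):
    # shot_pooling_sets: {shot ID: pooled histogram over the codebook}.
    index = defaultdict(list)  # codeword ID -> posting list (the "linked list")
    for shot_id, hist in shot_pooling_sets.items():
        for word_id, weight in enumerate(hist):
            if weight > 0:  # skip the (many) zero-valued codewords
                index[word_id].append((shot_id, weight))
    return index
```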
Specifically, the detailed process of step S74, "according to the inverted file index, compare the quantized local-feature pooling set of the target video to be retrieved with the quantized local-feature pooling sets of the single video shots in the database video for similarity", is: for each codeword in the quantized local-feature pooling set of all query images, scan the linked list corresponding to that codeword in the inverted index file, obtaining the similarity, on that codeword, between the query images and the database videos containing that codeword.
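A sketch of this scan, assuming the index maps codeword IDs to posting lists of (shot ID, weight) pairs and assuming a simple dot-product-style similarity accumulated per shot (the patent leaves the exact similarity measure open):

```python
def score_query(query_hist, inverted_index):
    # Visit only the posting lists of codewords present in the query's pooled set.
    scores = {}
    for word_id, q_weight in enumerate(query_hist):
        if q_weight > 0:
            for shot_id, db_weight in inverted_index.get(word_id, []):
                scores[shot_id] = scores.get(shot_id, 0) + q_weight * db_weight
    # Rank candidate shots by accumulated similarity, best first.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because only codewords present in the query are scanned, the cost is proportional to the sparsity of the pooled sets rather than to the codebook size.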
Specifically, after step S72, "quantization-encode all local features of all query images according to the codebook of the database-video local features", the method disclosed in this embodiment further comprises the following step:
cross-compare all quantization-encoded local features of all query images, and determine the feature-matching overlap region of all query images as the target region to be searched.
Correspondingly, step S73, "pool all quantization-encoded local features of all query images, obtaining the quantized local-feature pooling set of all query images", specifically comprises:
pool the local features of all query images that fall in the target region to be searched, obtaining the quantized local-feature pooling set of the target video to be retrieved.
It should be noted that by automatically mining the common feature subset from the correlation of features between images, and determining from this subset the spatial position of the target video to be retrieved in the images, the whole process obtains the region of the target video to be retrieved without relying on any manual annotation, and the query result obtained by querying with the target region is more accurate than the query result obtained by querying with the whole pictures.
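One way to sketch the mining of the common feature subset is to intersect the sets of codewords that appear in every query image; features quantized to these shared codewords approximate the common target. `common_codewords` is a hypothetical name, and this ignores the spatial-localization step that follows in the method:

```python
import numpy as np

def common_codewords(query_histograms):
    # Codeword IDs that occur (count > 0) in every query image's histogram.
    word_sets = [set(np.nonzero(h)[0]) for h in query_histograms]
    return set.intersection(*word_sets)
```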
Specifically, a process schematic of the video retrieval method based on multi-image fusion in this embodiment is shown in Fig. 4.
As shown in Fig. 5 and Fig. 6, this embodiment discloses a video retrieval system based on multi-image fusion, comprising a video processing module 10, a distributed storage module 20 and a retrieval module 30.
The video processing module 10 comprises a processing unit 11, a first extraction unit 12, a first clustering unit 13, a first quantization-encoding unit 14 and a first pooling unit 15.
The processing unit 11 is connected to the database, decodes the videos in the database and segments them into shots, obtaining a plurality of video shots.
The first extraction unit 12 is connected to the processing unit 11 to extract key frames from each single video shot and to extract local features from the key frames.
The first clustering unit 13 is connected to the extraction unit 12 to cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features.
The first quantization-encoding unit 14 is connected to the clustering unit 13 to quantization-encode all local features of the database video according to the codebook of the database-video local features.
The first pooling unit 15 is connected to the quantization-encoding unit 14 to pool, after quantization encoding, the set of local features of all key frames of a single video shot, obtaining the quantized local-feature pooling set of the single video shot.
The distributed storage module 20 is connected to the video processing module 10 to build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots.
The retrieval module 30 is connected to the distributed storage module 20 to retrieve the target video online according to several query images of the target video to be retrieved and the inverted file index.
It should be noted that the video processing module 10 in this embodiment is specifically a video processing server group, the distributed storage module 20 is specifically a disk array, and the retrieval module 30 is specifically a retrieval server group. For the specific hardware configuration parameters, see Table 1.
Table 1
It should be noted that the distributed storage module 20 here supports dynamic insertion/deletion of video feature vectors, as well as fast random retrieval.
Specifically, the retrieval module 30 comprises a second extraction unit 31, a second quantization-encoding unit 32, a second pooling unit 33, a comparing unit 34 and a retrieval unit 35.
The second extraction unit 31 extracts local features from all query images of the target video to be retrieved.
The second quantization-encoding unit 32 is connected to the second extraction unit 31 to quantization-encode all local features of all query images according to the codebook of the database-video local features.
The second pooling unit 33 is connected to the second quantization-encoding unit 32 to pool all quantization-encoded local features of all query images, obtaining the quantized local-feature pooling set of the target video to be retrieved.
The comparing unit 34 is connected to the second pooling unit 33 and the distributed storage module 20 to compare, according to the inverted file index, the quantized local-feature pooling set of the target video to be retrieved with the quantized local-feature pooling sets of the single video shots in the database video for similarity.
The retrieval unit 35 is connected to the comparing unit 34 to rank the retrieved video files according to the compared similarities, completing the online retrieval of the target video.
Specifically, the first clustering unit 13 is specifically configured to:
from all local features extracted from the key frames of all video shots, select a subset of local features at intervals or at random;
cluster the selected subset of local features based on a preset unsupervised distance method, taking the resulting k representative features as the codebook.
Correspondingly, the first quantization-encoding unit 14 is specifically configured to:
according to the codebook of k features, vector-quantize all local features of the video shots, taking a single key frame as the unit, to obtain the local-feature statistical histogram of each key frame.
Specifically, the distributed storage module 20 comprises a linked-list creating unit 21 and an inverted-index creating unit 22.
The linked-list creating unit 21 takes each codeword ID in the codebook of the database-video local features in turn as a list head and creates a linked list.
The inverted-index creating unit 22 is connected to the linked-list creating unit 21 to scan the videos in the database and push the IDs and related information of all video shots containing the codeword into the linked list, obtaining the inverted file index, wherein the related information includes word frequency and Hamming code.
Specifically, the retrieval module 30 further comprises a matching unit 36.
The matching unit 36 is connected to the second quantization-encoding unit 32 to cross-compare all quantization-encoded local features of all query images and determine the feature-matching overlap region of all query images as the target region to be searched.
Correspondingly, the second pooling unit 33 is connected to the matching unit 36 and is specifically configured to: pool the local features of all query images that fall in the target region to be searched, obtaining the quantized local-feature pooling set of the target video to be retrieved.
It should be noted that the specific working process and key points of the video retrieval system based on multi-image fusion are the same as those of the video retrieval method based on multi-image fusion described above, and are not repeated here.
It should be noted that the video retrieval method and system based on multi-image fusion disclosed by the invention have the following technical effects:
(1) By using several query images of the target, different viewing angles can be taken into account when expressing the target object, making the description more precise, which is very helpful for improving the recall ratio of the retrieval system. Meanwhile, through feature pooling, a multi-image query still describes the target to be found with a single feature vector, just as in a single-image query, so that the retrieval efficiency remains essentially unchanged.
(2) In the offline processing of the database videos, feature pooling with the video shot rather than the key frame as the unit keeps only the pooled quantized feature vectors, which greatly reduces memory consumption and the number of records in the database, greatly improves retrieval efficiency, and reduces memory consumption to a few tenths to a few thousandths of that of the prior art, while keeping comparable or even higher retrieval precision.
(3) At the input of the several query images, the common feature subset is mined automatically from the correlation of features between all query images, and the spatial region of the target to be searched in the images is determined from this subset. Without relying on any manual annotation, the region of the target to be searched is obtained and used as the query, giving a more accurate query result than querying with the whole pictures.