Summary of the Invention
It is an object of the invention to provide a video retrieval method and system based on multi-image fusion, so as to improve the recall ratio of video retrieval.
To achieve the above object, the present invention adopts the following technical solution. In a first aspect, the present invention provides a video retrieval method based on multi-image fusion, the method comprising:
decoding a database video and segmenting it into shots, to obtain a plurality of video shots;
extracting key frames from each single video shot, and extracting local features from the key frames;
clustering a subset of the local features, and taking the resulting set of cluster centers as the codebook of the database-video local features;
quantization-encoding all local features of the database video according to the codebook of the database-video local features;
after quantization encoding, pooling the set of local features of all key frames of a single video shot, to obtain the quantized local-feature pooling set of the single video shot;
building an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots;
retrieving the target video online according to several query images of the target video to be retrieved and the inverted file index.
In a second aspect, the invention provides a video retrieval system based on multi-image fusion, the system comprising a video processing module, a distributed storage module and a retrieval module.
The video processing module comprises a processing unit, a first extraction unit, a first clustering unit, a first quantization-encoding unit and a first pooling unit.
The processing unit is connected to the database, decodes the videos in the database and segments them into shots, obtaining a plurality of video shots.
The first extraction unit is connected to the processing unit to extract key frames from each single video shot and to extract local features from the key frames.
The first clustering unit is connected to the extraction unit to cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features.
The first quantization-encoding unit is connected to the clustering unit to quantization-encode all local features of the database video according to the codebook of the database-video local features.
The first pooling unit is connected to the quantization-encoding unit to pool, after quantization encoding, the set of local features of all key frames of a single video shot, obtaining the quantized local-feature pooling set of the single video shot.
The distributed storage module is connected to the video processing module to build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots.
The retrieval module is connected to the distributed storage module to retrieve the target video online according to several query images of the target video to be retrieved and the inverted file index.
Compared with the prior art, the present invention has the following technical effects. First, because the invention uses several query images of the same target video to retrieve the target video, different viewing angles can be taken into account, so that the description of the target video is more accurate and the recall ratio for the target video is improved. Second, by building the inverted file index offline, with the video shot of the database video as the unit, the local features of all key frames of a single video shot are pooled into the quantized local-feature pooling set of that shot, which greatly reduces memory consumption and the number of records in the database; this not only speeds up retrieval but also reduces memory consumption to a few tenths or even a few thousandths of that of the prior art.
Embodiments
The present invention is described in further detail below with reference to Fig. 2 to Fig. 6.
As shown in Fig. 2, the present embodiment provides a video retrieval method based on multi-image fusion, comprising the following steps S1 to S7:
S1: decode the database video and segment it into shots, obtaining a plurality of video shots.
Specifically, "a plurality of video shots" here means that the video is divided into at least one video shot.
S2: extract key frames from each single video shot, and extract local features from the key frames.
Specifically, at least one key frame is extracted from each single video shot, and features are extracted from the key frames. Feature extraction here includes, but is not limited to, local feature extraction and global feature extraction; in this embodiment, local feature extraction from the key frames is the preferred scheme.
S3: cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features.
S4: quantization-encode all local features of the database video according to the codebook of the database-video local features.
S5: after quantization encoding, pool the set of local features of all key frames of a single video shot, obtaining the quantized local-feature pooling set of the single video shot.
It should be noted that the pooling methods in this embodiment include, but are not limited to, average pooling, max pooling, and the like.
It should be noted that the quantized local-feature pooling set here is the result of pooling the local features of all key frames of a single video shot; it is a different concept from the local features of a single key frame.
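As an illustrative sketch (not the patented implementation), pooling the quantized key-frame histograms of one shot into a single shot-level vector could look as follows; `pool_shot_features` and its arguments are hypothetical names:

```python
import numpy as np

def pool_shot_features(key_frame_histograms, mode="average"):
    # Stack the quantized histograms of all key frames in one shot:
    # resulting shape is (num_key_frames, codebook_size).
    stacked = np.stack(key_frame_histograms)
    if mode == "average":       # average pooling over key frames
        return stacked.mean(axis=0)
    if mode == "max":           # max pooling over key frames
        return stacked.max(axis=0)
    raise ValueError("unknown pooling mode: " + mode)
```

Either mode collapses any number of key-frame histograms into one vector of the codebook's dimension, which is what allows a whole shot to be stored as a single record.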
S6: build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots.
It should be noted that, during retrieval, the number of codewords corresponds to the dimension of the statistical histogram, and this number is relatively large, for example tens of thousands to a million. As a result, in a quantized local-feature pooling set most codewords are assigned a value of zero, so the set is very sparsely distributed. Exploiting this sparsity, an inverted file index can be built in the same way as the inverted index used in text retrieval.
S7: retrieve the target video online according to several query images of the target video to be retrieved and the inverted file index.
Here, "several query images" in this embodiment means at least two query images.
Specifically, as shown in Fig. 3, step S7 comprises the following steps S71 to S75:
S71: extract local features from all query images of the target video to be retrieved;
S72: quantization-encode all local features of all query images according to the codebook of the database-video local features;
S73: pool all quantization-encoded local features of all query images, obtaining the quantized local-feature pooling set of all query images;
S74: according to the inverted file index, compare the quantized local-feature pooling set of the target video to be retrieved with the quantized local-feature pooling sets of the single video shots in the database video for similarity;
S75: rank the retrieved video files according to the compared similarities, completing the online retrieval of the target video.
In this embodiment, when querying with multiple images, the local features of all query images are pooled, so that they are converted into a single quantized local-feature pooling set that accurately describes the target video. This set serves as the new feature of all query images, so that the retrieval efficiency for the target video remains essentially the same as that of an existing single-image search process.
Specifically, step S3, "cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features", comprises the following sub-steps:
from all local features extracted from the key frames of all video shots, select a subset of local features at intervals or at random;
cluster the selected subset of local features based on a preset unsupervised distance method, taking the resulting k representative features as the codebook.
It should be noted that the preset unsupervised distance methods in this embodiment include, but are not limited to, the unsupervised k-means method.
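A minimal k-means sketch of this codebook-building step, assuming Euclidean distance as the unsupervised distance method; `build_codebook` is a hypothetical name, and a real system would run a library implementation over a far larger sample of descriptors:

```python
import numpy as np

def build_codebook(features, k, iters=20, seed=0):
    # Initialize the k cluster centers from randomly selected descriptors.
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign every descriptor to its nearest center (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers  # the k representative features, i.e. the codebook
```

The k returned centers are the "representative features" of the text; each later serves as one codeword of the codebook.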
Correspondingly, step S4, "quantization-encode all local features of the database video according to the codebook of the database-video local features", specifically comprises:
according to the codebook of k features, vector-quantize all local features of the video shots, taking a single key frame as the unit, to obtain the local-feature statistical histogram of each key frame.
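Under the same assumptions, the per-key-frame vector quantization can be sketched as follows (`quantize_frame` is a hypothetical helper): each local descriptor is mapped to its nearest codeword, and the counts form the key frame's statistical histogram over the k codewords.

```python
import numpy as np

def quantize_frame(frame_features, codebook):
    # Nearest-codeword assignment for every local descriptor of one key frame.
    dists = np.linalg.norm(frame_features[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Histogram over the k codewords: entry i counts the descriptors
    # of this key frame that were quantized to codeword i.
    return np.bincount(words, minlength=len(codebook))
```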
Specifically, step S6, "build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots", comprises the following sub-steps:
taking each codeword ID in the codebook of the database-video local features in turn as a list head, create a linked list;
scan the videos in the database, and push the IDs and related information of all video shots containing the codeword into the linked list, obtaining the inverted file index.
It should be noted that the related information in this embodiment includes, but is not limited to, information such as word frequency, Hamming code and feature distance.
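In a sketch, the linked lists can be mimicked with Python lists keyed by codeword ID; only the nonzero codewords of each shot's pooled set generate postings, which is where the sparsity pays off. The names are illustrative, and a single weight stands in for the "related information" (word frequency and the like):

```python
from collections import defaultdict

def build_inverted_index(shot_pooling_sets):
    # shot_pooling_sets: {shot ID: pooled histogram over the codebook}.
    index = defaultdict(list)  # codeword ID -> posting list (the "linked list")
    for shot_id, hist in shot_pooling_sets.items():
        for word_id, weight in enumerate(hist):
            if weight > 0:  # skip the (many) zero-valued codewords
                index[word_id].append((shot_id, weight))
    return index
```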
Specifically, the detailed process of step S74, "according to the inverted file index, compare the quantized local-feature pooling set of the target video to be retrieved with the quantized local-feature pooling sets of the single video shots in the database video for similarity", is: for each codeword in the quantized local-feature pooling set of all query images, scan the linked list corresponding to that codeword in the inverted index file, obtaining the similarity, on that codeword, between the query images and the database videos containing that codeword.
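A sketch of this scan, assuming the index maps codeword IDs to posting lists of (shot ID, weight) pairs and assuming a simple dot-product-style similarity accumulated per shot (the patent leaves the exact similarity measure open):

```python
def score_query(query_hist, inverted_index):
    # Visit only the posting lists of codewords present in the query's pooled set.
    scores = {}
    for word_id, q_weight in enumerate(query_hist):
        if q_weight > 0:
            for shot_id, db_weight in inverted_index.get(word_id, []):
                scores[shot_id] = scores.get(shot_id, 0) + q_weight * db_weight
    # Rank candidate shots by accumulated similarity, best first.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because only codewords present in the query are scanned, the cost is proportional to the sparsity of the pooled sets rather than to the codebook size.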
Specifically, after step S72, "quantization-encode all local features of all query images according to the codebook of the database-video local features", the method disclosed in this embodiment further comprises the following step:
cross-compare all quantization-encoded local features of all query images, and determine the feature-matching overlap region of all query images as the target region to be searched.
Correspondingly, step S73, "pool all quantization-encoded local features of all query images, obtaining the quantized local-feature pooling set of all query images", specifically comprises:
pool the local features of all query images that fall in the target region to be searched, obtaining the quantized local-feature pooling set of the target video to be retrieved.
It should be noted that by automatically mining the common feature subset from the correlation of features between images, and determining from this subset the spatial position of the target video to be retrieved in the images, the whole process obtains the region of the target video to be retrieved without relying on any manual annotation, and the query result obtained by querying with the target region is more accurate than the query result obtained by querying with the whole pictures.
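One way to sketch the mining of the common feature subset is to intersect the sets of codewords that appear in every query image; features quantized to these shared codewords approximate the common target. `common_codewords` is a hypothetical name, and this ignores the spatial-localization step that follows in the method:

```python
import numpy as np

def common_codewords(query_histograms):
    # Codeword IDs that occur (count > 0) in every query image's histogram.
    word_sets = [set(np.nonzero(h)[0]) for h in query_histograms]
    return set.intersection(*word_sets)
```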
Specifically, a process schematic of the video retrieval method based on multi-image fusion in this embodiment is shown in Fig. 4.
As shown in Fig. 5 and Fig. 6, this embodiment discloses a video retrieval system based on multi-image fusion, comprising a video processing module 10, a distributed storage module 20 and a retrieval module 30.
The video processing module 10 comprises a processing unit 11, a first extraction unit 12, a first clustering unit 13, a first quantization-encoding unit 14 and a first pooling unit 15.
The processing unit 11 is connected to the database, decodes the videos in the database and segments them into shots, obtaining a plurality of video shots.
The first extraction unit 12 is connected to the processing unit 11 to extract key frames from each single video shot and to extract local features from the key frames.
The first clustering unit 13 is connected to the extraction unit 12 to cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the database-video local features.
The first quantization-encoding unit 14 is connected to the clustering unit 13 to quantization-encode all local features of the database video according to the codebook of the database-video local features.
The first pooling unit 15 is connected to the quantization-encoding unit 14 to pool, after quantization encoding, the set of local features of all key frames of a single video shot, obtaining the quantized local-feature pooling set of the single video shot.
The distributed storage module 20 is connected to the video processing module 10 to build an inverted file index from the codebook of the database-video local features and the quantized local-feature pooling sets of the single video shots.
The retrieval module 30 is connected to the distributed storage module 20 to retrieve the target video online according to several query images of the target video to be retrieved and the inverted file index.
It should be noted that the video processing module 10 in this embodiment is specifically a video processing server group, the distributed storage module 20 is specifically a disk array, and the retrieval module 30 is specifically a retrieval server group. For the specific hardware configuration parameters, see Table 1.
Table 1
It should be noted that the distributed storage module 20 here supports dynamic insertion/deletion of video feature vectors, as well as fast random retrieval.
Specifically, the retrieval module 30 comprises a second extraction unit 31, a second quantization-encoding unit 32, a second pooling unit 33, a comparing unit 34 and a retrieval unit 35.
The second extraction unit 31 extracts local features from all query images of the target video to be retrieved.
The second quantization-encoding unit 32 is connected to the second extraction unit 31 to quantization-encode all local features of all query images according to the codebook of the database-video local features.
The second pooling unit 33 is connected to the second quantization-encoding unit 32 to pool all quantization-encoded local features of all query images, obtaining the quantized local-feature pooling set of the target video to be retrieved.
The comparing unit 34 is connected to the second pooling unit 33 and the distributed storage module 20 to compare, according to the inverted file index, the quantized local-feature pooling set of the target video to be retrieved with the quantized local-feature pooling sets of the single video shots in the database video for similarity.
The retrieval unit 35 is connected to the comparing unit 34 to rank the retrieved video files according to the compared similarities, completing the online retrieval of the target video.
Specifically, the first clustering unit 13 is specifically configured to:
from all local features extracted from the key frames of all video shots, select a subset of local features at intervals or at random;
cluster the selected subset of local features based on a preset unsupervised distance method, taking the resulting k representative features as the codebook.
Correspondingly, the first quantization-encoding unit 14 is specifically configured to:
according to the codebook of k features, vector-quantize all local features of the video shots, taking a single key frame as the unit, to obtain the local-feature statistical histogram of each key frame.
Specifically, the distributed storage module 20 comprises a linked-list creating unit 21 and an inverted-index creating unit 22.
The linked-list creating unit 21 takes each codeword ID in the codebook of the database-video local features in turn as a list head and creates a linked list.
The inverted-index creating unit 22 is connected to the linked-list creating unit 21 to scan the videos in the database and push the IDs and related information of all video shots containing the codeword into the linked list, obtaining the inverted file index, wherein the related information includes word frequency and Hamming code.
Specifically, the retrieval module 30 further comprises a matching unit 36.
The matching unit 36 is connected to the second quantization-encoding unit 32 to cross-compare all quantization-encoded local features of all query images and determine the feature-matching overlap region of all query images as the target region to be searched.
Correspondingly, the second pooling unit 33 is connected to the matching unit 36 and is specifically configured to: pool the local features of all query images that fall in the target region to be searched, obtaining the quantized local-feature pooling set of the target video to be retrieved.
It should be noted that the specific working process and key points of the video retrieval system based on multi-image fusion are the same as those of the video retrieval method based on multi-image fusion described above, and are not repeated here.
It should be noted that the video retrieval method and system based on multi-image fusion disclosed by the invention have the following technical effects:
(1) By using several query images of the target, different viewing angles can be taken into account when expressing the target object, making the description more precise, which is very helpful for improving the recall ratio of the retrieval system. Meanwhile, through feature pooling, a multi-image query still describes the target to be found with a single feature vector, just as in a single-image query, so that the retrieval efficiency remains essentially unchanged.
(2) In the offline processing of the database videos, feature pooling with the video shot rather than the key frame as the unit keeps only the pooled quantized feature vectors, which greatly reduces memory consumption and the number of records in the database, greatly improves retrieval efficiency, and reduces memory consumption to a few tenths to a few thousandths of that of the prior art, while keeping comparable or even higher retrieval precision.
(3) At the input of the several query images, the common feature subset is mined automatically from the correlation of features between all query images, and the spatial region of the target to be searched in the images is determined from this subset. Without relying on any manual annotation, the region of the target to be searched is obtained and used as the query, giving a more accurate query result than querying with the whole pictures.