Disclosure of Invention
The invention aims to provide a video retrieval method and a video retrieval system based on the fusion of multiple images, so as to improve the recall ratio of video retrieval.
To achieve this purpose, the invention adopts the following technical scheme. In a first aspect, the present invention provides a video retrieval method based on the fusion of multiple images, the method including:
decoding the database video and segmenting it into video shots to obtain a plurality of video shots;
extracting key frames of a single video shot, and extracting local features of the key frames;
clustering a subset of the local features, and taking the resulting set of cluster centers as the codebook of the local features of the database video;
quantizing and encoding all local features of the database video according to the codebook of the local features of the database video;
after quantization encoding, pooling the local feature sets of all key frames of a single video shot to obtain the quantized local feature pooling set of the single video shot;
establishing an inverted file index according to the codebook of the local features of the database video and the quantized local feature pooling set of the single video shot;
and performing online retrieval of the target video according to a plurality of query images of the target video to be retrieved and the inverted file index.
In a second aspect, the present invention provides a video retrieval system based on the fusion of multiple images, the system comprising a video processing module, a distributed storage module, and a retrieval module;
the video processing module comprises a processing unit, a first extraction unit, a first clustering unit, a first quantization coding unit and a first pooling unit;
the processing unit is connected with the database and is used for decoding the video in the database and segmenting it into video shots to obtain a plurality of video shots;
the first extraction unit is connected with the processing unit to extract key frames of a single video shot and extract local features of the key frames;
the first clustering unit is connected with the first extraction unit to cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the local features of the database video;
the first quantization coding unit is connected with the first clustering unit to quantize and encode all local features of the database video according to the codebook of the local features of the database video;
the first pooling unit is connected with the first quantization coding unit to pool, after quantization encoding, the local feature sets of all key frames of a single video shot, obtaining the quantized local feature pooling set of the single video shot;
the distributed storage module is connected with the video processing module to establish an inverted file index according to the codebook of the local features of the database video and the quantized local feature pooling set of the single video shot;
the retrieval module is connected with the distributed storage module to perform online retrieval of the target video according to a plurality of query images of the target video to be retrieved and the inverted file index.
Compared with the prior art, the invention has the following technical effects. First, the target video is retrieved using a plurality of query images of the same target video, so that different viewing angles can be taken into account; the target video is thereby described more accurately, and its recall ratio is improved. Second, the inverted file index is established offline, and the local features of all key frames of a single video shot are pooled, with the video shot of the database video as the unit, to obtain the quantized local feature pooling set of the single video shot; this greatly reduces the memory consumption and the number of records in the database and accelerates retrieval, reducing memory consumption to roughly a tenth or even a thousandth of that of the prior art.
Detailed Description
The present invention will be described in further detail with reference to fig. 2 to 6.
As shown in fig. 2, the present embodiment provides a video retrieval method based on fusion of a plurality of images, the method including the following steps S1 to S7:
S1, decoding the database video and segmenting it into video shots to obtain a plurality of video shots;
Specifically, "a plurality of video shots" here means that the video is divided into at least one video shot.
S2, extracting key frames of the single video shot, and extracting local features of the key frames;
specifically, at least one key frame is extracted from a single video shot, and feature extraction is performed on the key frame, where the feature extraction includes, but is not limited to, local feature extraction and global feature extraction, and in this embodiment, local feature extraction is performed on the key frame as a preferable scheme.
S3, clustering partial local features, and taking the obtained clustering center set as a codebook of the database video local features;
s4, carrying out quantization coding on all local characteristics of the database video according to the codebook of the local characteristics of the database video;
s5, after quantization coding, pooling local feature sets of all key frames of a single video shot to obtain a quantized local feature pooled set of the single video shot;
the pooling (pond) in this embodiment includes, but is not limited to: average pooling (average pooling), maximum pooling (max pooling), and the like.
It should be noted that the quantized local feature pooling set here is a result of pooling local features of all key frames of a single video shot, and is different from the concept of the local features of the key frames.
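As a purely illustrative sketch of this shot-level pooling (the histogram values, codebook size, and function name below are invented for the example and are not the patented implementation):

```python
def pool_shot_histograms(frame_histograms, mode="max"):
    """Pool the quantized histograms of all key frames of one shot
    into a single shot-level pooled set."""
    k = len(frame_histograms[0])          # codebook size
    pooled = [0] * k
    for hist in frame_histograms:
        for i, v in enumerate(hist):
            if mode == "max":
                pooled[i] = max(pooled[i], v)
            else:                         # average pooling
                pooled[i] += v / len(frame_histograms)
    return pooled

# three key frames of one shot, quantized against a 5-word codebook
frames = [[2, 0, 1, 0, 0],
          [1, 0, 3, 0, 0],
          [0, 0, 2, 0, 1]]
print(pool_shot_histograms(frames, "max"))    # [2, 0, 3, 0, 1]
```

Either pooling mode collapses the per-frame histograms into one vector per shot, which is what reduces the number of database records.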
S6, establishing an inverted file index according to the codebook of the local features of the database video and the quantized local feature pooling set of the single video shot;
it should be noted that, since the number of codebooks corresponds to the dimension of the statistical histogram in the search, the number of codebooks is large, for example, several tens of thousands to millions. In this way, in the quantized local feature pooling set, most of the code words are assigned with zero values, which makes the quantized local feature pooling set very sparse in distribution, and with this sparsity, the reverse file index can be established by using the inverse sorting in the text retrieval.
And S7, performing online retrieval of the target video according to a plurality of query images of the target video to be retrieved and the inverted file index.
In this embodiment, the plurality of query images refers to at least two query images.
Specifically, as shown in fig. 3, the step S7 includes the following steps S71 to S75:
S71, extracting local features of all query images of the target video to be retrieved;
S72, quantizing and encoding all local features of all query images according to the codebook of the local features of the database video;
S73, pooling all quantization-encoded local features of all query images to obtain the quantized local feature pooling set of all query images;
S74, according to the inverted file index, comparing the similarity between the quantized local feature pooling set of the target video to be retrieved and the quantized local feature pooling set of each single video shot in the database video;
And S75, ranking the retrieved video files according to the similarities obtained by the comparison, completing the online retrieval of the target video.
In this embodiment, when multiple images are used for a query, the local features of all query images are pooled into a single, precisely quantized local feature pooling set that describes the target video. This set serves as the new feature of all the query images, so the retrieval efficiency of the target video remains essentially the same as that of an existing single-image retrieval process.
Specifically, S3, "clustering a subset of the local features, and taking the resulting set of cluster centers as the codebook of the local features of the database video", comprises the following sub-steps:
sampling a subset of the local features, at fixed intervals or at random, from all local features extracted from the key frames of all video shots;
clustering the sampled local features with a preset unsupervised distance-based clustering method, and taking the resulting k representative features as the codebook;
It should be noted that the unsupervised distance-based method preset in this embodiment includes, but is not limited to, the k-means algorithm.
Accordingly, S4, "quantizing and encoding all local features of the database video according to the codebook of the local features of the database video", specifically includes:
according to the codebook of k features, vector-quantizing all local features of each video shot, one key frame at a time, to obtain the local feature statistical histogram of each key frame.
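An illustrative sketch of this vector quantization step, assuming a toy 2-D codebook of k = 2 words (all values and names below are invented for the example):

```python
def quantize_frame(features, codebook):
    """Assign each local feature of one key frame to its nearest
    codeword (squared L2), yielding the frame's statistical histogram."""
    hist = [0] * len(codebook)
    for f in features:
        j = min(range(len(codebook)),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(f, codebook[c])))
        hist[j] += 1
    return hist

codebook = [[0.0, 0.0], [10.0, 10.0]]          # k = 2 toy codewords
frame = [[0.2, 0.1], [9.8, 10.1], [0.0, 0.4]]  # 3 local features
print(quantize_frame(frame, codebook))         # [2, 1]
```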
Specifically, S6, "establishing an inverted file index according to the codebook of the local features of the database video and the quantized local feature pooling set of the single video shot", specifically includes the following sub-steps:
taking each codeword ID in the codebook of the local features of the database video, in turn, as the head of a linked list;
and scanning the videos in the database, appending the IDs of all video shots containing each codeword, together with related information, to the corresponding linked list, thereby obtaining the inverted file index.
It should be noted that the related information in this embodiment includes, but is not limited to, information such as word frequency, Hamming code, and feature distance.
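A minimal sketch of this index-building step, using a Python dict of posting lists in place of linked lists (the shot IDs, weights, and the use of word frequency as the only "related information" are assumptions made for the example):

```python
def build_inverted_index(shot_pools):
    """For each codeword ID, keep a posting list of (shot_id, weight)
    for every shot whose pooled set gives that codeword a non-zero
    value; zero entries are skipped, exploiting the sparsity of the
    pooled sets."""
    index = {}
    for shot_id, pooled in shot_pools.items():
        for word_id, weight in enumerate(pooled):
            if weight:
                index.setdefault(word_id, []).append((shot_id, weight))
    return index

# pooled sets of two shots over a 3-word codebook
pools = {"shot_1": [2, 0, 3], "shot_2": [0, 1, 1]}
index = build_inverted_index(pools)
print(index[2])   # [('shot_1', 3), ('shot_2', 1)]
```

Only non-zero codewords ever enter a posting list, which is why the sparse pooled sets keep the index compact.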
Specifically, the process of step S74, "comparing, according to the inverted file index, the similarity between the quantized local feature pooling set of the target video to be retrieved and the quantized local feature pooling set of a single video shot in the database video", is as follows: for each codeword in the quantized local feature pooling set of all query images, the linked list corresponding to that codeword in the inverted file index is scanned to obtain the similarity, on that codeword, between the query images and the database videos containing the codeword.
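A minimal sketch of this posting-list scan, assuming a dot-product similarity accumulated per codeword (the similarity measure, values, and names are illustrative assumptions, not the patented scoring):

```python
def score_shots(query_pooled, index):
    """Scan only the posting lists of codewords present in the query
    pooled set, accumulating a per-shot similarity, then rank shots."""
    scores = {}
    for word_id, q in enumerate(query_pooled):
        if not q:
            continue                       # sparse query: skip zeros
        for shot_id, w in index.get(word_id, []):
            scores[shot_id] = scores.get(shot_id, 0) + q * w
    return sorted(scores.items(), key=lambda kv: -kv[1])

# a toy inverted index over a 3-word codebook
index = {0: [("shot_1", 2)],
         1: [("shot_2", 1)],
         2: [("shot_1", 3), ("shot_2", 1)]}
print(score_shots([1, 0, 2], index))   # [('shot_1', 8), ('shot_2', 2)]
```

Because only the codewords present in the query are scanned, the cost grows with the sparsity of the query pooled set rather than with the codebook size.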
Specifically, in the method disclosed in this embodiment, after step S72, "quantizing and encoding all local features of all query images according to the codebook of the local features of the database video", the method further comprises:
cross-comparing the quantization-encoded local features of all query images, and determining the feature-matching overlap region of the query images as the target region to be searched;
Accordingly, step S73, "pooling all quantization-encoded local features of all query images to obtain the quantized local feature pooling set of all query images", specifically comprises:
pooling the local features of all query images within the target region to be searched to obtain the quantized local feature pooling set of the target video to be retrieved.
It should be noted that a common feature subset is discovered automatically from the correlation of features between images, and this subset is used to determine the spatial position of the target to be retrieved within the images. The whole process does not depend on any manual labeling, yet the region of the target to be retrieved is obtained, and querying with this target region yields a more accurate result than querying with the whole image.
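As a purely illustrative approximation of discovering the common feature subset, one could intersect the sets of codewords present in every query image (the histograms below are invented for the example; the disclosed method matches features by cross-comparison, which this set intersection only approximates):

```python
def common_codewords(image_histograms):
    """Codewords that occur in every query image; used here as a
    stand-in for the feature-matching overlap (target) region."""
    present = [{i for i, v in enumerate(h) if v} for h in image_histograms]
    return sorted(set.intersection(*present))

imgs = [[1, 0, 2, 1],    # quantized histograms of three query images
        [0, 1, 3, 2],
        [2, 0, 1, 1]]
print(common_codewords(imgs))   # [2, 3]
```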
Specifically, a process schematic diagram of a video retrieval method based on multiple image fusion in the present embodiment is shown in fig. 4.
As shown in fig. 5 and fig. 6, the present embodiment discloses a video retrieval system based on fusion of multiple images, which includes:
a video processing module 10, a distributed storage module 20 and a retrieval module 30;
the video processing module 10 comprises a processing unit 11, a first extracting unit 12, a first clustering unit 13, a first quantization coding unit 14 and a first pooling unit 15;
the processing unit 11 is connected with the database, and decodes the video in the database and segments it into video shots to obtain a plurality of video shots;
the first extraction unit 12 is connected with the processing unit 11 to extract key frames of a single video shot and extract local features of the key frames;
the first clustering unit 13 is connected with the first extraction unit 12 to cluster a subset of the local features, taking the resulting set of cluster centers as the codebook of the local features of the database video;
the first quantization coding unit 14 is connected with the first clustering unit 13 to quantize and encode all local features of the database video according to the codebook of the local features of the database video;
the first pooling unit 15 is connected with the first quantization coding unit 14 to pool, after quantization encoding, the local feature sets of all key frames of a single video shot, obtaining the quantized local feature pooling set of the single video shot;
the distributed storage module 20 is connected with the video processing module 10 to establish an inverted file index according to the codebook of the local features of the database video and the quantized local feature pooling set of the single video shot;
the retrieval module 30 is connected with the distributed storage module 20 to perform online retrieval of the target video according to a plurality of query images of the target video to be retrieved and the inverted file index.
It should be noted that the video processing module 10 in this embodiment is specifically a video processing server group, the distributed storage module 20 is specifically a disk array, and the retrieval module 30 is specifically a retrieval server group. See Table 1 for the specific hardware configuration parameters:
TABLE 1
It should be noted that the distributed storage module 20 herein supports dynamic insertion/deletion of video feature vectors, and supports fast random search.
Specifically, the retrieving module 30 specifically includes: a second extraction unit 31, a second quantization encoding unit 32, a second pooling unit 33, a comparison unit 34, and a retrieval unit 35;
the second extraction unit 31 performs local feature extraction on all query images of the target video to be retrieved;
the second quantization coding unit 32 is connected to the second extracting unit 31 to perform quantization coding on all local features of all query images according to the codebook of local features of the database video;
the second pooling unit 33 is connected with the second quantization encoding unit 32 to pool all the local features after quantization encoding of all the query images, so as to obtain a quantized local feature pooling set of the target video to be retrieved;
the comparison unit 34 is connected with the second pooling unit 33 and the distributed storage module 20 to compare, according to the inverted file index, the similarity between the quantized local feature pooling set of the target video to be retrieved and the quantized local feature pooling set of a single video shot in the database video;
the retrieval unit 35 is connected to the comparison unit 34 to sort the queried video files according to the similarity obtained by comparison, so as to complete online retrieval of the target video.
Specifically, the first clustering unit 13 is specifically configured to:
sampling a subset of the local features, at fixed intervals or at random, from all local features extracted from the key frames of all video shots;
clustering the sampled local features with a preset unsupervised distance-based clustering method, and taking the resulting k representative features as the codebook;
accordingly, the first quantization encoding unit 14 is specifically configured to:
according to the codebook of k features, vector-quantizing all local features of each video shot, one key frame at a time, to obtain the local feature statistical histogram of each key frame.
Specifically, the distributed storage module 20 specifically includes: a linked list establishing unit 21 and a reverse index establishing unit 22;
the linked list establishing unit 21 establishes a linked list by taking each codeword ID in the codebook of the local features of the database video, in turn, as the list head;
the inverted index establishing unit 22 is connected with the linked list establishing unit 21 to scan the videos in the database and append the IDs of all video shots containing each codeword, together with related information, to the corresponding linked list, thereby obtaining the inverted file index, where the related information includes the word frequency and the Hamming code.
In particular, the retrieval module 30 further comprises a matching unit 36;
the matching unit 36 is connected with the second quantization coding unit 32 to cross-compare all local features of all query images subjected to quantization coding, and determine feature matching overlapping regions of all query images as target regions to be searched;
correspondingly, the second pooling unit 33 is connected to the matching unit 36, and is specifically configured to:
and pooling local features of all query images in the target area to be searched to obtain a quantized local feature pooling set of the target video to be retrieved.
It should be noted that the specific working process and the key point of the video retrieval system based on the fusion of multiple images are the same as those of the video retrieval method based on the fusion of multiple images, and are not described herein again.
It should be noted that the video retrieval method and system based on the fusion of multiple images disclosed by the invention have the following technical effects:
(1) When a plurality of query images of the target are used, different viewing angles can be taken into account in describing the target object, making the description more accurate and greatly improving the recall ratio of the retrieval system. Meanwhile, by pooling features during the multi-image query, the target to be searched is described by a single feature vector, just as in a single-image query, so the search efficiency remains essentially unchanged.
(2) In the offline processing of the database videos, feature pooling is performed with the video shot, rather than the key frame, as the unit, and only the pooled quantized feature vector is kept. This greatly reduces the memory consumption and the number of records in the database and greatly improves retrieval efficiency, reducing memory consumption to roughly a tenth or even a thousandth of that of the prior art while maintaining comparable or even higher search precision.
(3) For the multiple input query images, a common feature subset is discovered automatically through the correlation of features among the query images, and this subset determines the spatial region of the target to be searched within the images. The target region is thus obtained without any manual labeling, and querying with this region yields a more accurate result than querying with the whole image.