CN104504162A - Video retrieval method based on robot vision platform - Google Patents

Video retrieval method based on robot vision platform

Info

Publication number
CN104504162A
Authority
CN
China
Prior art keywords
video
frame
retrieval
retrieval method
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510028271.2A
Other languages
Chinese (zh)
Other versions
CN104504162B (en)
Inventor
邵可佳 (Shao Kejia)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guodian Ruiyuan Technology Development Co., Ltd
Original Assignee
BEIJING ZFBOTS TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING ZFBOTS TECHNOLOGY Co Ltd
Priority to CN201510028271.2A
Publication of CN104504162A
Application granted
Publication of CN104504162B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a video retrieval method based on a robot vision platform. The method comprises the following steps: a binocular camera collects video segments and transmits them to a video processing system; the video processing system extracts a video fingerprint from each video frame of each segment, and specifically adjusts the resolution of each video frame to 320, detects the feature points in the frame, examines the local image around each feature point, obtains all of the strongest usable feature points in the local image, describes each of these strongest feature points with 64 grayscale dimensions and 16 color dimensions, and builds a grayscale-and-color visual dictionary; the video fingerprints are then retrieved according to the grayscale-and-color visual dictionary. Because the method uses a binocular camera to collect the video segments, it improves the accuracy and flexibility of video retrieval and lets the user acquire and retrieve video conveniently and automatically; building the visual dictionary saves retrieval time on every query and improves retrieval efficiency.

Description

A video retrieval method based on a robot vision platform
Technical field
The present invention relates to the field of video recognition, and in particular to a video retrieval method based on a robot vision platform.
Background technology
A robot vision platform, abbreviated RobotEye, is a robot vision retrieval system: it collects the visual data gathered by a robot or smart device and feeds valuable knowledge information back to that robot or smart device. Research on content-based video analysis and retrieval emerged in the 1990s; its goal is to process, analyze, and understand video content by computer and to build structure and indexes so that video information can be acquired conveniently and effectively. Content-based video analysis and retrieval has therefore received wide attention from the outset and has produced rich research results.
Content-based video analysis has roughly gone through three stages since it was first proposed. Research in the first stage concentrated on analyzing and browsing video structure; the key technologies involved are shot boundary detection, key frame extraction, and scene merging. Research in the second stage revolved around similarity retrieval; the key problems are feature extraction and the measurement of distances in feature space. The research of the first two stages solved, to some extent, the problems of video retrieval and browsing and made valuable explorations into video management and acquisition technology. Work in the third stage proposed semantics-oriented information extraction, which is also the focus of current research on content-based video analysis. The basic goal of this stage is to build a bridge from low-level features to high-level semantics and ultimately to form indexing and retrieval modes that match the way humans think.
Video shot detection
The key to shot detection is determining the transitions from one shot to the next, using the differences between the frame images on either side of a transition. Transitions between shots include cuts, gradual transitions, and digital special effects, and different cases often require different algorithms. Cut detection algorithms are the most mature, while detection algorithms for gradual transitions and digital effects often rely on certain a priori assumptions and still need further refinement. The common automatic shot detection methods are the following:
(1) Pixel-based shot detection, which detects shot boundaries from the differences between corresponding pixels of two video frames; its drawback is that it is easily affected by noise;
(2) Shot detection based on statistical features such as the pixel mean and variance;
(3) Shot detection based on video features, including comparison of edge features (the edge features of every frame are extracted and boundaries are determined by detecting sudden changes in the edges), methods using the color ratio feature (Color Ratio Feature), and methods based on optical flow features; the computational complexity of these methods is relatively high;
(4) Histogram-based shot detection, which detects boundaries by comparing the gray-level histograms or color histograms of video frames; it reaches a good balance between the precision and the speed of shot detection and is therefore the most widely used. A sketch of this histogram comparison is given after this list.
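The following is a minimal sketch of the histogram-based method in item (4), using OpenCV and NumPy. The chi-square comparison, the bin count, and the threshold value are illustrative assumptions and are not taken from the patent.

```python
import cv2
import numpy as np

def detect_shot_boundaries(video_path, threshold=0.5, bins=64):
    """Return indices of frames whose gray-level histogram changes abruptly."""
    cap = cv2.VideoCapture(video_path)
    boundaries = []
    prev_hist = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # A large jump in histogram distance is taken as a cut.
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CHISQR)
            if d > threshold:
                boundaries.append(idx)
        prev_hist = hist
        idx += 1
    cap.release()
    return boundaries
```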
Key-frame extraction
After shot segmentation, the key frames (Key Frame) of each shot are generally extracted so that the user can browse the video. The key frame of a shot is the frame, or the few frames, that reflect the main content of that shot. On the one hand, the chosen key frames must reflect the main events in the shot, so the description should be as complete as possible; on the other hand, for ease of management, the data volume should be as small as possible and the computation should not be too complex. The main key-frame extraction methods are the following:
(1) The simplest method extracts the frames at fixed positions in the shot as key frames, for example the first frame, the middle frame, or the last frame, or samples one frame every fixed number of frames; this method is easy to implement but inflexible, so it cannot reflect the shot content effectively;
(2) More sophisticated methods extract key frames based on features such as content complexity and camera motion; these methods reflect the shot content more accurately;
(3) A preferable approach is to analyze how the visual content of the video (color histograms, motion information) changes over time, determine the number of key frames required, and extract the key frames for each shot according to certain rules;
(4) Another common method uses image mosaicing, also called panorama construction: the frames of each segmented shot are stitched together, merging the background information of the frames in the shot, and the resulting panoramic mosaic represents the content of the video clip. Although a panorama can reflect the shot content fairly comprehensively, the algorithm is sensitive to foreground motion, so the application of this technique is somewhat limited. There are also other methods, for example selecting key frames by unsupervised clustering. A sketch of simple selection strategies in the spirit of items (1) and (2) is given after this list.
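Below is a minimal sketch contrasting the fixed-position strategy of item (1) with a simple content-based strategy in the spirit of item (2): a new key frame is emitted whenever the gray-level histogram drifts far enough from the last key frame. The Bhattacharyya distance and the threshold are illustrative assumptions.

```python
import cv2

def fixed_position_keyframes(shot_start, shot_end):
    """First, middle and last frame of a shot [shot_start, shot_end)."""
    return [shot_start, (shot_start + shot_end) // 2, shot_end - 1]

def keyframes_by_content(frames, threshold=0.4, bins=64):
    """frames: list of BGR images belonging to one shot; returns key-frame indices."""
    keys = [0]  # the first frame of the shot is always kept
    last_hist = None
    for i, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if last_hist is None:
            last_hist = hist
            continue
        # Emit a new key frame when the content has drifted enough.
        if cv2.compareHist(last_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            keys.append(i)
            last_hist = hist
    return keys
```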
Content-based similarity retrieval
Current video retrieval mainly relies on the similarity of video content and is therefore called similarity retrieval. Similarity retrieval requires the user to submit a video clip of interest; the retrieval system compares the sample clip with the other clips in the video library and returns the results ranked by similarity. Research on video similarity retrieval mainly has to solve two problems: how to extract features that reflect the video content, and how to measure the similarity between features.
For feature extraction, current video retrieval technology is largely inherited, directly or indirectly, from traditional image retrieval. Color, texture, and shape are the low-level features most widely used in image retrieval, and correspondingly they are also very important features in video retrieval. Video, however, is not simply a sequence of pictures: adjacent video frames show strong correlation. In recent research, more and more work uses the motion information of the video to provide temporal cues for retrieval; the motion information used in video retrieval mainly includes object motion trajectories, the principal components of block motion vectors, temporal texture, and camera motion. Similarity measurement can generally be divided into local matching and global matching. Local similarity measures exploit the temporal relationships between video clips, but they are confined to simple linear relationships and suffer from high computational complexity; global similarity measures, by contrast, need little index information and are fast to retrieve, but they lack a description of the temporal relationships in the video. More sophisticated similarity measures have therefore been proposed, such as moving-object trajectories and shortest feature lines based on the manifold assumption. A sketch contrasting global and local matching is given below.
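The following is a minimal sketch of the two measurement families just described, operating on per-frame feature vectors (for example histograms): a global match that compares clip-level mean features, and a local match that slides the query over the target clip to respect temporal order. The Euclidean distance is an illustrative assumption.

```python
import numpy as np

def global_similarity(query_feats, target_feats):
    """Compare the mean feature vectors of two clips (order-insensitive)."""
    q = np.mean(query_feats, axis=0)
    t = np.mean(target_feats, axis=0)
    return -float(np.linalg.norm(q - t))        # larger is more similar

def local_similarity(query_feats, target_feats):
    """Best sliding-window alignment of the query inside the target clip."""
    q = np.asarray(query_feats, dtype=np.float32)
    t = np.asarray(target_feats, dtype=np.float32)
    n, m = len(q), len(t)
    if m < n:
        return global_similarity(q, t)
    best = -np.inf
    for start in range(m - n + 1):
        d = np.linalg.norm(q - t[start:start + n], axis=1).mean()
        best = max(best, -float(d))
    return best
```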
Content-based video retrieval has always been a hot and difficult topic in computer science research. At present this technology still retrieves only low-level features, which differ greatly from the high-level semantic concepts of human thinking; how to cross the semantic gap still requires much more research.
Summary of the invention
The invention provides a video retrieval method based on a robot vision platform that improves the accuracy and flexibility of video retrieval, makes it convenient for the user to acquire and retrieve video automatically, and improves retrieval efficiency.
The object of the invention is achieved through the following technical solution:
A video retrieval method based on a robot vision platform, characterized in that the method comprises:
A binocular camera collects video segments and sends the video segments to a video processing system;
The video processing system extracts the video fingerprint of each video frame in the video segments; specifically, it adjusts the resolution of each video frame to 320, detects the feature points in the frame, examines the local image around each feature point, obtains all of the strongest usable feature points in the local image, describes each strongest feature point with 64 grayscale dimensions and 16 color dimensions, and builds a grayscale-and-color visual dictionary;
The video fingerprints are retrieved according to the grayscale-and-color visual dictionary.
The video fingerprint is a group of discrete key frames differentiated by gray value.
Retrieving the video fingerprint comprises: retrieving the key frames in the video, the preceding and following consecutive frames, and the fingerprint key frame pictures.
Retrieving the key frames in the video and the preceding and following consecutive frames comprises: extracting the key frames in the video and the preceding and following consecutive frames, and comparing the motion vectors of the image segments; the motion vectors that show differences correspond to objects in motion.
Compared with the prior art, the embodiments of the present invention have at least the following advantages:
In the embodiments of the present invention, the flexibility and accuracy of a binocular camera are exploited to collect video segments, which improves the accuracy and flexibility of video retrieval; the user can acquire and retrieve video automatically without complicated operations; and building the visual dictionary greatly saves the time of each retrieval and improves retrieval efficiency. Collecting and retrieving video with this method therefore gives high retrieval accuracy, great flexibility, and easy operation.
Brief description of the drawings
Fig. 1 is a flow block diagram of the video retrieval method based on a robot vision platform proposed by the embodiment of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, the video retrieval method based on a robot vision platform proposed by the embodiment of the present invention comprises:
A binocular camera collects video segments and sends them to the video processing system. The video processing system extracts the video fingerprint of each video frame in the video segments; specifically, it adjusts the resolution of each video frame to 320, detects the feature points in the frame, examines the local image around each feature point, obtains all of the strongest usable feature points in the local image, describes each strongest feature point with 64 grayscale dimensions and 16 color dimensions, and builds a grayscale-and-color visual dictionary. The video fingerprints are retrieved according to the grayscale-and-color visual dictionary. A sketch of this step is given below.
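Below is a minimal sketch of this fingerprint-and-dictionary step with OpenCV: each frame is resized to a width of 320 (the patent only states that the resolution is adjusted to 320; interpreting that as the frame width is an assumption), the strongest feature points are detected, each point is described by a 64-dimensional grayscale part plus a 16-dimensional color part, and the pooled descriptors are clustered into a visual dictionary. ORB as the detector, the 8x8 gray patch descriptor, the hue histogram, and k-means clustering are all stand-ins chosen for illustration; the patent does not name specific algorithms.

```python
import cv2
import numpy as np

def frame_descriptors(frame, n_points=100):
    """64-dim grayscale + 16-dim color descriptor for the strongest points of one frame."""
    h, w = frame.shape[:2]
    frame = cv2.resize(frame, (320, int(h * 320 / w)))     # width fixed to 320 (assumed)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_points)               # keeps the strongest responses
    keypoints = orb.detect(gray, None)
    descriptors = []
    for kp in keypoints:
        x, y = int(kp.pt[0]), int(kp.pt[1])
        patch = frame[max(0, y - 8):y + 8, max(0, x - 8):x + 8]
        # 64-dim grayscale part: an 8x8 downsampled local gray patch.
        g = cv2.resize(cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY), (8, 8))
        g = g.flatten().astype(np.float32)
        # 16-dim color part: a coarse hue histogram of the same patch.
        hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
        c = cv2.calcHist([hsv], [0], None, [16], [0, 180]).flatten()
        descriptors.append(np.concatenate([g, c]))
    return np.array(descriptors, dtype=np.float32)

def build_visual_dictionary(all_descriptors, vocab_size=256):
    """Cluster pooled descriptors into the grayscale-and-color visual dictionary."""
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, _, centers = cv2.kmeans(all_descriptors, vocab_size, None,
                               criteria, 3, cv2.KMEANS_PP_CENTERS)
    return centers
```

In this sketch the dictionary is built by stacking the descriptors of all frames (for example with np.vstack) and passing them to build_visual_dictionary; each frame can then be quantized against the cluster centers for retrieval.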
The video fingerprint is a group of discrete key frames differentiated by gray value. The video fingerprint is extracted as follows:
In the first step, the input video is resampled at a fixed frame rate (S frames per second) to cope with changes in frame rate.
In the second step, each resampled frame is converted to grayscale so that the proposed fingerprinting method can resist changes in color; the method therefore applies not only to edited color video but also to classic black-and-white films.
In the third step, each grayscale frame is resized so that its width and height are normalized to two fixed values, X and Y. This step makes the proposed fingerprinting method resistant to size changes caused by any factor.
In the fourth step, each resized frame is divided into N rows and M columns, defining an N*M matrix. Finally, the centroid gradient direction of every block in the matrix is computed, so that each frame yields a three-dimensional fingerprint vector.
The centroid gradient direction mentioned above is defined as follows: at coordinate (x, y) of the k-th frame, let the function f(x, y, k) denote the brightness value of that point. The gradient at coordinate (x, y) is the direction in which f changes fastest at (x, y), with components
G_x = f[x+1, y, k] - f[x-1, y, k]
G_y = f[x, y+1, k] - f[x, y-1, k]
The gradient vector of f can also be expressed by a magnitude function r[x, y, k] and a phase function θ[x, y, k]. A sketch of this block-wise fingerprint is given below.
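The following is a minimal sketch of steps one to four: each resized gray frame is split into an N x M grid, and every block is reduced to the gradient direction at its centroid, giving an N x M orientation matrix per frame (a three-dimensional fingerprint over time). X, Y, N and M are left symbolic in the description; the values below, and the use of arctan2 for the direction, are assumptions made for illustration.

```python
import cv2
import numpy as np

def frame_fingerprint(frame, X=64, Y=64, N=8, M=8):
    """Per-frame fingerprint: gradient direction at the centroid of each of N*M blocks."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gray = cv2.resize(gray, (X, Y))                         # width X, height Y
    bh, bw = Y // N, X // M
    fingerprint = np.zeros((N, M), dtype=np.float32)
    for i in range(N):
        for j in range(M):
            block = gray[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            cy, cx = bh // 2, bw // 2                       # block centroid
            # Central differences G_x, G_y as in the formulas above.
            gx = block[cy, min(cx + 1, bw - 1)] - block[cy, max(cx - 1, 0)]
            gy = block[min(cy + 1, bh - 1), cx] - block[max(cy - 1, 0), cx]
            fingerprint[i, j] = np.arctan2(gy, gx)          # gradient direction (phase)
    return fingerprint
```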
Further, retrieving the video fingerprint comprises retrieving the key frames in the video, the preceding and following consecutive frames, and the fingerprint key frame pictures. Retrieving the video fingerprint comprises:
(1) Fingerprint model design
Even when the proposed fingerprint sequence is regarded as a stationary ergodic process, fingerprint matching may still go wrong; we note that a similar analysis applies to watermark detection and to audio and video fingerprint detection. First, all the centroids of a fingerprint sequence, {c[n, m, k], 1 ≤ n ≤ N, 1 ≤ m ≤ M, 1 ≤ k ≤ K}, are further normalized by their mean and standard deviation, so that the normalized fingerprint sequence is a stochastic process with mean 0 and variance 1. Under the ergodicity assumption, the autocorrelation functions R and Q can be estimated from the time-averaged autocorrelation of the actual fingerprint sequence, and they are also used to obtain the probability of false alarm for a given fixed threshold.
(2) Determination of the threshold T
Fingerprint matching can be computed quickly and conveniently with the squared Euclidean distance. Let μ_D and σ_D denote the mean and standard deviation of the distance D, respectively. Under a general assumption, the distance D follows the normal distribution N(μ_D, σ_D²). A sketch of this matching step is given below.
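Below is a minimal sketch of the matching step: fingerprints are normalized to zero mean and unit variance, compared with the squared Euclidean distance, and a match is declared when the distance falls below a threshold T derived from the normal model N(μ_D, σ_D²). Expressing T as the mean minus k standard deviations, with k = 3, is an illustrative assumption; the description only states that D is assumed normal.

```python
import numpy as np

def normalize_fingerprint(fp):
    """Normalize a fingerprint to zero mean and unit variance, as in the model above."""
    fp = np.asarray(fp, dtype=np.float32)
    return (fp - fp.mean()) / (fp.std() + 1e-8)

def match_fingerprint(fp_query, fp_db, mu_d, sigma_d, k=3.0):
    """fp_db: 2-D array of normalized, flattened database fingerprints (one per row)."""
    q = normalize_fingerprint(fp_query).ravel()
    dists = ((fp_db - q) ** 2).sum(axis=1)          # squared Euclidean distance D
    T = mu_d - k * sigma_d                          # threshold from N(mu_D, sigma_D^2)
    return np.where(dists < T)[0]                   # indices of matching fingerprints
```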
Retrieving the key frames in the video and the preceding and following consecutive frames comprises: extracting the key frames in the video and the preceding and following consecutive frames, and comparing the motion vectors of the image segments; the motion vectors that show differences correspond to objects in motion. The motion capture can be implemented with reference to OpenCV, as sketched below.
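The following is a minimal sketch of that motion-vector comparison between a key frame and a neighbouring frame using OpenCV's dense optical flow, in line with the reference to OpenCV motion capture above. Treating flow magnitude above a fixed threshold as "motion vectors that show differences" is an illustrative assumption.

```python
import cv2
import numpy as np

def moving_object_mask(key_frame, neighbour_frame, mag_threshold=2.0):
    """Boolean mask of pixels whose motion vectors differ from the static background."""
    prev_gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(neighbour_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return magnitude > mag_threshold                # True where an object is in motion
```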
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the appended claims.

Claims (4)

1. A video retrieval method based on a robot vision platform, characterized in that the method comprises:
a binocular camera collects video segments and sends the video segments to a video processing system;
the video processing system extracts the video fingerprint of each video frame in the video segments; specifically, it adjusts the resolution of each video frame to 320, detects the feature points in the frame, examines the local image around each feature point, obtains all of the strongest usable feature points in the local image, describes each strongest feature point with 64 grayscale dimensions and 16 color dimensions, and builds a grayscale-and-color visual dictionary;
the video fingerprints are retrieved according to the grayscale-and-color visual dictionary.
2. The video retrieval method as claimed in claim 1, characterized in that the video fingerprint is a group of discrete key frames differentiated by gray value.
3. The video retrieval method as claimed in claim 1, characterized in that retrieving the video fingerprint comprises: retrieving the key frames in the video, the preceding and following consecutive frames, and the fingerprint key frame pictures.
4. The video retrieval method as claimed in claim 3, characterized in that retrieving the key frames in the video and the preceding and following consecutive frames comprises: extracting the key frames in the video and the preceding and following consecutive frames, and comparing the motion vectors of the image segments; the motion vectors that show differences correspond to objects in motion.
CN201510028271.2A 2015-01-21 2015-01-21 A video retrieval method based on robot vision platform Active CN104504162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510028271.2A CN104504162B (en) 2015-01-21 2015-01-21 A video retrieval method based on robot vision platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510028271.2A CN104504162B (en) 2015-01-21 2015-01-21 A video retrieval method based on robot vision platform

Publications (2)

Publication Number Publication Date
CN104504162A true CN104504162A (en) 2015-04-08
CN104504162B CN104504162B (en) 2018-12-04

Family

ID=52945559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510028271.2A Active CN104504162B (en) 2015-01-21 2015-01-21 A video retrieval method based on robot vision platform

Country Status (1)

Country Link
CN (1) CN104504162B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140037133A1 (en) * 2004-11-12 2014-02-06 The University Court Of The University Of St. Andrews System, method & computer program product for video fingerprinting
CN102176208A (en) * 2011-02-28 2011-09-07 西安电子科技大学 Robust video fingerprint method based on three-dimensional space-time characteristics
CN102663015A (en) * 2012-03-21 2012-09-12 上海大学 Video semantic labeling method based on characteristics bag models and supervised learning
CN104050247A (en) * 2014-06-04 2014-09-17 上海美琦浦悦通讯科技有限公司 Method for realizing quick retrieval of mass videos

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAN WEI ET AL.: "江苏IPTV监管系统内容监管思路及方法探讨 (Discussion on content supervision approaches and methods for the Jiangsu IPTV supervision system)", 《世界广播电视》 (World Broadcast and Television) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485196A (en) * 2015-08-28 2017-03-08 意法半导体股份有限公司 Visual search method, corresponding system, equipment and computer program
US10585937B2 (en) 2015-08-28 2020-03-10 Stmicroelectronics S.R.L. Method for visual search, corresponding system, apparatus and computer program product
CN110083740A (en) * 2019-05-07 2019-08-02 深圳市网心科技有限公司 Video finger print extracts and video retrieval method, device, terminal and storage medium
CN110083740B (en) * 2019-05-07 2021-04-06 深圳市网心科技有限公司 Video fingerprint extraction and video retrieval method, device, terminal and storage medium
CN111738171A (en) * 2020-06-24 2020-10-02 北京奇艺世纪科技有限公司 Video clip detection method and device, electronic equipment and storage medium
CN111738171B (en) * 2020-06-24 2023-12-08 北京奇艺世纪科技有限公司 Video clip detection method and device, electronic equipment and storage medium
CN114782879A (en) * 2022-06-20 2022-07-22 腾讯科技(深圳)有限公司 Video identification method and device, computer equipment and storage medium
CN114782879B (en) * 2022-06-20 2022-08-23 腾讯科技(深圳)有限公司 Video identification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN104504162B (en) 2018-12-04

Similar Documents

Publication Publication Date Title
US20230045519A1 (en) Target Detection Method and Apparatus
CN103593464B (en) Video fingerprint detecting and video sequence matching method and system based on visual features
EP2224372A2 (en) Grouping images by location
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
US20050225678A1 (en) Object retrieval
KR20190082593A (en) System and Method for Reidentificating Object in Image Processing
CN102117313A (en) Video retrieval method and system
CN107358141B (en) Data identification method and device
WO2019071976A1 (en) Panoramic image saliency detection method based on regional growth and eye movement model
Wang et al. Removing background interference for crowd counting via de-background detail convolutional network
CN104504162A (en) Video retrieval method based on robot vision platform
CN116129129B (en) Character interaction detection model and detection method
Hossain et al. Moving object detection for real time video surveillance: An edge based approach
Hirano et al. Industry and object recognition: Applications, applied research and challenges
Wang et al. Spatially prioritized and persistent text detection and decoding
Akoum et al. Hand gesture recognition approach for asl language using hand extraction algorithm
Jayanthiladevi et al. Text, images, and video analytics for fog computing
Tiwari et al. Machine learning approaches for face identification feed forward algorithms
CN113901922A (en) Hidden representation decoupling network-based occluded pedestrian re-identification method and system
CN116862832A (en) Three-dimensional live-action model-based operator positioning method
Ji et al. End to end multi-scale convolutional neural network for crowd counting
CN111931689B (en) Method for extracting video satellite data identification features on line
CN110830734B (en) Abrupt change and gradual change lens switching identification method and system
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
CN108154107B (en) Method for determining scene category to which remote sensing image belongs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200902

Address after: 2303, 23rd floor, building 1, yard 6, Dacheng Road, Fengtai District, Beijing 100071

Patentee after: Beijing Guodian Ruiyuan Technology Development Co., Ltd

Address before: 101101 Beijing city Tongzhou District Jingdong No. 363 -7 cloud

Patentee before: BEIJING ZFBOTS TECHNOLOGY Co.,Ltd.