CN103559196A - Video retrieval method based on multi-core canonical correlation analysis - Google Patents


Info

Publication number
CN103559196A
CN103559196A (application CN201310438216.1A)
Authority
CN
China
Prior art keywords
video
matrix
frame
feature
correlation analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310438216.1A
Other languages
Chinese (zh)
Other versions
CN103559196B (en)
Inventor
卜佳俊 (Jiajun Bu)
高珊 (Shan Gao)
李平 (Ping Li)
陈纯 (Chun Chen)
何占盈 (Zhanying He)
宋明黎 (Mingli Song)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201310438216.1A priority Critical patent/CN103559196B/en
Publication of CN103559196A publication Critical patent/CN103559196A/en
Application granted granted Critical
Publication of CN103559196B publication Critical patent/CN103559196B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/786Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using motion, e.g. object motion or camera motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/732Query formulation
    • G06F16/7328Query by example, e.g. a complete video frame or video sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content

Abstract

Disclosed is a video retrieval method based on multi-kernel canonical correlation analysis. The method crawls videos and their accompanying text descriptions from the internet and then processes each video as follows. The video is first segmented according to whether the shot changes abruptly, and key frames are extracted; the visual features of the key frames and the motion features of the shots are combined into a video feature vector, and a word-frequency feature is extracted from each video's text description. Multi-kernel canonical correlation analysis is then used to obtain mapping matrices and low-dimensional representations of the video features and the word-frequency features, such that the two representations are maximally correlated in the low-dimensional space. Finally, when a user enters keywords to retrieve videos, the low-dimensional representation of the keywords' word-frequency feature is obtained from the word-frequency mapping matrix, and video results are returned in descending order of cosine similarity between this representation and the low-dimensional video-feature representations. The method strengthens the link between video content and retrieval keywords and improves the accuracy of user retrieval.

Description

A video retrieval method based on multi-kernel canonical correlation analysis
Technical field
The present invention relates to the technical field of video retrieval, and in particular to a video retrieval method based on multi-kernel canonical correlation analysis.
Background technology
With the rapid development of computer networks and multimedia and communication technology in recent years, people can upload, watch, and download all kinds of video through the internet. The internet has gradually become a huge video warehouse, and how to retrieve the videos a user needs more quickly and effectively has become a pressing problem in information retrieval.
Traditional video retrieval is text-based: video tag information serves as keywords matched one-to-one with videos, and the keywords undergo feature extraction and preprocessing before being clustered and classified. This approach depends entirely on manual annotation of video information, which is inefficient, requires a certain amount of experience, and describes video content inadequately, so text-based retrieval cannot meet users' growing demands. Since the 1980s, content-based video retrieval has gradually attracted attention and has become a research hotspot in recent years. It extracts video features automatically, without manual intervention, rather than relying solely on tag information. Specifically, after shot segmentation and key frame extraction, the visual features of the key frames and the motion features of the shots are extracted and entered into a video retrieval database. At retrieval time, videos are matched against the word-frequency features of the user's search keywords, and results are returned in descending order of similarity. Content-based video retrieval describes video information more objectively, concretely, and comprehensively, reduces the subjectivity and limitations of textual descriptions, and greatly improves retrieval accuracy.
In the machine learning field, content-based video retrieval methods are still few, but some progress has been made, such as a joint-classification feature optimization algorithm that performs well on high-dimensional data and can select the joint features most relevant to classification, algorithms that reduce the input dimensionality, and a rough-set-based descriptor indexing method that reduces the dimensionality of descriptor data while preserving semantics.
Summary of the invention
To help users quickly find the videos they are searching for and improve the experience of watching video, the present invention proposes a video retrieval method based on multi-kernel canonical correlation analysis, comprising the following steps:
1. Crawl videos from the internet and, for each video and its text description, perform the following operations:
1) Segment the video according to whether the shot changes abruptly, extract its key frames, and combine the visual features of the key frames with the motion features of the shots into a video feature vector; extract a word-frequency feature from the text description of each video;
2) Use multi-kernel canonical correlation analysis to obtain the mapping matrices of the video features and the word-frequency features, and from them the low-dimensional representations of both, so that their correlation in the low-dimensional space is maximal;
3) When a user enters keywords to retrieve videos, obtain the low-dimensional representation of the keywords' word-frequency feature from the word-frequency mapping matrix, and return the video results in descending order of cosine similarity between this representation and the low-dimensional video-feature representations.
Further, the processing of each video and its corresponding text description in step 1) is as follows:
1) Shots are cut with the twin-comparison method. Let Tb be the threshold for detecting abrupt shot changes and Ts the threshold for detecting gradual ones. The difference between consecutive frames is measured with a difference metric. If the difference exceeds Tb, the change is an abrupt cut and the shot is segmented there. If the difference is below Tb but above Ts, the change may be a gradual transition; the frame is then compared with the following frames, and if the frame-to-frame difference eventually falls below Ts while the accumulated difference between consecutive frames exceeds Tb, the transition is confirmed as gradual;
2) When extracting key frames, the first and last frames of a shot are designated key frames first, because the first frame usually presents the theme of the shot to attract the audience, and the last frame often carries a feature meant to linger with the audience. After the first and last frames are fixed, frames that change significantly are also chosen as key frames: each non-key frame in the shot is compared with the key frames in turn, and if it differs greatly it becomes a key frame itself; the comparison continues until all non-key frames in the shot have been examined. Key frames are then also chosen by the frame-average method: the mean of all frames' pixel values at given positions is computed, and the key frame is the frame whose pixel values are closest to that mean;
3) During key frame feature extraction, visual feature extraction captures the color, shape, and texture of the still image, and camera motion feature extraction captures changes in camera motion, motion trajectories, and moving-object size; together these form the video feature representation;
4) The video's text description is segmented into words and word frequencies are counted, forming its word-frequency feature representation.
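The twin-comparison shot-boundary test in step 1) above can be sketched as follows. This is a minimal illustration, assuming frame comparisons have already been reduced to a list of scalar differences between consecutive frames; the function name and the exact accumulation rule (accumulate while successive differences stay between Ts and Tb, declare a gradual transition when the accumulated difference exceeds Tb) are this sketch's interpretation, not the patent's wording.

```python
def twin_comparison(diffs, Tb, Ts):
    """Classify shot boundaries from consecutive-frame differences.

    diffs[i] is the difference between frame i and frame i+1; Tb is the
    abrupt-cut threshold and Ts the gradual-transition threshold.
    Returns a list of (start_index, kind) with kind 'cut' or 'gradual'.
    """
    boundaries = []
    i, n = 0, len(diffs)
    while i < n:
        d = diffs[i]
        if d > Tb:
            # difference exceeds Tb: abrupt shot change, segment here
            boundaries.append((i, "cut"))
            i += 1
        elif d > Ts:
            # between Ts and Tb: candidate gradual transition; accumulate
            # the following differences until they drop below Ts again
            acc, j = d, i + 1
            while j < n and Ts <= diffs[j] <= Tb:
                acc += diffs[j]
                j += 1
            if acc > Tb:
                # accumulated difference exceeds Tb: confirmed gradual
                boundaries.append((i, "gradual"))
            i = max(j, i + 1)
        else:
            i += 1
    return boundaries
```

A run over the differences `[1, 10, 0.5, 3, 3, 3, 0.5, 1]` with Tb=8 and Ts=2 reports an abrupt cut at index 1 and a gradual transition starting at index 3.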
Further, the multi-kernel canonical correlation analysis of step 2) is as follows:
1) Let the total number of training videos be n. The video feature of the i-th video is represented by an m-dimensional vector x_i; stacking the n m-dimensional vectors gives the video feature matrix X. Likewise, the word-frequency feature of the text description of the j-th video is represented by a k-dimensional vector y_j; stacking the n k-dimensional vectors gives the word-frequency feature matrix Y;
2) The Gram matrix K_x of X is computed with the Gaussian kernel
K_x(i, j) = exp(-‖x_i - x_j‖² / (2σ²)),
where K_x is an n-by-n matrix, ‖·‖ denotes the vector l2-norm, and the real parameter σ is the bandwidth of the Gaussian kernel; setting different values of σ yields a family of Gaussian kernels with different nonlinear structures. Likewise, the Gram matrix K_y of Y is computed with the second-degree polynomial kernel
K_y(i, j) = (y_i^T y_j + C)²,
where (·)^T denotes the transpose of a vector or matrix and C ≥ 0 is a kernel parameter; setting different values of C yields a family of polynomial kernels with different nonlinear structures;
3) The kernel-transformed K_x and K_y are normalized;
4) The objective function of the video retrieval method based on multi-kernel canonical correlation analysis is
max_{W_x, W_y} (W_x^T K_x K_y W_y) / sqrt((W_x^T K_x² W_x + κ W_x^T K_x W_x) · (W_y^T K_y² W_y + κ W_y^T K_y W_y)),
subject to W_x^T K_x² W_x + κ W_x^T K_x W_x = 1 and W_y^T K_y² W_y + κ W_y^T K_y W_y = 1.
Solving this objective is equivalent to solving the generalized eigenvalue problem
(K_x + κI)^(-1) K_y (K_y + κI)^(-1) K_x W_x = λ² W_x.
The eigenvectors corresponding to the p largest eigenvalues of (K_x + κI)^(-1) K_y (K_y + κI)^(-1) K_x form an n-by-p matrix W_x^p, the mapping matrix of K_x. Here p is the dimensionality after reduction by the method, κ is a positive real parameter, (·)^(-1) denotes the matrix inverse, and I is the identity matrix. The mapping matrix W_y^p of K_y is obtained from the relation between W_x^p and W_y^p:
W_y^p = (K_y + κI)^(-1) K_x W_x^p / λ,
where W_y^p is also an n-by-p matrix;
5) The new mapping matrices W_x^p and W_y^p give the representations of K_x and K_y in the new space: K_x^new = K_x W_x^p and K_y^new = K_y W_y^p, both n-by-p matrices; the correlation between the two is now maximal.
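Steps 1)-5) above can be sketched numerically as follows, under stated assumptions: "normalization" in step 3) is interpreted here as centering the Gram matrices, the Gaussian kernel uses the 2σ² convention, and the eigenvectors are taken directly from the non-symmetrized eigenproblem. The function name `mkcca` and the default parameter values are illustrative, not from the patent.

```python
import numpy as np

def mkcca(X, Y, sigma=1.0, C=1.0, kappa=0.1, p=2):
    """Kernel CCA on a video-feature matrix X (n x m) and a word-frequency
    matrix Y (n x k): returns the low-dimensional representations
    K_x W_x^p and K_y W_y^p plus the mapping matrices W_x^p, W_y^p."""
    n = X.shape[0]
    # Gaussian Gram matrix: K_x[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Kx = np.exp(-sq / (2.0 * sigma ** 2))
    # second-degree polynomial Gram matrix: K_y[i, j] = (y_i . y_j + C)^2
    Ky = (Y @ Y.T + C) ** 2
    # "normalize" interpreted here as centering the Gram matrices
    H = np.eye(n) - np.ones((n, n)) / n
    Kx, Ky = H @ Kx @ H, H @ Ky @ H
    # generalized eigenproblem (Kx+kI)^-1 Ky (Ky+kI)^-1 Kx w = lambda^2 w
    I = np.eye(n)
    M = np.linalg.solve(Kx + kappa * I, Ky) @ np.linalg.solve(Ky + kappa * I, Kx)
    evals, evecs = np.linalg.eig(M)
    top = np.argsort(-evals.real)[:p]        # p largest eigenvalues
    lam = np.sqrt(np.abs(evals.real[top]))   # lambda from lambda^2
    Wx = evecs[:, top].real                  # n x p mapping matrix for K_x
    # W_y^p = (Ky + kappa I)^-1 Kx W_x^p / lambda
    Wy = np.linalg.solve(Ky + kappa * I, Kx @ Wx) / lam
    return Kx @ Wx, Ky @ Wy, Wx, Wy
```

On n training pairs the returned representations are n-by-p, matching K_x^new and K_y^new in step 5).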
Further, the retrieval matching method of step 3) is as follows:
1) The search keywords submitted by the user are converted into a keyword word-frequency vector y_q. Applying the second-degree polynomial kernel used for Y,
k(y_q, y_j) = (y_q^T y_j + C)²,
gives the n-dimensional Gram vector of y_q, where y_j is the j-th sample of the training data Y and C ≥ 0 is the second-degree polynomial kernel parameter;
2) The mapping matrix W_y^p transforms this Gram vector into the low-dimensional representation of the keyword word-frequency feature, a p-dimensional vector;
3) The cosine similarity between this vector and each row of the matrix K_x^new obtained earlier is computed in turn; each row (sample) is a p-dimensional vector, and a larger cosine similarity indicates higher similarity;
4) The video results are returned in descending order of cosine similarity.
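The matching steps above can be sketched as follows, assuming the query's Gram vector is built with the second-degree polynomial kernel used for Y during training; the function name `retrieve`, its argument names, and the small epsilon guarding the norms are illustrative.

```python
import numpy as np

def retrieve(y_q, Y_train, Wy, X_new, C=1.0):
    """Rank videos for a query word-frequency vector y_q.

    Y_train: n x k training word-frequency matrix; Wy: n x p mapping
    matrix W_y^p; X_new: n x p low-dimensional video representations
    (K_x W_x^p). Returns video indices by decreasing cosine similarity.
    """
    # n-dimensional Gram vector of the query against the training samples:
    # k_q[j] = (y_q . y_j + C)^2
    k_q = (Y_train @ y_q + C) ** 2
    q = k_q @ Wy                              # p-dimensional representation
    # cosine similarity with each video's p-dimensional representation
    sims = (X_new @ q) / (np.linalg.norm(X_new, axis=1) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims))            # most similar first
```

With two orthogonal video representations, a query aligned with the first one is ranked first.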
The present invention proposes a video retrieval method based on multi-kernel canonical correlation analysis. Its advantages are that it retrieves videos effectively and improves the precision of video retrieval; it applies to all types of video, requires no manual back-end operation, and can also help ordinary users improve the quality of their web browsing.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Detailed description
The present invention is further illustrated with reference to the accompanying drawing:
A video retrieval method based on multi-kernel canonical correlation analysis, comprising the following steps:
1. Crawl videos from the internet and, for each video and its text description, perform the following operations:
1) Segment the video according to whether the shot changes abruptly, extract its key frames, and combine the visual features of the key frames with the motion features of the shots into a video feature vector; extract a word-frequency feature from the text description of each video;
2) Use multi-kernel canonical correlation analysis to obtain the mapping matrices of the video features and the word-frequency features, and from them the low-dimensional representations of both, so that their correlation in the low-dimensional space is maximal;
3) When a user enters keywords to retrieve videos, obtain the low-dimensional representation of the keywords' word-frequency feature from the word-frequency mapping matrix, and return the video results in descending order of cosine similarity between this representation and the low-dimensional video-feature representations.
The processing of each video and its corresponding text description in step 1) is as follows:
1) Shots are cut with the twin-comparison method. Let Tb be the threshold for detecting abrupt shot changes and Ts the threshold for detecting gradual ones. The difference between consecutive frames is measured with a difference metric. If the difference exceeds Tb, the change is an abrupt cut and the shot is segmented there. If the difference is below Tb but above Ts, the change may be a gradual transition; the frame is then compared with the following frames, and if the frame-to-frame difference eventually falls below Ts while the accumulated difference between consecutive frames exceeds Tb, the transition is confirmed as gradual;
2) When extracting key frames, the first and last frames of a shot are designated key frames first, because the first frame usually presents the theme of the shot to attract the audience, and the last frame often carries a feature meant to linger with the audience. After the first and last frames are fixed, frames that change significantly are also chosen as key frames: each non-key frame in the shot is compared with the key frames in turn, and if it differs greatly it becomes a key frame itself; the comparison continues until all non-key frames in the shot have been examined. Key frames are then also chosen by the frame-average method: the mean of all frames' pixel values at given positions is computed, and the key frame is the frame whose pixel values are closest to that mean;
3) During key frame feature extraction, visual feature extraction captures the color, shape, and texture of the still image, and camera motion feature extraction captures changes in camera motion, motion trajectories, and moving-object size; together these form the video feature representation;
4) The video's text description is segmented into words and word frequencies are counted, forming its word-frequency feature representation.
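The frame-average rule in step 2) above can be sketched as follows; this is a minimal illustration assuming grayscale frames of equal size, with the function name chosen here for clarity.

```python
import numpy as np

def average_frame_keyframe(frames):
    """Return the index of the frame closest to the per-pixel mean frame.

    frames: sequence of equally sized grayscale frames (num_frames x H x W).
    Implements the frame-average rule: compute the mean pixel value at each
    position over all frames and pick the frame nearest to that mean.
    """
    frames = np.asarray(frames, dtype=float)
    mean_frame = frames.mean(axis=0)
    dists = np.linalg.norm((frames - mean_frame).reshape(len(frames), -1), axis=1)
    return int(np.argmin(dists))
```

For three 1x2 frames with values 0, 10, and 5, the mean frame is all 5s, so the third frame is selected.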
The multi-kernel canonical correlation analysis of step 2) is as follows:
1) Let the total number of training videos be n. The video feature of the i-th video is represented by an m-dimensional vector x_i; stacking the n m-dimensional vectors gives the video feature matrix X. Likewise, the word-frequency feature of the text description of the j-th video is represented by a k-dimensional vector y_j; stacking the n k-dimensional vectors gives the word-frequency feature matrix Y;
2) The Gram matrix K_x of X is computed with the Gaussian kernel
K_x(i, j) = exp(-‖x_i - x_j‖² / (2σ²)),
where K_x is an n-by-n matrix, ‖·‖ denotes the vector l2-norm, and the real parameter σ is the bandwidth of the Gaussian kernel; setting different values of σ yields a family of Gaussian kernels with different nonlinear structures. Likewise, the Gram matrix K_y of Y is computed with the second-degree polynomial kernel
K_y(i, j) = (y_i^T y_j + C)²,
where (·)^T denotes the transpose of a vector or matrix and C ≥ 0 is a kernel parameter; setting different values of C yields a family of polynomial kernels with different nonlinear structures;
3) The kernel-transformed K_x and K_y are normalized;
4) The objective function of the video retrieval method based on multi-kernel canonical correlation analysis is
max_{W_x, W_y} (W_x^T K_x K_y W_y) / sqrt((W_x^T K_x² W_x + κ W_x^T K_x W_x) · (W_y^T K_y² W_y + κ W_y^T K_y W_y)),
subject to W_x^T K_x² W_x + κ W_x^T K_x W_x = 1 and W_y^T K_y² W_y + κ W_y^T K_y W_y = 1.
Solving this objective is equivalent to solving the generalized eigenvalue problem
(K_x + κI)^(-1) K_y (K_y + κI)^(-1) K_x W_x = λ² W_x.
The eigenvectors corresponding to the p largest eigenvalues of (K_x + κI)^(-1) K_y (K_y + κI)^(-1) K_x form an n-by-p matrix W_x^p, the mapping matrix of K_x. Here p is the dimensionality after reduction by the method, κ is a positive real parameter, (·)^(-1) denotes the matrix inverse, and I is the identity matrix. The mapping matrix W_y^p of K_y is obtained from the relation between W_x^p and W_y^p:
W_y^p = (K_y + κI)^(-1) K_x W_x^p / λ,
where W_y^p is also an n-by-p matrix;
5) The new mapping matrices W_x^p and W_y^p give the representations of K_x and K_y in the new space: K_x^new = K_x W_x^p and K_y^new = K_y W_y^p, both n-by-p matrices; the correlation between the two is now maximal.
The retrieval matching method of step 3) is as follows:
1) The search keywords submitted by the user are converted into a keyword word-frequency vector y_q. Applying the second-degree polynomial kernel used for Y,
k(y_q, y_j) = (y_q^T y_j + C)²,
gives the n-dimensional Gram vector of y_q, where y_j is the j-th sample of the training data Y and C ≥ 0 is the second-degree polynomial kernel parameter;
2) The mapping matrix W_y^p transforms this Gram vector into the low-dimensional representation of the keyword word-frequency feature, a p-dimensional vector;
3) The cosine similarity between this vector and each row of the matrix K_x^new obtained earlier is computed in turn; each row (sample) is a p-dimensional vector, and a larger cosine similarity indicates higher similarity;
4) The video results are returned in descending order of cosine similarity.
The embodiments described in this specification merely illustrate forms of the inventive concept. The scope of protection of the present invention is not limited to the specific forms stated in the embodiments; it also covers equivalent technical means that a person skilled in the art could conceive from the inventive concept.

Claims (4)

1. A video retrieval method based on multi-kernel canonical correlation analysis, characterized in that videos are crawled from the internet and, for each video, the following operations are performed:
1) segmenting the video according to whether the shot changes abruptly, extracting its key frames, and combining the visual features of the key frames with the motion features of the shots into a video feature vector; extracting a word-frequency feature from the text description of each video;
2) using multi-kernel canonical correlation analysis to obtain the mapping matrices of the video features and the word-frequency features, and from them the low-dimensional representations of both, so that their correlation in the low-dimensional space is maximal;
3) when a user enters keywords to retrieve videos, obtaining the low-dimensional representation of the keywords' word-frequency feature from the word-frequency mapping matrix, and returning the video results in descending order of cosine similarity between this representation and the low-dimensional video-feature representations.
2. The video retrieval method based on multi-kernel canonical correlation analysis as claimed in claim 1, characterized in that the processing of each video and its corresponding text description in step 1) is as follows:
1) cutting shots with the twin-comparison method: letting Tb be the threshold for detecting abrupt shot changes and Ts the threshold for detecting gradual ones, and measuring the difference between consecutive frames with a difference metric; if the difference exceeds Tb, the change is an abrupt cut and the shot is segmented there; if the difference is below Tb but above Ts, the change may be a gradual transition, and the frame is compared with the following frames; if the frame-to-frame difference eventually falls below Ts while the accumulated difference between consecutive frames exceeds Tb, the transition is confirmed as gradual;
2) when extracting key frames, first designating the first and last frames of a shot as key frames, because the first frame usually presents the theme of the shot to attract the audience and the last frame often carries a feature meant to linger with the audience; after the first and last frames are fixed, also choosing frames that change significantly as key frames, by comparing each non-key frame in the shot with the key frames in turn and making it a key frame if it differs greatly, continuing until all non-key frames in the shot have been examined; then also choosing key frames by the frame-average method, computing the mean of all frames' pixel values at given positions and taking as key frame the frame whose pixel values are closest to that mean;
3) during key frame feature extraction, extracting the color, shape, and texture of the still image as visual features, and extracting changes in camera motion, motion trajectories, and moving-object size as camera motion features, which together form the video feature representation;
4) segmenting the video's text description into words and counting word frequencies, forming its word-frequency feature representation.
3. The video retrieval method based on multi-kernel canonical correlation analysis as claimed in claim 2, characterized in that the multi-kernel canonical correlation analysis of step 2) is as follows:
1) letting the total number of training videos be n, representing the video feature of the i-th video by an m-dimensional vector x_i and stacking the n m-dimensional vectors into the video feature matrix X; likewise representing the word-frequency feature of the text description of the j-th video by a k-dimensional vector y_j and stacking the n k-dimensional vectors into the word-frequency feature matrix Y;
2) computing the Gram matrix K_x of X with the Gaussian kernel
K_x(i, j) = exp(-‖x_i - x_j‖² / (2σ²)),
where K_x is an n-by-n matrix, ‖·‖ denotes the vector l2-norm, and the real parameter σ is the bandwidth of the Gaussian kernel, different values of σ yielding a family of Gaussian kernels with different nonlinear structures; likewise computing the Gram matrix K_y of Y with the second-degree polynomial kernel
K_y(i, j) = (y_i^T y_j + C)²,
where (·)^T denotes the transpose of a vector or matrix and C ≥ 0 is a kernel parameter, different values of C yielding a family of polynomial kernels with different nonlinear structures;
3) normalizing the kernel-transformed K_x and K_y;
4) taking as objective function of the video retrieval method based on multi-kernel canonical correlation analysis
max_{W_x, W_y} (W_x^T K_x K_y W_y) / sqrt((W_x^T K_x² W_x + κ W_x^T K_x W_x) · (W_y^T K_y² W_y + κ W_y^T K_y W_y)),
subject to W_x^T K_x² W_x + κ W_x^T K_x W_x = 1 and W_y^T K_y² W_y + κ W_y^T K_y W_y = 1,
whose solution is equivalent to that of the generalized eigenvalue problem
(K_x + κI)^(-1) K_y (K_y + κI)^(-1) K_x W_x = λ² W_x;
taking the eigenvectors corresponding to the p largest eigenvalues of (K_x + κI)^(-1) K_y (K_y + κI)^(-1) K_x to form an n-by-p matrix W_x^p as the mapping matrix of K_x, where p is the dimensionality after reduction by the method, κ is a positive real parameter, (·)^(-1) denotes the matrix inverse, and I is the identity matrix; obtaining the mapping matrix W_y^p of K_y from the relation between W_x^p and W_y^p:
W_y^p = (K_y + κI)^(-1) K_x W_x^p / λ,
where W_y^p is also an n-by-p matrix;
5) using the new mapping matrices W_x^p and W_y^p to construct the representations of K_x and K_y in the new space, K_x^new = K_x W_x^p and K_y^new = K_y W_y^p, both n-by-p matrices, whereupon the correlation between the two is maximal.
4. The video retrieval method based on multi-kernel canonical correlation analysis as claimed in claim 3, characterized in that the retrieval matching method of step 3) is as follows:
1) converting the search keywords submitted by the user into a keyword word-frequency vector y_q and applying the second-degree polynomial kernel used for Y,
k(y_q, y_j) = (y_q^T y_j + C)²,
to obtain the n-dimensional Gram vector of y_q, where y_j is the j-th sample of the training data Y and C ≥ 0 is the second-degree polynomial kernel parameter;
2) transforming this Gram vector with the mapping matrix W_y^p to obtain the low-dimensional representation of the keyword word-frequency feature, a p-dimensional vector;
3) computing in turn the cosine similarity between this vector and each row of the matrix K_x^new obtained earlier, each row (sample) being a p-dimensional vector, a larger cosine similarity indicating higher similarity;
4) returning the video results in descending order of cosine similarity.
CN201310438216.1A 2013-09-23 2013-09-23 Video retrieval method based on multi-core canonical correlation analysis Active CN103559196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310438216.1A CN103559196B (en) 2013-09-23 2013-09-23 Video retrieval method based on multi-core canonical correlation analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310438216.1A CN103559196B (en) 2013-09-23 2013-09-23 Video retrieval method based on multi-core canonical correlation analysis

Publications (2)

Publication Number Publication Date
CN103559196A true CN103559196A (en) 2014-02-05
CN103559196B CN103559196B (en) 2017-02-22

Family

ID=50013443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310438216.1A Active CN103559196B (en) 2013-09-23 2013-09-23 Video retrieval method based on multi-core canonical correlation analysis

Country Status (1)

Country Link
CN (1) CN103559196B (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7143434B1 (en) * 1998-11-06 2006-11-28 Seungyup Paek Video description system and method
CN1851710A (en) * 2006-05-25 2006-10-25 浙江大学 Embedded multimedia key frame based video search realizing method
CN101021855B (en) * 2006-10-11 2010-04-07 北京新岸线网络技术有限公司 Video searching system based on content
CN103279473A (en) * 2013-04-10 2013-09-04 深圳康佳通信科技有限公司 Method, system and mobile terminal for searching massive amounts of video content

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700410B (en) * 2015-03-14 2017-09-22 西安电子科技大学 Instructional video mask method based on collaborative filtering
CN104700410A (en) * 2015-03-14 2015-06-10 西安电子科技大学 Collaborative filtering-based teaching video labeling method
CN104699844A (en) * 2015-03-31 2015-06-10 北京奇艺世纪科技有限公司 Method and device for determining video tags for advertisements
CN104699844B (en) * 2015-03-31 2019-03-15 北京奇艺世纪科技有限公司 The method and device of video tab is determined for advertisement
CN106294454A (en) * 2015-05-29 2017-01-04 中兴通讯股份有限公司 Video retrieval method and device
CN105389326A (en) * 2015-09-16 2016-03-09 中国科学院计算技术研究所 Image annotation method based on weak matching probability canonical correlation model
CN105389326B (en) * 2015-09-16 2018-08-31 中国科学院计算技术研究所 Image labeling method based on weak matching probability typical relevancy models
CN105938561A (en) * 2016-04-13 2016-09-14 南京大学 Canonical-correlation-analysis-based computer data attribute reduction method
CN106570165A (en) * 2016-11-07 2017-04-19 北京航空航天大学 Content-based video retrieval method and apparatus
CN106570165B (en) * 2016-11-07 2019-09-13 北京航空航天大学 A kind of content based video retrieval system method and device
CN106570196B (en) * 2016-11-18 2020-06-05 广州视源电子科技股份有限公司 Video program searching method and device
WO2018090468A1 (en) * 2016-11-18 2018-05-24 广州视源电子科技股份有限公司 Method and device for searching for video program
CN106570196A (en) * 2016-11-18 2017-04-19 广州视源电子科技股份有限公司 Video program searching method and apparatus
CN106708929A (en) * 2016-11-18 2017-05-24 广州视源电子科技股份有限公司 Video program search method and device
CN106682108A (en) * 2016-12-06 2017-05-17 浙江大学 Video retrieval method based on multi-modal convolutional neural network
CN106682108B (en) * 2016-12-06 2022-07-12 浙江大学 Video retrieval method based on multi-mode convolutional neural network
CN107122423A (en) * 2017-04-06 2017-09-01 深圳Tcl数字技术有限公司 Video display promotion method and device
CN107748750A (en) * 2017-08-30 2018-03-02 百度在线网络技术(北京)有限公司 Similar video lookup method, device, equipment and storage medium
US10853416B2 (en) 2017-08-30 2020-12-01 Baidu Online Network Technology (Beijing) Co., Ltd. Similar video lookup method and apparatus, device and storage medium
CN108182684A (en) * 2017-12-22 2018-06-19 河南师范大学 A kind of image partition method and its device based on weighting kernel fuzzy cluster
CN108182684B (en) * 2017-12-22 2021-06-25 河南师范大学 Image segmentation method and device based on weighted kernel function fuzzy clustering
CN110222594A (en) * 2019-05-20 2019-09-10 厦门能见易判信息科技有限公司 Pirate video recognition methods and system
CN110222594B (en) * 2019-05-20 2021-11-16 厦门能见易判信息科技有限公司 Pirated video identification method and system
CN110991470A (en) * 2019-07-03 2020-04-10 北京市安全生产科学技术研究院 Data dimension reduction method, portrait construction method and system and readable storage medium
CN110598049A (en) * 2019-09-26 2019-12-20 北京字节跳动网络技术有限公司 Method, apparatus, electronic device and computer readable medium for retrieving video
CN110704655A (en) * 2019-10-18 2020-01-17 中国科学技术大学 Online multi-quantization image retrieval method
CN110704655B (en) * 2019-10-18 2022-05-13 中国科学技术大学 Online multi-quantization image retrieval method
CN112182292A (en) * 2020-09-30 2021-01-05 百度(中国)有限公司 Training method and device for video retrieval model, electronic equipment and storage medium
CN112182292B (en) * 2020-09-30 2024-02-09 百度(中国)有限公司 Training method and device for video retrieval model, electronic equipment and storage medium
CN113221674A (en) * 2021-04-25 2021-08-06 广东电网有限责任公司东莞供电局 Video stream key frame extraction system and method based on rough set reduction and SIFT

Also Published As

Publication number Publication date
CN103559196B (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN103559196A (en) Video retrieval method based on multi-core canonical correlation analysis
CN111858954B (en) Task-oriented text-generated image network model
Wei et al. Should we encode rain streaks in video as deterministic or stochastic?
Gould et al. Decomposing a scene into geometric and semantically consistent regions
CN106649490B (en) Image retrieval method and device based on depth features
CN108898145A (en) A kind of image well-marked target detection method of combination deep learning
CN107944035B (en) Image recommendation method integrating visual features and user scores
Feng et al. Image tag completion by noisy matrix recovery
CN107943990B (en) Multi-video abstraction method based on prototype analysis technology with weight
CN103258037A (en) Trademark identification searching method for multiple combined contents
CN110083729B (en) Image searching method and system
CN102799637A (en) Method for automatically generating main character abstract in television program
CN103488664B (en) A kind of image search method
CN110851731B (en) Collaborative filtering recommendation method for user attribute coupling similarity and interest semantic similarity
CN106529492A (en) Video topic classification and description method based on multi-image fusion in view of network query
CN109272440A (en) A kind of reduced graph generating method and system for combining text and picture material
Zhang et al. Retargeting semantically-rich photos
Dong et al. An adult image detection algorithm based on Bag-of-Visual-Words and text information
CN102831161B (en) For the semi-supervised sequence learning method based on manifold regularization of image retrieval
CN109934852B (en) Video description method based on object attribute relation graph
Baghel et al. Image conditioned keyframe-based video summarization using object detection
Chen et al. Image retrieval via improved relevance ranking
CN108717436B (en) Commodity target rapid retrieval method based on significance detection
Li et al. Action recognition with spatio-temporal augmented descriptor and fusion method
CN108628999B (en) Video recommendation method based on explicit and implicit information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant