CN103440496A - Video memorability discrimination method based on functional magnetic resonance imaging - Google Patents

Video memorability discrimination method based on functional magnetic resonance imaging

Info

Publication number
CN103440496A
CN103440496A
Authority
CN
China
Prior art keywords
video data
feature
video
memorability
gray level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103326130A
Other languages
Chinese (zh)
Other versions
CN103440496B (en)
Inventor
韩军伟
陈长远
郭雷
程塨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN201310332613.0A
Publication of CN103440496A
Application granted
Publication of CN103440496B
Expired - Fee Related
Anticipated expiration


Abstract

The invention relates to a video memorability discrimination method based on functional magnetic resonance imaging (fMRI). A feature subspace model is built from the bottom-layer visual features of a small number of videos in a video database together with the brain functional imaging spatial features corresponding to those videos. The bottom-layer visual features of the videos without fMRI data are then mapped into this feature subspace, yielding fMRI-based features for all videos, on which a support vector regression model is trained; given a video of unknown memorability, the model outputs its memorability value. Compared with traditional methods that judge video memorability only from bottom-layer visual features such as color, shape and texture, the method greatly improves the accuracy of video memorability discrimination, because the brain functional imaging spatial features extracted from the fMRI data guide the discrimination and thereby bring human-brain cognitive information to bear on the task.

Description

Video memorability discrimination method based on functional magnetic resonance imaging
Technical field
The present invention relates to a video memorability discrimination method based on functional magnetic resonance imaging (fMRI), applicable to judging the memorability value of videos of various kinds.
Background technology
Memorability of image/video data is a new research direction in digital image/video processing. Published results are few and concentrate on image memorability; no research on video memorability has yet been published.
For image memorability only a few methods exist; the current ones first extract global image features (such as SIFT, GIST or HOG), train a classifier on them, and then judge the memorability of a given image. Image memorability has many applications: an editor can choose an easily remembered image as a magazine cover, and an advertiser can choose easily remembered images for posters. It would therefore be highly significant if, given an image, a computer could automatically determine whether people will remember it.
For video memorability, however, no discrimination method has been published so far. Video memorability has very wide applications, for example in evaluating video advertisements: if people can still remember an advertisement some time after watching it, the advertisement is of high value; if after a period of time they cannot, its value is low. Research on video memorability therefore has strong practical significance.
Functional magnetic resonance imaging is an emerging neuroimaging modality whose principle is to use magnetic resonance imaging to measure the hemodynamic changes caused by neuronal activity. fMRI can show which brain areas are activated by an environmental stimulus, and can accurately measure the response of the human brain while it watches video data. During measurement, the brain regions responsible for functions such as vision and hearing are imaged; each voxel in a functional area responds to the external video stimulus, so for a given video segment fMRI yields the response of every voxel in every functional area over a period of time. The features contained in these fMRI data are referred to as brain functional imaging spatial features. Neuroscientists have verified that watching different kinds of video content, such as tools, buildings and animals, produces measurably different functional magnetic resonance signals, and existing work at home and abroad already uses the brain functional imaging features in fMRI for video analysis problems such as video retrieval and video classification. However, because fMRI experiments are expensive, usually only a small number of videos can be scanned, so studying how a small amount of fMRI data can be used to solve large-scale video analysis problems is also of great significance.
Summary of the invention
Technical problem solved
In order to solve the problem of video memorability discrimination, to bring human-brain cognitive information into video memorability discrimination, and to improve the accuracy of the discrimination, the present invention proposes a video memorability discrimination method based on functional magnetic resonance imaging.
Technical scheme
A video memorability discrimination method based on functional magnetic resonance imaging, characterized by the following steps:
Step 1: extract the bottom-layer visual features of all video data:
Step a: the video database contains N ∈ [100, 1000] videos in total; for each video, extract the first frame of every second as a key frame;
Step b: using the object bank program package released by Li-Jia Li in 2010, down-sample each key frame to obtain 12 scale images of the input picture, and convolve these 12 scale images with the 208 object templates in the object bank program, so that each key frame yields 208 × 12 response images;
Then, using interpolation, resize the 12 response images that each key frame produces for each template to a common size, and take the per-pixel maximum over the 12 resized response images to form one maximum-response image;
Then take the mean over the pixels of each maximum-response image, giving a 208-dimensional feature vector per key frame; averaging the 208-dimensional features over all key frames of a video gives the 208-dimensional object bank feature of that video (a sketch of this computation is given below);
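For illustration only, the following Python sketch reproduces the per-key-frame pooling just described (multi-scale template responses, per-pixel maximum, spatial mean). The scale factor, the convolution routine and all names are assumptions, since the patent relies on the original object bank package rather than specifying them.

    import numpy as np
    from scipy.signal import fftconvolve
    from skimage.transform import resize

    def object_bank_feature(key_frame, templates, n_scales=12, scale_step=0.8):
        """key_frame: 2-D gray array; templates: list of 208 2-D template arrays."""
        h, w = key_frame.shape
        # 12-image pyramid produced by repeated down-sampling (scale_step assumed).
        scales = [resize(key_frame, (max(1, int(h * scale_step ** s)),
                                     max(1, int(w * scale_step ** s))))
                  for s in range(n_scales)]
        feature = np.empty(len(templates))
        for t, tpl in enumerate(templates):
            # Response of every scale image to this template, resized to one size.
            responses = [resize(fftconvolve(img, tpl, mode='same'), (h, w))
                         for img in scales]
            max_resp = np.max(np.stack(responses), axis=0)  # per-pixel maximum
            feature[t] = max_resp.mean()                    # spatial average
        return feature                                      # one 208-dim vector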
Step c: first transform each key frame of each video into the Lab color space and apply a 2-level wavelet transform to the image in each color channel, obtaining the wavelet coefficients of each subband of each key frame; build an L ∈ [1, 256]-level histogram of the wavelet coefficients and compute the entropy of each subband as E = -Σ_i p_i log(p_i), where p_i, i = 0, 1, 2, …, L-1, is the histogram probability distribution;
Add up the entropies of all subbands of each channel to obtain the channel entropies E_L, E_a, E_b, and compute the scene complexity feature of the key frame as SC = 0.84 × E_L + 0.08 × E_a + 0.08 × E_b; averaging over all key frames of a video gives the scene complexity feature of that video (a sketch is given below);
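A minimal sketch of this scene-complexity computation follows. The Haar wavelet family and the use of pywt/skimage are assumptions; the patent fixes only the 2-level transform, the L-level histogram and the 0.84/0.08/0.08 weights.

    import numpy as np
    import pywt
    from skimage.color import rgb2lab

    def subband_entropy(coeffs, bins=256):
        hist, _ = np.histogram(coeffs.ravel(), bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]                        # drop empty bins so log is defined
        return -np.sum(p * np.log(p))       # E = -sum_i p_i log(p_i)

    def scene_complexity(key_frame_rgb):
        lab = rgb2lab(key_frame_rgb)
        entropies = []
        for c in range(3):                  # L, a, b channels
            dec = pywt.wavedec2(lab[..., c], 'haar', level=2)
            subbands = [dec[0]] + [b for lvl in dec[1:] for b in lvl]
            entropies.append(sum(subband_entropy(b) for b in subbands))
        E_L, E_a, E_b = entropies
        return 0.84 * E_L + 0.08 * E_a + 0.08 * E_b   # SC feature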
Step d: first convert each key frame of each video into a gray-level image and build its histogram, with L ∈ [1, 256] distinguishable gray levels; let p(z_i), i = 0, 1, 2, …, L-1, be the corresponding histogram, where z is a random variable representing gray level;
Compute the n-th order moment of the gray-level image as

μ_n(z) = Σ_{i=0}^{L-1} (z_i - m)^n p(z_i),

where m = Σ_{i=0}^{L-1} z_i p(z_i) is the average gray level of z;
Setting n = 2 and n = 3 gives the second-moment feature and the third-moment feature of the gray-level image, respectively; letting σ²(z) be the second moment of the gray-level image, compute its smoothness feature as

R = 1 - 1 / (1 + σ²(z));

Compute the consistency feature of the gray-level image as

Σ_{i=0}^{L-1} p²(z_i);

Compute the mean entropy feature of the gray-level image as

e = -Σ_{i=0}^{L-1} p(z_i) log₂ p(z_i);

Concatenate the second-moment, third-moment, smoothness, consistency and mean entropy features in series to obtain the texture feature of the key frame; averaging the texture features of all key frames of a video gives the texture feature of that video (a sketch of these texture statistics follows);
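These five descriptors are standard histogram-based texture statistics; a compact numpy sketch, assuming 8-bit gray levels (L = 256), is:

    import numpy as np

    def texture_features(gray):
        L = 256
        hist, _ = np.histogram(gray, bins=L, range=(0, L))
        p = hist / hist.sum()                      # p(z_i)
        z = np.arange(L, dtype=float)
        m = np.sum(z * p)                          # mean gray level
        mu2 = np.sum((z - m) ** 2 * p)             # second moment (variance)
        mu3 = np.sum((z - m) ** 3 * p)             # third moment
        R = 1.0 - 1.0 / (1.0 + mu2)                # smoothness
        U = np.sum(p ** 2)                         # consistency / uniformity
        nz = p[p > 0]
        e = -np.sum(nz * np.log2(nz))              # mean entropy
        return np.array([mu2, mu3, R, U, e])       # concatenated texture feature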
Step e: first extract the audio signal of each video, then use the Mel-frequency cepstral coefficient algorithm to extract the MFCC feature of each video;
Step f: concatenate the object bank feature, scene complexity feature, texture feature and MFCC feature of each video head to tail, obtaining the bottom-layer visual feature of each video (see the sketch below);
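A sketch of steps e-f follows. librosa, the 13-coefficient MFCC and the averaging of MFCC frames into one vector per video are assumptions, since the patent specifies only "the MFCC feature" and the head-to-tail concatenation.

    import numpy as np
    import librosa

    def video_bottom_feature(audio_path, object_bank, scene_complexity, texture):
        y, sr = librosa.load(audio_path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 x n_frames
        mfcc_feat = mfcc.mean(axis=1)                        # one vector per video
        # Bottom-layer visual feature: object bank + scene complexity + texture + MFCC.
        return np.concatenate([object_bank, [scene_complexity], texture, mfcc_feat])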
Step 2: extract the brain functional imaging spatial features of the video data from the fMRI data: using the method of the patent numbered 201110114991, extract the brain functional imaging spatial features of the N_f ∈ [10, 100] videos in the database that have fMRI data; reduce the dimensionality of these features by analysis of variance (ANOVA) with a preset significance parameter, then reduce it further with the BLOGREG feature selection algorithm proposed by Cawley, obtaining the brain functional imaging spatial feature matrix f_B of the video data;
Step 3: construct the feature subspace model:
Step a: apply principal component analysis (PCA) to the brain functional imaging spatial features f_B of the N_f ∈ [10, 100] videos, with threshold T ∈ [0.95, 0.99], obtaining the redundancy-removed features f_Bpca; arranging the f_Bpca of the N_f videos row by row gives the feature matrix of the N_f videos

F_1 = [f_Bpca,1; …; f_Bpca,i; …; f_Bpca,N_f], an N_f × n_B matrix,

where i ∈ [1, N_f] indexes the i-th of the N_f videos that have corresponding functional magnetic resonance imaging data, and n_B is the feature dimensionality;
Step b: apply principal component analysis (PCA) to the bottom-layer visual features f_L of the N_f ∈ [10, 100] videos, with threshold T ∈ [0.95, 0.99], obtaining the redundancy-removed features f_Lpca; arranging the f_Lpca of the N_f videos row by row gives the feature matrix

F_2 = [f_Lpca,1; …; f_Lpca,i; …; f_Lpca,N_f], an N_f × n_L matrix,

where i ∈ [1, N_f] indexes the i-th of the N_f videos that have corresponding functional magnetic resonance imaging data, and n_L is the feature dimensionality;
Step c: use the canonical correlation analysis (CCA) algorithm to obtain the mapping matrices of the feature subspace; the mapping relation between the original feature spaces and the feature subspace is

U = F_1 × A,  V = F_2 × B,

where F_1 and F_2 are the feature matrices, A and B are the feature subspace mapping matrices, and U and V are the mappings of the feature matrices onto the feature subspace;
Step 4: map all N ∈ [100, 1000] videos of the database into the feature subspace: arrange the bottom-layer visual features of all N videos row by row into the bottom-layer visual feature matrix F_N, apply PCA to F_N with threshold T ∈ [0.95, 0.99], and obtain the redundancy-removed matrix F_Npca;
Then use the formula V_N = F_Npca × B to map F_Npca into the feature subspace, obtaining the subspace feature matrix V_N (a sketch of steps 3-4 is given below);
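A sketch of steps 3-4 under stated assumptions: scikit-learn's PCA (with the variance threshold T) and CCA stand in for the patent's PCA/CCA steps, the visual features are placed on the CCA side whose transform plays the role of the mapping matrix B, and reusing the PCA fitted on the N_f videos for all N videos is a simplification of step 4. All variable names are illustrative.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import CCA

    T = 0.98                                  # PCA energy threshold in [0.95, 0.99]

    # f_L: (N_f, d_L) visual features of the N_f fMRI-scanned videos
    # f_B: (N_f, d_B) brain functional imaging spatial features of the same videos
    # f_L_all: (N, d_L) visual features of all N videos in the database
    pca_L = PCA(n_components=T).fit(f_L)      # keep components explaining 98% variance
    pca_B = PCA(n_components=T).fit(f_B)
    F2 = pca_L.transform(f_L)                 # N_f x n_L
    F1 = pca_B.transform(f_B)                 # N_f x n_B

    # CCA dimensionality must not exceed either feature count or the sample count.
    k = min(F1.shape[1], F2.shape[1], F1.shape[0])
    cca = CCA(n_components=k).fit(F2, F1)     # U = F1 x A, V = F2 x B in patent notation

    # Step 4: map every video's (PCA-reduced) visual feature into the subspace.
    V_N = cca.transform(pca_L.transform(f_L_all))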
Step 5: train the video memorability prediction model: train a support vector regression (SVR) model on the subspace feature matrix V_N and the video memorability ground-truth values obtained from the annotation experiment;
Step 6: predict the memorability value of a video: for any video of unknown memorability, first extract its bottom-layer visual feature by the method of step 1, multiply it by the feature subspace mapping matrix B obtained in step c of step 3 to get its mapped subspace feature, and input this subspace feature into the SVR model trained in step 5 to obtain the memorability value of the video (a sketch follows).
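A sketch of steps 5-6, continuing the previous snippet; the SVR hyperparameters are deferred to the embodiment below, and `memorability_true_values` denotes the annotated ground truth.

    from sklearn.svm import SVR

    svr = SVR(kernel='rbf')                     # hyperparameters: see the embodiment
    svr.fit(V_N, memorability_true_values)      # step 5: train on subspace features

    def predict_memorability(f_visual_new):
        """Step 6: score one video of unknown memorability from its visual feature."""
        v = cca.transform(pca_L.transform(f_visual_new.reshape(1, -1)))
        return svr.predict(v)[0]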
Beneficial effect
The video memorability discrimination method based on functional magnetic resonance imaging proposed by the invention builds a feature subspace model from the bottom-layer visual features of a small number of videos in the database and the brain functional imaging spatial features corresponding to those videos, maps the bottom-layer visual features of the videos without fMRI into the feature subspace to obtain fMRI-based features for all videos, and then trains a support vector regression model, so that for a video of unknown memorability the model yields its memorability value.
Because the brain functional imaging spatial features extracted from the fMRI data guide the memorability judgment, the invention brings human-brain cognitive information into video memorability discrimination and greatly improves the accuracy of the discrimination compared with traditional methods that rely only on bottom-layer visual features such as color, shape and texture.
Description of the drawings
Fig. 1: flow chart of the method of the invention
Embodiment
The invention is further described below with reference to the accompanying drawing and an embodiment:
The hardware environment used for implementation is an AMD Athlon 64 X2 5000+ computer with 2 GB of memory and a 256 MB graphics card; the software environment is Matlab 2009a on Windows XP. The method proposed by the invention was implemented in Matlab.
The specific implementation of the invention is as follows:
1: extract the bottom-layer visual feature vectors of the N videos:
Step a: extract the first frame of every second of each video as its key frames.
Step b: using the object bank program package released by Li-Jia Li in 2010, down-sample each key frame to obtain 12 scale images of the input picture and convolve these 12 scale images with the 208 object templates of the object bank program, so that each key frame yields 208 × 12 response images; resize, by interpolation, the 12 response images of each template to a common size; take the per-pixel maximum over the 12 resized images to form one maximum-response image; then take the pixel mean of the maximum-response image, giving a 208-dimensional feature vector per key frame, and average the 208-dimensional features over all key frames of a video to obtain its 208-dimensional object bank feature. The object bank program released by Li-Jia Li in 2010 is described in: Li-Jia Li, Hao Su, Eric P. Xing, Li Fei-Fei. Object bank: A high-level image representation for scene classification and semantic feature sparsification. NIPS, 2010.
Step c: for each key frame of each video, first transform the key frame into the Lab color space and apply a 2-level wavelet transform to the image in each color channel, obtaining the wavelet coefficients of each subband of the key frame; build an L = 256-level histogram of the wavelet coefficients and compute the entropy of each subband as E = -Σ_i p_i log(p_i), where p_i, i = 0, 1, 2, …, L-1, is the histogram probability distribution. Add up the entropies of all subbands of each channel to obtain the channel entropies E_L, E_a, E_b, then compute the scene complexity feature of the key frame as SC = 0.84 × E_L + 0.08 × E_a + 0.08 × E_b; averaging over all key frames of a video gives the scene complexity feature of that video.
Step d: for each key frame of each video, first convert it into a gray-level image and build its histogram, with L = 256 distinguishable gray levels. Let p(z_i), i = 0, 1, 2, …, L-1, be the corresponding histogram, where z is a random variable representing gray level. Compute the n-th order moment of the gray-level image as

μ_n(z) = Σ_{i=0}^{L-1} (z_i - m)^n p(z_i),

where m = Σ_{i=0}^{L-1} z_i p(z_i) is the average gray level of z. Setting n = 2 and n = 3 gives the second-moment and third-moment features of the gray-level image. Letting σ²(z) be the second moment, compute the smoothness feature as R = 1 - 1 / (1 + σ²(z)), the consistency feature as Σ_{i=0}^{L-1} p²(z_i), and the mean entropy feature as e = -Σ_{i=0}^{L-1} p(z_i) log₂ p(z_i). Concatenate the second-moment, third-moment, smoothness, consistency and mean entropy features in series to obtain the texture feature of the key frame; averaging the texture features of all key frames of a video gives the texture feature of that video.
Step e: first extract the audio signal of each video, then use the Mel-frequency cepstral coefficient algorithm to extract its MFCC feature.
Step f: concatenate the object bank feature, scene complexity feature, texture feature and MFCC feature of each video head to tail to obtain its bottom-layer visual feature.
2: extract the brain functional imaging spatial features of the video data from the fMRI data: using the patent numbered 201110114991, extract the brain functional imaging spatial features of the N_f ∈ [10, 100] videos in the database that have fMRI data. Reduce the dimensionality of these features by analysis of variance (ANOVA) with a preset significance parameter, then reduce it further with the BLOGREG feature selection algorithm proposed by Cawley, obtaining the brain functional imaging spatial feature matrix f_B of the video data. The BLOGREG feature selection algorithm proposed by Cawley is described in: Cawley G C, Talbot N L C. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization. Bioinformatics, 2006, 22(19): 2348-2355.
3: construct the feature subspace model for the N_f videos in the database that have brain functional imaging spatial features, as follows:
Step a: apply principal component analysis (PCA) to the brain functional imaging spatial features f_B of the N_f ∈ [10, 100] videos, with threshold T = 0.98, obtaining the redundancy-removed features f_Bpca; arranging the f_Bpca of the N_f videos row by row gives the feature matrix F_1 = [f_Bpca,1; …; f_Bpca,i; …; f_Bpca,N_f], an N_f × n_B matrix, where i ∈ [1, N_f] indexes the i-th of the N_f videos with corresponding functional magnetic resonance imaging data and n_B is the feature dimensionality.
Step b: apply PCA to the bottom-layer visual features f_L of the N_f ∈ [10, 100] videos, with threshold T = 0.98, obtaining the redundancy-removed features f_Lpca; arranging the f_Lpca of the N_f videos row by row gives the feature matrix F_2 = [f_Lpca,1; …; f_Lpca,i; …; f_Lpca,N_f], an N_f × n_L matrix, where i ∈ [1, N_f] indexes the i-th of the N_f videos with corresponding functional magnetic resonance imaging data and n_L is the feature dimensionality.
Step c: use the canonical correlation analysis (CCA) algorithm to obtain the mapping matrices of the feature subspace. The mapping relation between the original feature spaces and the feature subspace is U = F_1 × A, V = F_2 × B, where F_1 and F_2 are the feature matrices, A and B are the feature subspace mapping matrices, and U and V are the mappings of the feature matrices onto the feature subspace.
4: map all N = 222 videos of the database into the feature subspace: arrange the bottom-layer visual features of all N = 222 videos row by row into the bottom-layer visual feature matrix F_N; apply PCA to F_N with threshold T = 0.98 to obtain the redundancy-removed matrix F_Npca; then use the formula V_N = F_Npca × B to map F_Npca into the feature subspace, obtaining the subspace feature matrix V_N.
5: video memorability prediction model training: train a support vector regression (SVR) model on the subspace feature matrix V_N and the video memorability ground-truth values obtained from the annotation experiment. The SVR model is realized here with the libSVM software package, with the parameters set to '-s 3 -p 0.06 -t 2 -c 50' (a sketch of the equivalent configuration is given below). The annotation experiment for obtaining the video memorability ground truth is as follows:
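For reference, the quoted libSVM options correspond to an epsilon-SVR with an RBF kernel; a scikit-learn equivalent (an assumption, since the patent uses libSVM itself) is:

    # libSVM '-s 3' = epsilon-SVR, '-t 2' = RBF kernel, '-c 50' = C, '-p 0.06' = epsilon
    from sklearn.svm import SVR
    svr = SVR(kernel='rbf', C=50.0, epsilon=0.06)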
Step 1, production of the experiment sequences: divide the N = 222 videos into 44 groups, the first 42 groups containing 5 videos each and the last 2 groups containing 6 videos each; this set of N = 222 videos is called the target video database. Separately take 2196 distractor videos. Take the 5 target videos of the first group plus 25 distractor videos, arrange these 30 videos in random series with 2-second intervals between consecutive videos, obtaining the first experiment sequence a; take the same 5 target videos of the first group plus another 25 distractor videos, arrange these 30 videos in random series with 2-second intervals, obtaining the first experiment sequence b. Produce the remaining 43 groups of experiment sequences in the same way, except that the last two groups each use 6 target videos and 24 distractor videos.
Step 2, target video annotation: 20 subjects annotate the data produced in step 1. Each subject first watches sequence a of the first group; after an interval of 2 days, the subject watches sequence b of the same group and records whether each video in sequence b appeared in sequence a, writing down the names of the videos that did. After another 2-day interval, the subject continues with the second group of sequences, and so on until all experiment sequences are completed.
Step 3, computation of the video memorability ground truth: collect the results of all 20 subjects; the video memorability ground truth is computed as

memorability true value = (number of subjects who correctly recognized the video / total number of subjects) × 100%,

so that the range of the video memorability ground truth is [0%, 100%] (a one-line sketch follows).
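A one-line sketch of this ground-truth computation, assuming `responses` is a (20, 222) boolean array recording whether each subject correctly recognized each target video:

    import numpy as np

    # Percentage of the 20 subjects who correctly recognized each video: in [0, 100].
    memorability_true_values = 100.0 * responses.mean(axis=0)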
6: predict the memorability value of a video: for a video of unknown memorability, first extract its bottom-layer visual feature, multiply it by the feature subspace mapping matrix to get its mapped subspace feature, and input this subspace feature into the trained SVR model to obtain the memorability value of the video.
7: computation of the correlation coefficient: to verify the effectiveness of the method, the correlation coefficient between the predicted values and the true values is computed. For the 222 videos, leave-one-out evaluation is adopted: in each round 221 videos form the training set and 1 video forms the test set, whose memorability value is predicted; this is repeated 222 times. The correlation coefficient between the predicted values and the true values of all 222 videos is computed as

ρ = Σ_{i=1}^{N} (X_i - X̄)(Y_i - Ȳ) / sqrt( Σ_{i=1}^{N} (X_i - X̄)² · Σ_{i=1}^{N} (Y_i - Ȳ)² ),

where X_i is the memorability true value of test video i, Y_i its predicted memorability value, X̄ and Ȳ are the means of the true and predicted values over the 222 videos, N = 222 is the number of videos, and ρ is the correlation coefficient. The same memorability experiment is also run using the bottom-layer visual feature vectors alone, and the correlation coefficient between the predictions of all 222 videos and their true values is computed. Table 1 compares the correlation coefficients of the two methods; a larger correlation coefficient means the predicted values are closer to the true values, i.e., more accurate discrimination. The results show that the proposed fMRI-based video memorability discrimination method improves the correlation coefficient by about 0.07 over the method using only bottom-layer visual features, predicting more accurately. A sketch of this evaluation is given after Table 1.
Table 1. Comparison of correlation coefficients

                            Bottom-layer visual features    Method of the invention
Correlation coefficient ρ              0.50                          0.57
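A sketch of the leave-one-out evaluation, reusing the SVR configuration above; numpy's corrcoef implements the Pearson formula given in step 7. Variable names are illustrative.

    import numpy as np
    from sklearn.svm import SVR

    def loo_correlation(V_N, y):
        """Leave-one-out prediction over all 222 videos, then Pearson rho."""
        N = len(y)
        preds = np.empty(N)
        for i in range(N):
            mask = np.arange(N) != i                     # train on the other 221
            model = SVR(kernel='rbf', C=50.0, epsilon=0.06).fit(V_N[mask], y[mask])
            preds[i] = model.predict(V_N[i:i+1])[0]      # predict the held-out video
        return np.corrcoef(y, preds)[0, 1]               # rho: truth vs. predictions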

Claims (1)

1. A video memorability discrimination method based on functional magnetic resonance imaging, characterized by the following steps:
Step 1: extract the bottom-layer visual features of all video data:
Step a: the video database contains N ∈ [100, 1000] videos in total; for each video, extract the first frame of every second as a key frame;
Step b: using the object bank program package released by Li-Jia Li in 2010, down-sample each key frame to obtain 12 scale images of the input picture, and convolve these 12 scale images with the 208 object templates in the object bank program, so that each key frame yields 208 × 12 response images;
Then, using interpolation, resize the 12 response images that each key frame produces for each template to a common size, and take the per-pixel maximum over the 12 resized response images to form one maximum-response image;
Then take the mean over the pixels of each maximum-response image, giving a 208-dimensional feature vector per key frame; averaging the 208-dimensional features over all key frames of a video gives the 208-dimensional object bank feature of that video;
Step c: first transform each key frame of each video into the Lab color space and apply a 2-level wavelet transform to the image in each color channel, obtaining the wavelet coefficients of each subband of each key frame; build an L ∈ [1, 256]-level histogram of the wavelet coefficients and compute the entropy of each subband as E = -Σ_i p_i log(p_i), where p_i, i = 0, 1, 2, …, L-1, is the histogram probability distribution;
Add up the entropies of all subbands of each channel to obtain the channel entropies E_L, E_a, E_b, and compute the scene complexity feature of the key frame as SC = 0.84 × E_L + 0.08 × E_a + 0.08 × E_b; averaging over all key frames of a video gives the scene complexity feature of that video;
Step d: first convert each key frame of each video into a gray-level image and build its histogram, with L ∈ [1, 256] distinguishable gray levels; let p(z_i), i = 0, 1, 2, …, L-1, be the corresponding histogram, where z is a random variable representing gray level;
Compute the n-th order moment of the gray-level image as

μ_n(z) = Σ_{i=0}^{L-1} (z_i - m)^n p(z_i),

where m = Σ_{i=0}^{L-1} z_i p(z_i) is the average gray level of z;
Setting n = 2 and n = 3 gives the second-moment feature and the third-moment feature of the gray-level image, respectively; letting σ²(z) be the second moment of the gray-level image, compute its smoothness feature as

R = 1 - 1 / (1 + σ²(z));

Compute the consistency feature of the gray-level image as

Σ_{i=0}^{L-1} p²(z_i);

Compute the mean entropy feature of the gray-level image as

e = -Σ_{i=0}^{L-1} p(z_i) log₂ p(z_i);

Concatenate the second-moment, third-moment, smoothness, consistency and mean entropy features in series to obtain the texture feature of the key frame; averaging the texture features of all key frames of a video gives the texture feature of that video;
Step e: first extract the audio signal of each video, then use the Mel-frequency cepstral coefficient algorithm to extract the MFCC feature of each video;
Step f: concatenate the object bank feature, scene complexity feature, texture feature and MFCC feature of each video head to tail, obtaining the bottom-layer visual feature of each video;
Step 2: extract the brain functional imaging spatial features of the video data from the fMRI data: using the method of the patent numbered 201110114991, extract the brain functional imaging spatial features of the N_f ∈ [10, 100] videos in the database that have fMRI data; reduce the dimensionality of these features by analysis of variance (ANOVA) with a preset significance parameter, then reduce it further with the BLOGREG feature selection algorithm proposed by Cawley, obtaining the brain functional imaging spatial feature matrix f_B of the video data;
Step 3: construct the feature subspace model:
Step a: apply principal component analysis (PCA) to the brain functional imaging spatial features f_B of the N_f ∈ [10, 100] videos, with threshold T ∈ [0.95, 0.99], obtaining the redundancy-removed features f_Bpca; arranging the f_Bpca of the N_f videos row by row gives the feature matrix of the N_f videos

F_1 = [f_Bpca,1; …; f_Bpca,i; …; f_Bpca,N_f], an N_f × n_B matrix,

where i ∈ [1, N_f] indexes the i-th of the N_f videos that have corresponding functional magnetic resonance imaging data, and n_B is the feature dimensionality;
Step b: apply principal component analysis (PCA) to the bottom-layer visual features f_L of the N_f ∈ [10, 100] videos, with threshold T ∈ [0.95, 0.99], obtaining the redundancy-removed features f_Lpca; arranging the f_Lpca of the N_f videos row by row gives the feature matrix

F_2 = [f_Lpca,1; …; f_Lpca,i; …; f_Lpca,N_f], an N_f × n_L matrix,

where i ∈ [1, N_f] indexes the i-th of the N_f videos that have corresponding functional magnetic resonance imaging data, and n_L is the feature dimensionality;
Step c: use the canonical correlation analysis (CCA) algorithm to obtain the mapping matrices of the feature subspace; the mapping relation between the original feature spaces and the feature subspace is

U = F_1 × A,  V = F_2 × B,

where F_1 and F_2 are the feature matrices, A and B are the feature subspace mapping matrices, and U and V are the mappings of the feature matrices onto the feature subspace;
Step 4: map all N ∈ [100, 1000] videos of the database into the feature subspace: arrange the bottom-layer visual features of all N videos row by row into the bottom-layer visual feature matrix F_N, apply PCA to F_N with threshold T ∈ [0.95, 0.99], and obtain the redundancy-removed matrix F_Npca;
Then use the formula V_N = F_Npca × B to map F_Npca into the feature subspace, obtaining the subspace feature matrix V_N;
Step 5: train the video memorability prediction model: train a support vector regression (SVR) model on the subspace feature matrix V_N and the video memorability ground-truth values obtained from the annotation experiment;
Step 6: predict the memorability value of a video: for any video of unknown memorability, first extract its bottom-layer visual feature by the method of step 1, multiply it by the feature subspace mapping matrix B obtained in step c of step 3 to get its mapped subspace feature, and input this subspace feature into the SVR model trained in step 5 to obtain the memorability value of the video.
CN201310332613.0A 2013-08-01 2013-08-01 Video memorability discrimination method based on functional magnetic resonance imaging Expired - Fee Related CN103440496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310332613.0A CN103440496B (en) 2013-08-01 2013-08-01 Video memorability discrimination method based on functional magnetic resonance imaging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310332613.0A CN103440496B (en) 2013-08-01 2013-08-01 Video memorability discrimination method based on functional magnetic resonance imaging

Publications (2)

Publication Number Publication Date
CN103440496A true CN103440496A (en) 2013-12-11
CN103440496B CN103440496B (en) 2016-07-13

Family

ID=49694189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310332613.0A Expired - Fee Related CN103440496B (en) 2013-08-01 2013-08-01 Video memorability discrimination method based on functional magnetic resonance imaging

Country Status (1)

Country Link
CN (1) CN103440496B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105596004A (en) * 2015-12-28 2016-05-25 中国人民解放军国防科学技术大学 Brain functional magnetic resonance imaging blind source separation method based on group canonical correlation analysis
RU2708197C1 (en) * 2018-12-21 2019-12-04 Акционерное общество "Нейротренд" Method of measuring memorability of a multimedia message

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050069206A1 (en) * 2003-09-30 2005-03-31 Yu-Fei Ma Contrast-based image attention analysis framework
CN101984464A (en) * 2010-10-22 2011-03-09 北京工业大学 Method for detecting degree of visual saliency of image in different regions
CN102855630A (en) * 2012-08-21 2013-01-02 西北工业大学 Method for judging image memorability based on saliency entropy and object bank feature

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050069206A1 (en) * 2003-09-30 2005-03-31 Yu-Fei Ma Contrast-based image attention analysis framework
CN101984464A (en) * 2010-10-22 2011-03-09 北京工业大学 Method for detecting degree of visual saliency of image in different regions
CN102855630A (en) * 2012-08-21 2013-01-02 西北工业大学 Method for judging image memorability based on saliency entropy and object bank feature

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105596004A (en) * 2015-12-28 2016-05-25 中国人民解放军国防科学技术大学 Brain functional magnetic resonance imaging blind source separation method based on group canonical correlation analysis
CN105596004B (en) * 2015-12-28 2017-08-25 National University of Defense Technology Functional MRI blind source separation method based on canonical correlation analysis in groups
RU2708197C1 (en) * 2018-12-21 2019-12-04 Акционерное общество "Нейротренд" Method of measuring memorability of a multimedia message
WO2020130870A1 (en) * 2018-12-21 2020-06-25 Акционерное общество "Нейротренд" Method for measuring the memorability of a multimedia message

Also Published As

Publication number Publication date
CN103440496B (en) 2016-07-13

Similar Documents

Publication Publication Date Title
Kynkäänniemi et al. Improved precision and recall metric for assessing generative models
CN108171209A Face age estimation method based on metric learning with convolutional neural networks
CN112446591B (en) Zero sample evaluation method for student comprehensive ability evaluation
CN109993230B (en) TSK fuzzy system modeling method for brain function magnetic resonance image classification
CN101930549B (en) Second generation curvelet transform-based static human detection method
CN101520894A (en) Method for extracting significant object based on region significance
WO2021068781A1 (en) Fatigue state identification method, apparatus and device
CN108492298A Multispectral image change detection method based on generative adversarial networks
CN102422324A (en) Age estimation device, method, and program
CN105117703B Rapid action unit recognition method based on matrix multiplication
CN104715241A (en) Tensor decomposition based fMRI feature extraction and identification method
CN105334504A (en) Radar target identification method based on large-boundary nonlinear discrimination projection model
CN102509119B (en) Method for processing image scene hierarchy and object occlusion based on classifier
CN105678261A Supervised graph-based transductive dimensionality reduction method
CN103268607A (en) Common object detection method on weak supervision condition
CN109086794B (en) Driving behavior pattern recognition method based on T-LDA topic model
CN104616005A (en) Domain-self-adaptive facial expression analysis method
CN111639697B (en) Hyperspectral image classification method based on non-repeated sampling and prototype network
Wang et al. Can we minimize the influence due to gender and race in age estimation?
CN106960433B Full-reference sonar image quality assessment method based on image entropy and edges
CN103310235A (en) Steganalysis method based on parameter identification and estimation
CN114926299A Method for predicting vehicle accident risk based on big data analysis
CN102142037B (en) Video data search method based on functional magnetic resonance imaging
CN103440496B (en) A kind of video memorability method of discrimination based on functional mri
CN115797884B (en) Vehicle re-identification method based on human-like visual attention weighting

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160713

Termination date: 20190801