CN104268546A - Dynamic scene classification method based on topic model - Google Patents

Dynamic scene classification method based on topic model

Info

Publication number
CN104268546A
CN104268546A
Authority
CN
China
Prior art keywords
sift
model
dynamic
scene
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410229426.4A
Other languages
Chinese (zh)
Inventor
刘纯平
林卉
陈宁强
吴扬
季怡
龚声蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201410229426.4A priority Critical patent/CN104268546A/en
Publication of CN104268546A publication Critical patent/CN104268546A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a dynamic scene classification method based on a topic model, characterized by comprising the following steps: (1) describe each image locally using SIFT features to generate the SIFT feature map corresponding to the original image; as time changes, the corresponding features of successive images shift in relative position, and this variation constitutes a flow field, forming the dynamic video SIFT flow; (2) partition the dynamic video SIFT flow field image uniformly into a 3 × 3 grid, quantize the SIFT flow directions of each block into an 8-bin histogram to form a 72-dimensional feature vector, and form visual words by K-means clustering; (3) introduce word prior information to extend the original TMBP model, and model the quantized visual words with both the original TMBP model and the Knowledge-TMBP model to obtain the scene classification result. SIFT flow information is used to describe the dynamic information in the dynamic scene and generate visual words; by considering whether a visual word is meaningful for expressing a topic, the weight of each visual word is added to the inference of the topic model, thereby improving the classification speed and precision for dynamic scenes.

Description

A dynamic scene classification method based on a topic model
Technical field
The present invention relates to a video processing technique, and in particular to a dynamic scene classification method based on a topic model.
Background technology
With the development of science and technology, the scale of image data keeps growing, and digital images, particularly dynamic ones, have become an important medium of information dissemination. Managing dynamic images quickly requires classifying dynamic scenes. The vast amount of video on the internet also places ever higher demands on the intelligent management of this data. Automatic classification of video scenes helps people locate digital video content of interest accurately and quickly. For example, to find a video of a forest fire, one can first use scene classification to sort the searchable videos, and then look for the specific target object within the video clips of the forest-fire scene class, thereby completing the search.
Scene classification means classifying images according to the semantic content of their scenes. This not only captures a person's general understanding of an image, but also provides contextual information about the targets appearing in it. Dynamic scene classification refers to classifying the videos in large video libraries according to their semantic content, thus providing a strong basis for research on image retrieval and target recognition.
As the above example shows, retrieving dynamic images from video depends on the classification of dynamic scenes, so classifying dynamic scenes is fundamental work; dynamic scene classification is therefore also one of the most basic and key research topics in image understanding and machine vision.
At present, intelligent scene classification mainly has the following applications:
1. Assisting manual annotation. If large amounts of digital image and video data can be automatically divided into different scene categories, the work of manual annotation is greatly simplified. By observing screenshots of the different automatically classified scene categories, a supervisor can clearly identify the scene categories in which abnormal events tend to occur, and thus pay more attention to those specific categories. This not only reduces manual workload but also improves monitoring accuracy. For example, at a busy intersection or on a highway with heavy pedestrian and vehicle traffic, a sudden gathering or dispersal of people or vehicles constitutes a special scene category; compared with a normal one-directional flow of people or other natural scenes, the probability of an abnormal event is much higher. After intelligent classification of image and video scenes, monitoring staff can therefore concentrate on the digital image data prone to such abnormal events.
2. Managing digital image data. The rapid growth of digital image and video data in recent years has placed increasingly high demands on intelligent data management. Automatic classification of image and video scenes helps people locate digital video content of interest accurately and quickly. Meanwhile, labeling the different scene segments of the same digital video source makes later search and review more convenient, and finding a particular event within a video sequence becomes simpler. By classifying the scene categories of digital videos, classified management of the different video files is achieved.
3. Supporting deeper digital image and video analysis. In computer vision, scene classification is one of the most basic and simple steps among the many intelligent image and video analysis algorithms. Classifying image and video scenes often requires extracting and analyzing various features, and these features are frequently useful for subsequent video analysis. Scene classification also provides effective contextual information for computer vision tasks such as target recognition and tracking, behavior detection, and video understanding. For example, with a clearly defined search target, one can first classify the images or videos by scene and then look for the specific target object within the images or video clips of the same scene, which simplifies target retrieval. If the user has no definite search target and needs to find a certain class of image or video by condition, scene classification becomes even more important: the user can browse all images or videos of that scene and then select the desired results.
4. Supporting other computer research directions such as artificial intelligence. Robot vision training is one example: for a robot to walk intelligently, it must be fitted with a pair of "eyes". If image scenes can be recognized quickly, knowledge of the external environment is effectively supplied to the robot, which can then use the acquired information to respond quickly and complete tasks automatically.
Beyond these applications, scene classification is attracting more and more attention and is being applied to many aspects of industrial design. Through continued research on scene classification, greater progress can be made in the field of intelligent analysis, which is precisely the research purpose of scene understanding and analysis algorithms.
At present, research methods for dynamic scene classification fall mainly into two classes. The first class is the traditional tracking-based dynamic scene classification method. Its basic idea is to track the moving objects in a dynamic scene, obtain their motion trajectories, and classify the dynamic scene by analyzing these trajectories. First, object detection and tracking are performed on the video; detection results trigger tracking, and the traces are updated effectively over time according to trajectory checks, improving the detection results. Classification of the dynamic scene is then achieved by analyzing the motion trajectories. However, when there are many moving objects in the scene, the number of targets to track grows rapidly and the computational complexity rises sharply; moreover, moving targets may overlap and occlude each other, in which case both detection and tracking perform poorly.
To address the problems of the first class of methods, researchers proposed a second class of dynamic scene classification methods, namely feature-extraction-based dynamic scene classification algorithms. Feature extraction strategies fall into two levels: scene classification using low-level visual features, and scene classification using mid-level semantics. The former first extracts low-level features from the dynamic scene, such as color, texture and shape, then combines these features with supervised training methods to classify the scene. Extracting low-level scene features is highly effective for classifying simple scenes, but when scenes are more complex the classification results are unsatisfactory. Scene classification using mid-level semantics models the scene semantically, bridging the wide gap between low-level image features and high-level semantics and thereby solving the scene classification problem. In general, the second class of methods extracts scene features and feeds the quantized features into a probabilistic statistical model to complete the classification of dynamic scenes; commonly used probabilistic statistical models include LDA and HDP.
Topic models are statistical models for analyzing large-scale data, and in recent years they have been widely used in text processing. In that field, the topic model treats the training data as a mixture of topics, uses the model to simulate the generative process of documents, and then obtains the topics of each document through parameter estimation. When predicting unknown data, the topic model extracts semantically related topic sets from the co-occurrence frequencies of words in documents, maps documents from word space to topic space, and learns a low-dimensional representation of the test document collection. Commonly used topic models include PLSA (Probabilistic Latent Semantic Analysis), LDA (Latent Dirichlet Allocation) and TMBP (Topic Model of Belief Propagation). Images, like text, are descriptions of the objective world, and compared with text, image descriptions are more vivid and concrete, so many researchers have also introduced topic models into image analysis and understanding. In 2005, Fei-fei Li et al. introduced the LDA model into image subject classification: they described images with gray-level features and SIFT features, used the K-means clustering algorithm to map clustered visual features to corresponding visual words, completed the correspondence between words and images, and finally used the LDA model from text analysis to discover the latent semantics of images, thus completing scene classification of still images. In 2008, Bosch et al. used the probabilistic latent semantic analysis (PLSA) model, described images with scale-invariant SIFT features, generated a visual word dictionary from these features, and analyzed the semantic content of images, achieving scene classification of still images.
The objective world is full of complex dynamic natural scenes in which both the background and the imaging conditions change dynamically, such as waving leaves, dense crowds, flocks of birds, flowing water, waves, snow, rain and smog. Compared with static scenes, dynamic scenes contain more dynamic and temporal information, so topic-model-based classification methods for static scenes cannot be used directly. In topic model inference, traditional topic models (PLSA, LDA) do not distinguish the importance of visual words for dynamic scene classification and do not consider whether a visual word is meaningful for expressing a topic. Meanwhile, traditional topic models take a long time to train on image data, and their classification precision needs further improvement. Although many scholars at home and abroad have proposed numerous novel methods, the algorithms often have strong specificity and limitations, and their accuracy and speed still leave much room for improvement, so dynamic scene classification remains an immature research field with many problems to be solved.
Summary of the invention
The aim of the present invention is as follows: to address the long training time and low classification precision of traditional topic models in dynamic scene classification, SIFT flow information is used to describe the dynamic information in the dynamic scene and generate visual words; and, considering whether a visual word is meaningful for expressing a topic, the weight of each visual word is added to the inference of the topic model, thereby improving the classification speed and precision for dynamic scenes.
The technical scheme of the present invention is a dynamic scene classification method based on a topic model, characterized by comprising the following steps:
(1) use SIFT features to describe each image locally and generate the SIFT feature map corresponding to the original image; as time changes, the corresponding features of the original images shift in relative position, and this variation constitutes a flow field, forming the dynamic video SIFT flow;
(2) quantize the dynamic video SIFT flow features and cluster them into visual words;
(3) model with the quantized visual words to obtain the scene classification result.
Preferably, the present invention extracts dynamic image features by means of SIFT flow and quantizes them into visual words according to the magnitude and direction of the SIFT flow. By describing the images locally with SIFT features, the SIFT feature map corresponding to each original image is generated; as time changes, the corresponding features shift in relative position, and this variation constitutes a flow field, forming the SIFT flow. The basic steps for extracting the dynamic video SIFT flow are as follows:
1. Convert the video into an image sequence
The input video must first be processed into a sequence of single frames. On the one hand this reduces the data volume and facilitates later computation; on the other hand, if the difference between two frames is small, the motion is too small and the motion information may not be extractable, so keyframe selection also helps to extract the motion information between two frames. There are many keyframe selection methods; a common one picks one frame at a fixed interval of n frames in the video time sequence, and there are also adaptive keyframe extraction methods. Through keyframe extraction, the input video is converted into a time series of images.
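As an illustrative sketch of the fixed-interval keyframe strategy described above (assuming OpenCV for frame decoding; the interval n and the grayscale conversion are implementation choices, not prescribed by the invention):

```python
import cv2  # assumed dependency; any frame-decoding library would do

def extract_keyframes(video_path, n=5):
    """Pick one keyframe every n frames from the input video."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % n == 0:
            # grayscale, since the dense SIFT step below works on intensity
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        idx += 1
    cap.release()
    return frames
```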
2. Construct dense SIFT feature descriptions of the images
The SIFT descriptor is a sparse feature description comprising both feature extraction and feature detection; only feature extraction is used here. Dense SIFT feature extraction is mainly divided into two steps:
(1) Compute the derivative magnitude and derivative direction of each pixel.
For each pixel in the image, compute its derivative magnitude m and derivative direction θ according to formulas (1) and (2).
m(x, y) = \sqrt{dx^2 + dy^2}    (1)
\theta(x, y) = \tan^{-1}(dy / dx)    (2)
where dx and dy are the derivatives of each pixel in the x direction and y direction, respectively.
(2) Collect statistics over the neighborhood of each pixel to form a histogram.
For each pixel, take an n × n neighborhood around it (n is typically 8), and divide this neighborhood evenly into m × m blocks (m is typically 4). Within each block, compute a weighted histogram of the derivative directions, with every 45 degrees of direction defined as one interval, i.e. quantized into 8 intervals. The basic principle is shown in Figure 1. Each descriptor thus forms a 128-dimensional feature vector, and a dense SIFT description is obtained for every image.
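A minimal sketch of this dense SIFT computation, following formulas (1) and (2) (the function name and the per-call gradient computation are illustrative; for valid slices, (y, x) is assumed to be an interior pixel):

```python
import numpy as np

def dense_sift_descriptor(img, y, x, n=8, m=4):
    """Descriptor at pixel (y, x): an n x n neighborhood split into m x m
    blocks, each contributing a magnitude-weighted 8-bin histogram of
    derivative directions (m * m * 8 = 128 dimensions)."""
    img = img.astype(np.float64)
    dy, dx = np.gradient(img)                        # per-pixel derivatives
    mag = np.sqrt(dx ** 2 + dy ** 2)                 # formula (1)
    ang = np.arctan2(dy, dx) % (2 * np.pi)           # formula (2), in [0, 2*pi)
    bins = (ang / (np.pi / 4)).astype(int) % 8       # 8 intervals of 45 degrees

    half, step = n // 2, n // m
    pb = bins[y - half:y + half, x - half:x + half]  # neighborhood bin indices
    pm = mag[y - half:y + half, x - half:x + half]   # neighborhood magnitudes
    desc = np.zeros(m * m * 8)
    for by in range(m):
        for bx in range(m):
            b = pb[by * step:(by + 1) * step, bx * step:(bx + 1) * step]
            w = pm[by * step:(by + 1) * step, bx * step:(bx + 1) * step]
            hist = np.bincount(b.ravel(), weights=w.ravel(), minlength=8)
            desc[(by * m + bx) * 8:(by * m + bx + 1) * 8] = hist
    nrm = np.linalg.norm(desc)
    return desc / nrm if nrm > 0 else desc
```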
3. Match SIFT features
The objective function for computing the SIFT flow between two adjacent images is basically similar to that of optical flow. It has two basic goals: first, the matching of SIFT descriptors should follow the direction of motion; second, the flow field should be as smooth as possible, with motion discontinuities at object edges. Accordingly, let p = (x, y) be a point in the image and denote its neighborhood set by ε. Its motion vector between two adjacent images is w(p) = (u(p), v(p)); for convenience of computation, u(p) and v(p) are defined as integers, and the two SIFT feature maps to be matched are denoted s1 and s2. Three factors are considered when matching two feature points: first, matching should follow the degree of similarity between s1(p) and s2(p + w(p)); second, in line with actual conditions, the flow components u(p) and v(p) should not be too large; and third, to reflect the continuity of real motion, pixels in nearby regions should be matched preferentially. The energy function of the SIFT flow is therefore defined as follows:
E(w) = \sum_p \min(\|s_1(p) - s_2(p + w(p))\|, t) + \sum_p \eta(|u(p)| + |v(p)|) + \sum_{(p,q) \in \varepsilon} [\min(\alpha|u(p) - u(q)|, d) + \min(\alpha|v(p) - v(q)|, d)]    (3)
where α and η are parameters and d is the truncation threshold of the motion. The three terms on the right-hand side of the equation represent the three factors considered in feature point matching.
4. Compute the motion field of the SIFT flow
Solving the above energy function directly is computationally inefficient, so a coarse-to-fine optimization process is proposed to improve the efficiency of the algorithm. Before matching, a SIFT pyramid {S(k)} must be built, where S(1) = S and each S(k+1) is obtained from the layer above it, S(k), by smoothing and down-sampling. At the k-th layer, let Pk be the pixel to be matched, Ck the center of the search window, and W(Pk) the best match point. As shown in Figure 2, the optimized search strategy starts from the top layer of the SIFT pyramid with a search window of size m × m (m is the width of the top-layer image), and matches the images from the top down. In Figure 2 the top layer of the pyramid is S(3) and the search window is centered at p(3); when searching the next layer down, only the region corresponding to the optimal flow vector W(Pk) of the current layer needs to be searched. The green rectangles in each layer of the figure represent the search windows. Since the flow field is established pixel by pixel for images of the same scale, a search at scale S(k+1) only needs to compute W(Pk) in the neighborhood of the match point p found at scale S(k), which greatly accelerates the computation.
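A sketch of the pyramid construction this search relies on (smoothing via SciPy is an assumption; the text does not specify the smoothing kernel or down-sampling factor):

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # assumed dependency

def build_sift_pyramid(s, levels=3, sigma=1.0):
    """Builds {S(k)}: S(1) is the dense SIFT map s (H x W x 128);
    each S(k+1) is S(k) smoothed and down-sampled by a factor of 2."""
    pyramid = [np.asarray(s, dtype=np.float64)]
    for _ in range(levels - 1):
        top = pyramid[-1]
        sig = (sigma, sigma) + (0,) * (top.ndim - 2)  # smooth spatial axes only
        pyramid.append(gaussian_filter(top, sigma=sig)[::2, ::2])
    return pyramid
```

Matching then proceeds from pyramid[-1] down to pyramid[0], with each layer's search window centered on the doubled coordinates of the best match found one level up.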
5. Quantize the SIFT flow features
After feature matching, the horizontal motion u(p) and vertical motion v(p) of each pair of matched SIFT feature points are obtained, i.e. the SIFT flow information at each pixel position. The SIFT flow features extracted from two adjacent keyframes need to be quantized; the present invention adopts a histogram method. The SIFT flow contains two kinds of information: the horizontal motion u(p) and the vertical motion v(p). From [u(p), v(p)] the direction angle \theta = \tan^{-1}(v(p) / u(p)) can be obtained. The direction angles of the SIFT flow are divided into 8 modules: as shown in Figure 3, every 45° from 0° to 360° forms one module, and the magnitude of each SIFT flow vector is accumulated into one of these modules. Finally the histogram is normalized to ensure that the histogram representation is scale invariant.
Since a dynamic scene contains not only motion information but also specific positional information within the image frame, the extracted feature map is divided into 9 windows in a uniform grid, as shown in Figure 4. The image frame is divided evenly into 3 × 3 windows, a histogram is formed in each window in the same way as the SIFT flow histogram above, and each SIFT flow map is finally described by a 3 × 3 × 8-dimensional feature, i.e. a 72-dimensional vector. Figure 5 shows a SIFT flow histogram of a dynamic scene.
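A minimal sketch of this 72-dimensional quantization (grid and bin counts follow the text; magnitude weighting of the histogram is an assumption consistent with the description above):

```python
import numpy as np

def sift_flow_histogram(u, v, grid=3, nbins=8):
    """Quantize a SIFT flow field (u, v) into a grid x grid spatial partition,
    each cell holding an nbins-bin direction histogram weighted by flow
    magnitude (3 x 3 x 8 = 72 dimensions), normalized at the end."""
    ang = np.arctan2(v, u) % (2 * np.pi)               # flow direction angle
    mag = np.sqrt(u ** 2 + v ** 2)                     # flow magnitude
    bins = (ang / (2 * np.pi / nbins)).astype(int) % nbins
    h, w = u.shape
    feat = np.zeros(grid * grid * nbins)
    for gy in range(grid):
        for gx in range(grid):
            ys = slice(gy * h // grid, (gy + 1) * h // grid)
            xs = slice(gx * w // grid, (gx + 1) * w // grid)
            hist = np.bincount(bins[ys, xs].ravel(),
                               weights=mag[ys, xs].ravel(), minlength=nbins)
            feat[(gy * grid + gx) * nbins:(gy * grid + gx + 1) * nbins] = hist
    nrm = np.linalg.norm(feat)
    return feat / nrm if nrm > 0 else feat             # normalized for scale invariance
```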
6. Generate visual words by clustering
After the feature maps are quantized, a clustering algorithm can be used to cluster them into a corresponding visual dictionary and bag of words. The present invention uses the K-means algorithm to cluster the SIFT flow features of the training video set; the specific K-means procedure is as follows:
(1) randomly select k of the n data points as the initial cluster centers;
(2) according to the mean (center object) of each cluster, compute the distance of every object to these center objects, and reassign each object according to the minimum distance;
(3) recompute the mean (center object) of each cluster;
(4) repeat steps 2 and 3 until the clusters no longer change.
In this way, each cluster center is regarded as one visual word, and a visual vocabulary of M visual words is generated. Every training video can then be represented with words from the visual dictionary. For a test video, the SIFT flow features are extracted in the same way, the Euclidean distance between each SIFT flow feature and the SIFT flow feature of each visual word in the visual dictionary is computed, and each image frame of the video is defined by its nearest-neighbor visual words.
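A sketch of the four K-means steps above and of the nearest-neighbor word assignment for test videos (a plain NumPy version; in practice a library implementation could be substituted):

```python
import numpy as np

def kmeans_vocabulary(features, k=400, iters=100, seed=0):
    """Clusters training SIFT flow features; each of the k centers
    becomes one visual word (~400 words works well per the experiments)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]  # step (1)
    for _ in range(iters):
        # step (2): assign every feature to its nearest center (Euclidean)
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step (3): recompute each center as the mean of its members
        new = np.array([features[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):   # step (4): stop when clusters are stable
            break
        centers = new
    return centers

def assign_words(features, centers):
    """Map each (test) feature to the index of its nearest visual word."""
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)
```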
7. Count the frequency of visual words per video file
8. Model with the TMBP topic model and the Knowledge-TMBP topic model, respectively
Figure 6 shows the traditional LDA graphical model; this representation is also called "plate notation". The LDA model is a three-level hierarchical Bayesian model. In the figure, black nodes denote observable variables and the other nodes are latent variables; K is the number of topics, N is the number of words in the current document, and D is the number of documents. At the word level there are two variables, w_n and z_n, denoting the n-th word of a document and the topic label of that word, respectively. At the document level there are two variables, θ_d and φ_k: φ_k is a K × V matrix (V is the dimension of the vocabulary) whose rows are the word distributions of the topics, and θ_d is a D × K matrix whose rows are the topic probability distributions of the documents. At the corpus level, α and β are the hyperparameters of two Dirichlet distributions.
If the LDA model is converted into its equivalent factor graph and the BP (Belief Propagation) algorithm is then used to perform inference over the generative process, the learning speed of the model is greatly increased. The factor graph representation is shown in Figure 7.
In Figure 7, the factors θ_d and φ_k are drawn as boxes, and the variable z_{w,d} they connect is drawn as a circle. Compared with Figure 6, at the word level the original variables w_n and z_n merge into a single variable z_{w,d}, which denotes the topic label of word w in document d. At the document level there are the factor variables θ_d and φ_k, representing the topic distribution of a given document and the probability distribution over the word list of a given topic, respectively, as in Figure 6; their neighboring variables are z_{-w,d} and z_{w,-d}, where z_{-w,d} denotes the topic labels of all word indices in document d except word w, and z_{w,-d} denotes the topic labels of word w at all word indices outside document d. At the corpus level the two hyperparameters α and β are retained, controlling the document-level variables θ_d and φ_k just as in Figure 6.
The flow of the whole BP algorithm is described in Table 1. Among the input parameters, K is the number of scene categories, T is the maximum number of iterations, and α and β are treated as known quantities. The output parameter θ_d is a D × K matrix recording the probability of each document under each topic; φ_w is a K × W matrix storing the probability of each word under each topic.
Table 1. BP algorithm flow
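Since the body of Table 1 is not reproduced here, the following is a simplified synchronous sketch of BP inference for this factor graph (the message update and normalization are a condensed reading of the algorithm, not a verbatim transcription of Table 1):

```python
import numpy as np

def tmbp_bp(X, K, alpha=0.1, beta=0.01, T=100, seed=0):
    """X is a W x D visual-word/video count matrix; returns theta (D x K)
    and phi (K x W), the two outputs named in Table 1."""
    W, D = X.shape
    rng = np.random.default_rng(seed)
    mu = rng.random((W, D, K))
    mu /= mu.sum(axis=2, keepdims=True)                # messages mu_{w,d}(k)
    for _ in range(T):
        xm = X[:, :, None] * mu                        # x_{w,d} * mu_{w,d}(k)
        th = xm.sum(axis=0)[None, :, :] - xm + alpha   # theta_{-w,d} + alpha
        ph = xm.sum(axis=1)[:, None, :] - xm + beta    # phi_{w,-d} + beta
        mu = th * ph / (xm.sum(axis=(0, 1))[None, None, :] + W * beta)
        mu /= mu.sum(axis=2, keepdims=True)            # normalize over topics
    xm = X[:, :, None] * mu
    theta = xm.sum(axis=0) + alpha                     # D x K document-topic matrix
    theta /= theta.sum(axis=1, keepdims=True)
    phi = (xm.sum(axis=1) + beta).T                    # K x W topic-word matrix
    phi /= phi.sum(axis=1, keepdims=True)
    return theta, phi
```

For a test video d, the subject category of step 9 below is then simply theta[d].argmax(); in the Knowledge-TMBP variant, the raw counts X are replaced by the prior-weighted counts m_{w,d} of formula (4) below.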
In the TMBP model, no preprocessing is applied to w, the only observable variable in the model: the frequency with which a word occurs in a document is used directly as the model's input. In text analysis, words can be segmented and stop words removed (meaningless words such as "a" and "the"), which improves the final document classification result. In dynamic scene classification, it is likewise necessary to consider whether a visual word is meaningful for expressing a topic. The present invention therefore introduces a measure of prior knowledge over visual words; adding word priors makes the model's inference results conform better to human thinking. Accordingly, the present invention takes TF-IDF (Term Frequency-Inverse Document Frequency) knowledge from document analysis as the prior knowledge of visual words and adds it to the TMBP model, producing a correspondingly improved model, Knowledge-TMBP, whose graphical model is shown in Figure 8.
Compared with the TMBP model, the adapted model only adds one node, m_{w,d}, which represents the prior weight of a word; the prior is computed from the TF-IDF inverse document frequency of the word. TF-IDF is used to determine the probability of a given word in a specific document relative to the whole document library; simply put, this computation determines the relevance of a given word in a particular document. If a word appears in one document or a small fraction of documents, it tends to be assigned a higher TF-IDF value; correspondingly, a word that appears in most or all documents tends to be assigned a lower TF-IDF value. There are many ways to compute TF-IDF; the method adopted by the present invention is as follows. Given a document library D with one of its documents denoted d,
m_w = f_{w,d} \cdot \log(D / n_w + 0.1)    (4)
where f_{w,d} is the frequency of word w in document d, D is the number of documents in the whole library, and n_w is the number of documents in which word w appears.
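A sketch of this prior computation (the guard against words that appear in no document is an implementation choice):

```python
import numpy as np

def word_priors(X):
    """Formula (4): TF-IDF prior m_{w,d} for each visual word. X is a W x D
    count matrix; f_{w,d} = X[w, d], n_w = number of documents containing w,
    and D = total number of documents."""
    W, D = X.shape
    n_w = (X > 0).sum(axis=1)                   # documents containing word w
    idf = np.log(D / np.maximum(n_w, 1) + 0.1)  # log(D / n_w + 0.1)
    return X * idf[:, None]                     # m_{w,d} = f_{w,d} * idf_w
```

In the Knowledge-TMBP message passing, these weights take the place of the raw counts (e.g. `xm = word_priors(X)[:, :, None] * mu` in the BP sketch above), so that words concentrated in few videos weigh more heavily in topic derivation.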
9. Process the model output results
Through testing, the model outputs, for each test datum, a probability distribution over the topics; the topic with the maximum probability is selected as the subject category of the dynamic scene.
The advantages of the present invention are:
1. Existing topic-based semantic scene classification methods are mostly confined to static image sets, and their visual word generation methods cannot be applied directly to dynamic scenes. Taking full account of the spatial and temporal stochastic dynamics in dynamic scenes, a dynamic scene visual word generation method based on SIFT (Scale Invariant Feature Transform) flow feature description is proposed. Because the dynamic features of this method are extracted from SIFT feature points, they are stable under scale, rotation and affine transformations, and also robust to local information within static image frames. When computing the flow field, the assumption of constant pixel gray values in traditional optical flow is avoided: dense SIFT descriptors serve as the basis of the flow field between adjacent image frames, which overcomes the interference of noise points in the flow field computation. At the quantization stage, to address the loss of spatial position information when visual features are quantized into visual words, the idea of uniformly partitioning the feature map is adopted and visual words are quantized according to the position of the SIFT flow, solving the lack of spatial position information in visual words.
2. To address the long training time and low classification precision of traditional topic models (PLSA, LDA) in dynamic scene classification, the present invention, building on the TMBP model, proposes an improved Knowledge-TMBP topic model that introduces prior knowledge. Using the inverse document frequency between dynamic visual words and image documents as the expression of the prior, the model rewrites the message passing in the original TMBP model, ensuring that important visual words are more decisive in topic derivation.
Accompanying drawing explanation
The invention is further described below in conjunction with the drawings and embodiments:
Fig. 1 is a schematic diagram of the SIFT descriptor computation.
Fig. 2 is a schematic diagram of the coarse-to-fine search process of the SIFT flow.
Fig. 3 shows the binning of the SIFT flow histogram.
Fig. 4 shows an example of feature map partitioning.
Fig. 5 shows a SIFT flow histogram.
Fig. 6 is the graphical model representation of the LDA model.
Fig. 7 is the factor graph representation of the TMBP model.
Fig. 8 is the graphical model of Knowledge-TMBP.
Fig. 9 shows the impact of visual dictionary size on classification precision.
Fig. 10-1 Comparison of classification performance under the two different visual word descriptions.
Fig. 10-2 Comparison of classification performance under the two different visual word descriptions.
Fig. 10-3 Comparison of classification performance under the two different visual word descriptions.
Fig. 10-4 Comparison of classification performance under the two different visual word descriptions.
Fig. 11-1 Comparison of the classification performance of the three models.
Fig. 11-2 Comparison of the classification performance of the three models.
Fig. 11-3 Comparison of the classification performance of the three models.
Fig. 11-4 Comparison of the classification performance of the three models.
Embodiment
Embodiment: the dynamic image library used in the present invention is YUPENN_Dynamic_Scenes, which contains videos of 14 categories: Ocean, Sky-clouds, Snowing, Waterfall, Fountain, Forest Fire, Beach, Highway, Street, Elevator, Lightning Storm, Railway, Windmill Farm and Rushing River, with 30 videos per category. In each category, 10 videos are randomly selected as training data and 10 as test data. The experimental hardware environment is Windows 7 with a Pentium 4 processor, a 2.8 GHz clock speed and 4 GB of RAM; the code running environment is MATLAB 2013a.
Because the size of the visual dictionary has a great impact on classification performance, experiments were likewise run to determine the dictionary size for dynamic scene classification, measured here by average classification precision. Figure 9 gives the average classification precision of Knowledge-TMBP scene classification under different visual dictionary sizes. As can be seen from the figure, good classification performance is reached when the visual dictionary size is about 400.
Under the condition of a 400-word vocabulary, related experiments were run on each dynamic scene. The classification results are measured with four performance metrics: precision (P), accuracy (ACC), recall (R) and F-measure (F).
Precision (P) measures exactness: it represents the proportion of examples classified as positive that are actually positive. It is defined as follows:
P = TP / (TP + FP)    (5)
Classification accuracy (ACC) is the most common evaluation metric; it is computed as in formula (6), and the higher the accuracy, the better the classifier:
ACC = (TP + TN) / (TP + TN + FP + FN)    (6)
Recall (R) measures coverage: it measures how many actual positive examples are correctly classified as positive. It is defined as follows:
R = TP / (TP + FN)    (7)
F-measure is another metric of classification performance. It is defined as follows:
F = (2 \cdot ACC \cdot R) / (ACC + R)    (8)
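For completeness, a direct transcription of formulas (5)-(8) into code (per-class counts tp, fp, tn, fn are assumed to come from the confusion matrix):

```python
def classification_metrics(tp, fp, tn, fn):
    """Per-class metrics; note that, following the text, the F-measure
    here combines ACC and R rather than the more usual P and R."""
    p = tp / (tp + fp)                        # formula (5): precision
    acc = (tp + tn) / (tp + tn + fp + fn)     # formula (6): accuracy
    r = tp / (tp + fn)                        # formula (7): recall
    f = 2 * acc * r / (acc + r)               # formula (8): F-measure
    return p, acc, r, f
```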
Table 2 shows the classification performance of the experimental results. As can be seen from the table, the average classification precision of the proposed method reaches 75%, the average recall is 76%, and the F-measure reaches 84%.
Under the condition of using SIFT-flow-based features as the scene description, Table 2 gives the classification performance of modeling with the Knowledge-TMBP model, and Table 3 gives the classification performance of the TMBP model. Comparing Tables 2 and 3, the average classification recall R and F-measure of the Knowledge-TMBP model are slightly higher than those of the TMBP model, while the average classification precision P declines slightly; the average correct classification rates of the two are almost equal.
Table 2. Classification performance of dynamic scene classification based on the Knowledge-TMBP topic model
Table 3. Classification performance of dynamic scene classification based on the TMBP topic model
To verify the impact of the SIFT-flow-based scene description on the scene classification results, again classifying with the Knowledge-TMBP model, the visual word description method of the present invention was simply contrasted with a visual word description method built from the color features of video frame difference images; videos of 7 dynamic scene classes in Dynamic_Scenes were chosen for the comparison experiments. The classification performance is shown in Figure 10. Panel (a) compares classification precision under the two visual word descriptions: precision based on SIFT flow dynamic features is generally higher than that based on difference-image color features, with average precision higher by 7%. Panel (b) compares the recall of the two methods: recall based on SIFT flow dynamic features is on average 9% higher than that based on difference-image color features. Panel (c) compares the F-measure of the two methods: the F-measure based on SIFT flow dynamic features is on average 18% higher. Panel (d) compares the accuracy of the two classifications: the average accuracy based on SIFT flow dynamic features is about 6% higher than that based on difference-image color features.
To verify the impact of Knowledge-TMBP and other topic models on dynamic scene classification results, the present invention built visual words from the simple color features of video frame difference images and ran related experiments with the GS-LDA, TMBP and Knowledge-TMBP models. The experimental data set is the 7 scene classes in the Dynamic_Scenes dynamic scenes; the training times are shown in Table 4. Although adding word priors lengthens the training time slightly compared with the TMBP model, it is clearly better than the training times of the PLSA and GS-LDA models.
Table 4. Comparison of the training times of the three models
The classification performance of the three models is shown in Figure 11: panel (a) compares the classification precision P of the three models; panel (b) compares their classification recall R; panel (c) compares their classification F-measure; panel (d) compares their classification accuracy ACC. From the data in Figure 11, the TMBP model with added priors improves classification precision by 5% over the original LDA model, recall by 6%, F-measure by 6%, and classification accuracy by 2%.
The dynamic scene classification method of the present invention is also compared with experimental results of methods using the low-level spatiotemporal feature SOE (Spatiotemporal Oriented Energy), GIST and HOF. Table 5 gives the comparative classification recall of the different methods. As can be seen from the table, the recall of the method of the present invention improves most over the GIST method, by about 16%, and is also 2% higher than the SOE method.
Table 5. Comparison of the recall of four dynamic scene classification methods

Claims (6)

1. A dynamic scene classification method based on a topic model, characterized in that it comprises the following steps:
(1) using SIFT features to describe each image locally and generating the SIFT feature map corresponding to the original image, wherein, as time changes, the corresponding features of the original images shift in relative position, this variation constitutes a flow field, and the dynamic video SIFT flow is formed;
(2) quantizing the dynamic video SIFT flow features and clustering them into visual words;
(3) modeling with the quantized visual words to obtain the scene classification result.
2. The dynamic scene classification method based on a topic model according to claim 1, characterized in that step (1) specifically comprises:
(1) processing the video into an image sequence;
(2) constructing dense SIFT feature descriptions of the images;
(3) matching the SIFT features;
(4) computing the motion field of the SIFT flow.
3. The dynamic scene classification method based on a topic model according to claim 1, characterized in that step (2) specifically comprises: uniformly partitioning the dynamic video SIFT flow field image into a 3 × 3 grid; quantizing the SIFT flow direction vectors of each block into an 8-bin histogram, forming a 72-dimensional feature vector; and forming visual words by K-means clustering.
4. The dynamic scene classification method based on a topic model according to claim 1, characterized in that it further comprises, between step (2) and step (3): counting the frequency of visual words per video file.
5. The dynamic scene classification method based on a topic model according to claim 1 or 4, characterized in that step (3) specifically comprises: introducing word prior information to extend the original TMBP model; modeling the quantized visual words with both the original TMBP model and the Knowledge-TMBP model; and, from the probability distribution over topics output by the model for each test datum, selecting the topic with the maximum probability as the subject category of the dynamic scene.
6. The dynamic scene classification method based on a topic model according to claim 2, characterized in that the dense SIFT feature extraction of step (2) is mainly divided into two steps:
(1) computing the derivative magnitude and derivative direction of each pixel;
(2) collecting statistics over the neighborhood of each pixel to form a histogram, whereby each descriptor forms a feature vector and a dense SIFT description is obtained for every image.
CN201410229426.4A 2014-05-28 2014-05-28 Dynamic scene classification method based on topic model Pending CN104268546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410229426.4A CN104268546A (en) 2014-05-28 2014-05-28 Dynamic scene classification method based on topic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410229426.4A CN104268546A (en) 2014-05-28 2014-05-28 Dynamic scene classification method based on topic model

Publications (1)

Publication Number Publication Date
CN104268546A true CN104268546A (en) 2015-01-07

Family

ID=52160066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410229426.4A Pending CN104268546A (en) 2014-05-28 2014-05-28 Dynamic scene classification method based on topic model

Country Status (1)

Country Link
CN (1) CN104268546A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2409250A1 (en) * 2009-03-20 2012-01-25 Eastman Kodak Company Semantic event detection using cross-domain knowledge
CN102902976A (en) * 2011-07-29 2013-01-30 中国科学院电子学研究所 Image scene classification method based on target and space relationship characteristics
CN103164856A (en) * 2013-03-07 2013-06-19 南京工业大学 Video copy and paste blind detection method based on dense scale-invariant feature transform stream
CN103577804A (en) * 2013-10-21 2014-02-12 中国计量学院 Abnormal human behavior identification method based on SIFT flow and hidden conditional random fields

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU LONG: "Scene classification based on salient region detection and TMBP", China Master's Theses Full-text Database, Information Science and Technology *
LIU GONG: "Behavior analysis in complex scenes based on VStop visual words", China Master's Theses Full-text Database, Information Science and Technology *
NIU ZHIBIN et al.: "Detection and tracking of infrared dim small targets based on SIFT flow", Development and Application *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933029A (en) * 2015-06-23 2015-09-23 天津大学 Text image joint semantics analysis method based on probability theme model
CN106951422A (en) * 2016-01-07 2017-07-14 腾讯科技(深圳)有限公司 The method and apparatus of webpage training, the method and apparatus of search intention identification
WO2018176195A1 (en) * 2017-03-27 2018-10-04 中国科学院深圳先进技术研究院 Method and device for classifying indoor scene
US11042777B2 (en) 2017-03-27 2021-06-22 Shenzhen Institutes Of Advanced Technology Classification method and classification device of indoor scene
CN107194322B (en) * 2017-04-28 2019-08-06 南京邮电大学 A kind of behavior analysis method in video monitoring scene
CN107194322A (en) * 2017-04-28 2017-09-22 南京邮电大学 A kind of behavior analysis method in video monitoring scene
CN107808132A (en) * 2017-10-23 2018-03-16 重庆邮电大学 A kind of scene image classification method for merging topic model
CN108108688A (en) * 2017-12-18 2018-06-01 青岛联合创智科技有限公司 A kind of limbs conflict behavior detection method based on the extraction of low-dimensional space-time characteristic with theme modeling
CN108108688B (en) * 2017-12-18 2021-11-23 青岛联合创智科技有限公司 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling
CN108648224A (en) * 2018-05-18 2018-10-12 杭州电子科技大学 A method of the real-time scene layout identification based on artificial neural network and reconstruction
CN108648224B (en) * 2018-05-18 2021-07-13 杭州电子科技大学 Real-time scene layout recognition and reconstruction method based on artificial neural network
CN110197107A (en) * 2018-08-17 2019-09-03 平安科技(深圳)有限公司 Micro- expression recognition method, device, computer equipment and storage medium
CN110197107B (en) * 2018-08-17 2024-05-28 平安科技(深圳)有限公司 Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium
CN109325434A (en) * 2018-09-15 2019-02-12 天津大学 A kind of image scene classification method of the probability topic model of multiple features
CN111104792A (en) * 2019-12-13 2020-05-05 浙江工业大学 Traffic track data semantic analysis and visualization method based on topic model
CN111104792B (en) * 2019-12-13 2023-05-23 浙江工业大学 Traffic track data semantic analysis and visualization method based on topic model
CN116912761A (en) * 2023-06-27 2023-10-20 南京工程学院 Municipal construction safety problem detection method in complex environment
CN116912761B (en) * 2023-06-27 2024-07-02 南京工程学院 Municipal construction safety problem detection method in complex environment

Similar Documents

Publication Publication Date Title
CN104268546A (en) Dynamic scene classification method based on topic model
Messikommer et al. Event-based asynchronous sparse convolutional networks
Zheng et al. A novel background subtraction algorithm based on parallel vision and Bayesian GANs
Stewart et al. Label-free supervision of neural networks with physics and domain knowledge
Huttunen et al. Car type recognition with deep neural networks
Lin et al. Active and incremental learning for semantic ALS point cloud segmentation
Kasarla et al. Region-based active learning for efficient labeling in semantic segmentation
NL2029214B1 (en) Target re-indentification method and system based on non-supervised pyramid similarity learning
Yee et al. DeepScene: Scene classification via convolutional neural network with spatial pyramid pooling
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
Yang et al. Local label descriptor for example based semantic image labeling
CN113297936B (en) Volleyball group behavior identification method based on local graph convolution network
US11695898B2 (en) Video processing using a spectral decomposition layer
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN115240024A (en) Method and system for segmenting extraterrestrial pictures by combining self-supervised learning and semi-supervised learning
CN112183464A (en) Video pedestrian identification method based on deep neural network and graph convolution network
Weinmann et al. A hybrid semantic point cloud classification-segmentation framework based on geometric features and semantic rules
Ji et al. A hybrid model of convolutional neural networks and deep regression forests for crowd counting
Arul Deep learning methods for data classification
Hashemi et al. A new comparison framework to survey neural networks‐based vehicle detection and classification approaches
Wang et al. Action recognition using linear dynamic systems
Khalid Motion-based behaviour learning, profiling and classification in the presence of anomalies
Li et al. An Object Co-occurrence Assisted Hierarchical Model for Scene Understanding.
Saffari et al. Sparse adversarial unsupervised domain adaptation with deep dictionary learning for traffic scene classification
Hu et al. Reliability verification‐based convolutional neural networks for object tracking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150107

RJ01 Rejection of invention patent application after publication