CN102098449A - Method for realizing automatic inside segmentation of TV programs by utilizing mark detection

Info

Publication number: CN102098449A
Application number: CN2010105740748A
Authority: CN
Inventors: 董远; 肖国锐
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2010-12-06
Filing date: 2010-12-06
Publication date: 2011-06-15
Anticipated expiration: 2030-12-06
Also published as: CN102098449B

Abstract

The invention relates to the technical field of image processing and pattern recognition, and provides a method for automatic internal segmentation of TV programs using mark detection. Internal segmentation of TV programs is currently in urgent demand, and the temporal discontinuity of the mark within a program gives the program a clear structure. The method comprises the following steps: (1) segmenting the program video into shots, and extracting, from the key frames of each shot, a sub-image of the area in which the mark is located; (2) extracting a feature vector from each sub-image, and classifying it with a support vector machine (SVM) classifier trained for mark detection; (3) tallying the classification results and labeling the mark attribute of each shot; and (4) segmenting the video at the boundaries where the mark attribute changes between adjacent shots. Because mark detection processes only key frames, the method is efficient. The method applies to TV programs in which the mark appears discontinuously and places no requirement on the type of program content, which makes it broadly applicable.

Description

A method for automatic internal segmentation of TV programs using mark detection

Technical field

The invention belongs to the field of image processing and pattern recognition, and specifically relates to a method for automatic internal segmentation of TV programs using mark detection.

Background art

At present, radio and television broadcasters produce massive amounts of video every day and provide electronic program guides. With the spread of Internet TV and digital TV, many TV programs are being segmented into internal passages so that viewers can be guided through the content of a program and enjoy a better viewing experience. Internal segmentation of a program is also a prerequisite for further content analysis and retrieval. Given the volume of video, manual annotation and segmentation cannot meet the timeliness requirement, so automatic segmentation by machine is urgently needed.

Video structure analysis refers to processing a video stream by shot segmentation, key frame extraction, scene segmentation and the like, in order to obtain structured information about the video. Scene segmentation mainly relies on methods such as scene clustering, repeated-video detection and shot similarity comparison, which are often complex.

More and more TV programs now pay attention to intellectual property when displaying the station logo or the program's own mark: video passages inside the program that the program does not own, such as advertisements and quoted film clips, do not carry these marks, whereas passages that do carry the mark are usually the opening or closing titles, interview parts, or other segments recorded by the program itself. The discontinuity of the mark along the time axis gives the TV program a strong structure and provides a basis for its internal segmentation.

Summary of the invention

For TV programs whose station logo or program logo (hereinafter collectively referred to as the mark) is discontinuous in time, the invention provides a method for automatic internal segmentation of such programs that achieves fast and accurate segmentation.

The main steps of the automatic internal segmentation method of the invention are as follows:

Step 1: perform shot segmentation on the TV program video using an existing shot segmentation technique, obtaining a temporally continuous shot sequence;

Step 2: take 5 key frames from each shot at evenly spaced times, and extract the sub-image of the rectangular area where the mark is located from every key frame;

Step 3: extract the image feature vectors of all sub-images in the training set, where sub-images containing the mark are positive samples and sub-images not containing the mark are negative samples, and train an SVM classifier for mark detection;

Step 4: for the program video to be segmented, obtain all sub-images through step 1 and step 2, extract the same image feature vectors as in step 3, and classify them with the SVM classifier obtained in step 3 to get a classification result for each sub-image;

Step 5: label the mark attribute of each shot: if at least 3 sub-images in a shot are judged to contain the mark, label the shot as a mark shot; otherwise label it as a non-mark shot;

Step 6: segment the program video internally: take the boundaries between adjacent shots with different mark attributes as cut points, and divide the video into passages.

Description of drawings

Fig. 1 is an example of the structure of a TV program according to the invention.

Fig. 2 is the basic flow chart of the method of the invention.

Embodiment

As shown in the flow chart of Fig. 2, the method of the invention comprises two stages: offline training of the classifier, and online processing of the video to be segmented. The steps common to both stages are shot segmentation and the extraction of 5 key frames per shot together with their mark-region sub-images. The embodiment of the method is as follows.

(1) The shot segmentation step uses an existing shot segmentation algorithm, such as a histogram-based algorithm, a motion-based algorithm, or an algorithm operating on compressed video, to cut the TV program video into a temporally continuous shot sequence.
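As an illustration of the histogram-based family mentioned in step (1), the following is a minimal sketch, not the algorithm used by the invention. It assumes frames are given as rows of gray values in 0 to 255; the function names, the 16-bin histogram and the threshold of 0.5 are all hypothetical choices made for this example.

```python
def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of a frame given as rows of 0-255 ints."""
    hist = [0] * bins
    count = 0
    for row in frame:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
            count += 1
    return [h / count for h in hist]


def detect_cuts(frames, threshold=0.5):
    """Return the indices i at which a new shot starts.

    A cut is declared when the L1 distance between the histograms of two
    consecutive frames exceeds `threshold` (an assumed, untuned value).
    """
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def cuts_to_shots(cuts, n_frames):
    """Turn cut indices into (start, end) frame ranges, end exclusive."""
    starts = [0] + cuts
    ends = cuts + [n_frames]
    return list(zip(starts, ends))
```

The resulting list of (start, end) ranges is the temporally continuous shot sequence that the later steps consume; any other existing shot segmentation algorithm could produce it instead.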

(2) Each shot is divided into 6 segments by time, and the 5 frames at the junctions of adjacent segments are taken as key frames. For the TV program in question, the rectangular area in which the known mark is located is determined such that the rectangle just encloses the mark completely. The rectangle is described by the coordinates (x, y, w, h), where x and y are the horizontal and vertical coordinates of the top-left corner of the rectangle, and w and h are its width and height. This rectangle is extracted from every key frame and is called a sub-image.
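Step (2) can be sketched as follows, assuming a shot is a half-open frame range [start, end) and a frame is stored as a list of pixel rows. Reading the key frames as the frames at the five junctions between the six segments is an interpretation of the text; the function names are hypothetical.

```python
def key_frame_indices(start, end, n_key=5):
    """Indices of the frames at the junctions between the n_key + 1
    equal time segments of the shot [start, end)."""
    length = end - start
    return [start + (k * length) // (n_key + 1) for k in range(1, n_key + 1)]


def crop_subimage(frame, rect):
    """Crop the mark rectangle (x, y, w, h) from a frame stored as rows."""
    x, y, w, h = rect
    return [row[x:x + w] for row in frame[y:y + h]]
```

For a shot covering frames 0 to 59, `key_frame_indices(0, 60)` gives frames 10, 20, 30, 40 and 50; the same rectangle is then cropped from each of them.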

(3) Three kinds of image feature vectors are extracted from every sub-image: an HSV color histogram, an edge gradient histogram, and a SIFT keypoint histogram based on the bag-of-words model. The three features are then concatenated to form the final image feature vector. The feature extraction methods are as follows:

1. Color histogram extraction

An HSV color histogram is computed for the sub-image, with the H channel divided into 8 intervals, the S channel into 3 intervals and the V channel into 3 intervals. The histogram is normalized, giving a 72-dimensional feature vector;
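A minimal sketch of such a histogram is given below. It assumes the 72 dimensions come from a joint 8 x 3 x 3 binning (8 * 3 * 3 = 72) and that the sub-image is supplied as a flat sequence of (r, g, b) pixels in 0 to 255; both the input layout and the bin indexing order are assumptions of this example.

```python
import colorsys


def hsv_histogram(pixels, h_bins=8, s_bins=3, v_bins=3):
    """Normalized joint HSV histogram of (r, g, b) pixels with values 0-255."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        # colorsys returns h, s and v in the range [0, 1]
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
        count += 1
    return [x / count for x in hist] if count else hist
```

Normalizing by the pixel count makes the vector independent of the size of the mark rectangle.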

2. Edge gradient histogram extraction

An edge gradient histogram is computed for the sub-image, with one interval per 5 degrees. The gradients falling within each interval are accumulated and the histogram is normalized, giving a 72-dimensional feature vector;
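The sketch below shows one way to build a 72-bin orientation histogram (360 / 5 = 72). The text does not name the gradient operator or say whether magnitudes or counts are accumulated, so the central differences and the magnitude weighting used here are assumptions; the input is assumed to be a gray image stored as rows of numbers.

```python
import math


def edge_gradient_histogram(gray, bin_degrees=5):
    """Orientation histogram of the gradients of a gray image.

    Each interior pixel adds its gradient magnitude to the bin of its
    gradient direction; the result is normalized to sum to 1.
    """
    n_bins = 360 // bin_degrees
    hist = [0.0] * n_bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences (assumed)
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixels carry no edge information
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[min(int(angle // bin_degrees), n_bins - 1)] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```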

3. Extraction of the SIFT keypoint histogram based on the bag-of-words model

SIFT feature vectors are extracted from all sub-images. The SIFT feature vectors of the training set are clustered with the K-means algorithm to obtain 64 cluster centers, which serve as the codebook of the bag-of-words model. All SIFT feature vectors of each sub-image are projected onto the codebook, forming a 64-dimensional histogram that is normalized to give the feature vector;
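The codebook construction and projection can be sketched as below, assuming the SIFT descriptors have already been computed elsewhere and are passed in as lists of numbers. The plain Lloyd iteration, its seeding with the first k descriptors, and nearest-centre assignment as the meaning of "projection" are assumptions of this example; the text uses k = 64.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, descriptor):
    """Index of the codeword closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], descriptor))


def kmeans(descriptors, k, iterations=10):
    """Plain Lloyd iterations seeded with the first k descriptors (assumed init)."""
    centres = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centres, d)].append(d)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centre
                centres[i] = [sum(col) / len(group) for col in zip(*group)]
    return centres


def bow_histogram(codebook, descriptors):
    """Normalized count of the descriptors assigned to each codeword."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

The codebook is built once from the training set and reused unchanged for the video to be segmented, so that histograms from both stages are comparable.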

4. The three feature vectors above are concatenated to form the final 208-dimensional feature vector.
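The dimension follows from 72 + 72 + 64 = 208. A small sketch of the concatenation with a dimension check is shown below; the function name and the order of the three parts (color, edge, bag-of-words, as listed in the text) are assumptions of this example.

```python
def concatenate_features(color_hist, edge_hist, bow_hist):
    """Join the three histograms into one 208-dimensional feature vector."""
    parts = (color_hist, edge_hist, bow_hist)
    expected = (72, 72, 64)
    for part, size in zip(parts, expected):
        if len(part) != size:
            raise ValueError(f"expected {size} dimensions, got {len(part)}")
    return list(color_hist) + list(edge_hist) + list(bow_hist)
```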

(4) The SVM classifier for mark detection is trained offline: the image feature vectors of the positive and negative samples of the training set are fed into an SVM tool for training. Here, the positive and negative sets each contain more than 1000 samples, and the SVM uses a kernel function based on the chi-square distance.
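The text does not give the exact form of the kernel. One common kernel built on the chi-square distance is the exponential form exp(-gamma * chi2(x, y)); the sketch below uses that form as an assumption, and `gamma` is a hypothetical hyperparameter not mentioned in the source.

```python
import math


def chi_square_distance(x, y):
    """Chi-square distance between two non-negative histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def chi_square_kernel(x, y, gamma=1.0):
    """exp(-gamma * chi2(x, y)); gamma is an assumed hyperparameter."""
    return math.exp(-gamma * chi_square_distance(x, y))


def gram_matrix(samples, gamma=1.0):
    """Pairwise kernel values for a list of feature vectors."""
    return [[chi_square_kernel(a, b, gamma) for b in samples] for a in samples]
```

A Gram matrix of this kind can be supplied to SVM tools that accept precomputed kernels, which avoids depending on a built-in chi-square option.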

(5) For the sub-images of the video to be segmented, the same image feature vector as in step (3) is extracted, 208 dimensions in total. The codebook needed to form the SIFT histogram feature vector is the one used in step (3), obtained from the training set by K-means clustering.

(6) The SVM classifier obtained in step (4) is used to classify the feature vectors obtained in step (5). The classification result indicates, for each sub-image, whether the mark is present.

(7) Using the results of step (6), the number of key frames containing the mark is counted for each shot. If the number is greater than or equal to 3, the shot is labeled a mark shot; otherwise it is labeled a non-mark shot.
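The voting rule of step (7) can be sketched in a few lines, assuming the classifier output for the 5 key frames of a shot is available as a list of booleans; the function and parameter names are hypothetical.

```python
def label_shot(key_frame_results, min_positive=3):
    """True if at least min_positive key-frame sub-images contain the mark."""
    return sum(1 for result in key_frame_results if result) >= min_positive
```

Requiring 3 of 5 positive key frames amounts to a majority vote, so a single misclassified key frame does not flip the label of the shot.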

(8) The shot labels of the video to be segmented are checked shot by shot. If two adjacent shots have different labels, the boundary between them is taken as a cut point. Once all adjacent shots have been examined in sequence, the internal segmentation of the program video is complete.
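The sequential scan of step (8) might look like the sketch below. It assumes shots are (start, end) frame ranges in time order with one boolean label per shot; the output format, a list of (start, end, has_mark) passages, is a choice made for this example.

```python
def segment_program(shots, labels):
    """Merge consecutive shots with the same mark label into passages.

    shots  : list of (start, end) frame ranges in time order
    labels : list of booleans, True for a mark shot
    Returns a list of (start, end, has_mark) passages.
    """
    if not shots:
        return []
    segments = []
    seg_start, seg_label = shots[0][0], labels[0]
    for i in range(1, len(shots)):
        if labels[i] != labels[i - 1]:
            # the boundary between shot i - 1 and shot i is a cut point
            segments.append((seg_start, shots[i - 1][1], seg_label))
            seg_start, seg_label = shots[i][0], labels[i]
    segments.append((seg_start, shots[-1][1], seg_label))
    return segments
```

For example, four shots labeled mark, mark, non-mark, mark yield three passages, with cut points at the start of the third and fourth shots.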

Claims

1. A method for automatic internal segmentation of TV programs using mark detection, characterized by comprising the following steps:

Step 1: perform shot segmentation on the TV program video using a shot segmentation technique, obtaining a temporally continuous shot sequence;

Step 6: segment the program video internally: take the boundaries between adjacent shots with different mark attributes as cut points, and divide the video into passages.

Wherein said step 2 specifically comprises:

Step 1: divide each shot into 6 segments by time, and take the 5 frames at the junctions of adjacent segments as key frames;

Step 2: for the TV program in question, determine the rectangular area in which the known mark is located, such that the rectangle just encloses the mark completely; the rectangle is described by the coordinates (x, y, w, h), where x and y are the horizontal and vertical coordinates of the top-left corner of the rectangle, and w and h are its width and height;

Step 3: extract this rectangle from every key frame; the result is called a sub-image.

Wherein said step 3 specifically comprises:

Step 1: compute an HSV color histogram for the sub-image, with the H channel divided into 8 intervals, the S channel into 3 intervals and the V channel into 3 intervals, and normalize the histogram to form a 72-dimensional feature vector;

Step 2: compute an edge gradient histogram for the sub-image, with one interval per 5 degrees, accumulate the gradients falling within each interval, and normalize the histogram to form a 72-dimensional feature vector;

Step 3: extract the SIFT feature vectors of all sub-images, cluster the SIFT feature vectors of the training set with the K-means algorithm to obtain 64 cluster centers as the codebook, project all SIFT feature vectors of each sub-image onto the codebook, and normalize the resulting 64-dimensional histogram to obtain a feature vector;

Step 4: concatenate the three feature vectors above to form the final 208-dimensional feature vector;

Step 5: train the SVM classifier for mark detection offline by feeding the feature vectors of the positive and negative samples of the training set into an SVM tool, where the positive and negative sets used in training each contain more than 1000 samples and the SVM uses a kernel function based on the chi-square distance.