Summary of the invention
The objective of the invention is to overcome the poor robustness, low accuracy, high computational complexity, and narrow applicability of existing news anchor detection methods, and thereby to provide a news anchor detection method of wide applicability, high accuracy, and low computational complexity.
To achieve this goal, the invention provides a news anchor detection method based on spatio-temporal slice pattern analysis, carried out in the following order:
Step 10), take N consecutive frames from the edited news video as one group, and extract a horizontal spatio-temporal slice and a vertical spatio-temporal slice from the intercepted group of consecutive frames;
Step 20), extract the image features of the horizontal spatio-temporal slice and of the vertical spatio-temporal slice, respectively, to obtain the corresponding feature vectors;
Step 30), cluster the feature vectors of the horizontal spatio-temporal slices and of the vertical spatio-temporal slices separately by a clustering method, and within each class merge the temporally continuous horizontal or vertical slices into new elements of the class, obtaining the final horizontal and vertical clustering results;
Step 40), fuse the class containing the most elements in the horizontal clustering result with the class containing the most elements in the vertical clustering result, and detect the news anchor shots according to the fusion result.
In the above technical scheme, in step 10), extracting the horizontal spatio-temporal slice means: extracting the same row of pixels (along the horizontal direction) from each successive frame in the group, and splicing the extracted rows of pixels into a new image; the resulting image is the horizontal spatio-temporal slice. The length of the horizontal spatio-temporal slice is the number of frames in the group of consecutive frames, and its width is the width of one frame.
In the above technical scheme, in step 10), extracting the vertical spatio-temporal slice means: extracting the same column of pixels (along the vertical direction) from each successive frame in the group, and splicing the extracted columns of pixels into a new image; the resulting image is the vertical spatio-temporal slice. The length of the vertical spatio-temporal slice is the number of frames in the group of consecutive frames, and its width is the height of one frame.
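The slice extraction described above can be sketched as follows; this is a minimal illustration assuming the frame group is available as a NumPy array, with the middle row and middle column chosen as suggested later in the text (function and parameter names are ours):

```python
import numpy as np

def extract_slices(frames, row=None, col=None):
    """Extract horizontal and vertical spatio-temporal slices from a frame group.

    frames: array of shape (N, H, W) or (N, H, W, C); by default the
    middle row and middle column are sampled.
    """
    frames = np.asarray(frames)
    n, h, w = frames.shape[:3]
    row = h // 2 if row is None else row
    col = w // 2 if col is None else col
    # Horizontal slice: the same row from every frame, stacked over time
    # -> shape (N, W[, C]); its length is the frame count and its width
    # the frame width, as described in the text.
    horizontal = frames[:, row, :]
    # Vertical slice: the same column from every frame, stacked over time
    # -> shape (N, H[, C]); its width is the frame height.
    vertical = frames[:, :, col]
    return horizontal, vertical
```

The slice is thus a new image whose rows index time, so stationary content (an anchor in a fixed studio) produces near-constant stripes along the time axis.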
In the above technical scheme, in step 10), the value of N is an integer multiple of 25.
In the above technical scheme, in step 20), the image features are color features and texture features.
When extracting the color features of an image, this can be done in the color space RGB (Red, Green, Blue), HSV (Hue, Saturation, Value), HSI (Hue, Saturation, Intensity), YUV (Y: luminance signal; U and V: chrominance signals), or Lab (L: luminance; a and b: chrominance).
Extracting color features in the HSV color space comprises the following steps:
Step 21-1), convert the RGB values of the image into hue, saturation, and value (brightness);
Step 21-2), quantize the hue, saturation, and value of the image into levels, respectively;
Step 21-3), give extra weight to hue, and convert the quantized three-dimensional vector into a single integer by a linear combination, each value representing one color segment;
Step 21-4), after every pixel of the image has been quantized, extract the color features of the image.
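The quantization steps above can be illustrated as follows. The text does not specify the level counts or the weights of the linear combination; this sketch assumes 8 hue levels and 2 levels each for saturation and value (so hue receives the extra weight), giving 8*2*2 = 32 color segments:

```python
import colorsys

def quantize_hsv(r, g, b):
    """Map an RGB pixel (0-255 per channel) to one of 32 color segments.

    Assumed scheme: 8 hue levels, 2 saturation levels, 2 value levels,
    combined linearly as index = 4*h + 2*s + v (an integer in [0, 31]).
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hq = min(int(h * 8), 7)   # 8 hue levels: hue gets the finest quantization
    sq = min(int(s * 2), 1)   # 2 saturation levels
    vq = min(int(v * 2), 1)   # 2 value levels
    return 4 * hq + 2 * sq + vq
```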
In step 21-4), extracting the color features of the image means: first dividing the entire image evenly into 4*4 small blocks, then combining the small blocks into 5 large blocks corresponding to the top, bottom, left, right, and middle parts. The color features of the different blocks are extracted as follows:
for each of the four surrounding blocks, extract three-dimensional color moments, namely the first, second, and third moments of the color;
for the middle block, extract a histogram quantized to 32 levels;
in addition, extract the third color moment of the entire image, which describes its overall color characteristics.
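Assuming each slice has already been quantized to 32 color segments, the block-wise color feature described above might be sketched like this; the exact grouping of the 16 small blocks into 5 regions and the histogram normalization are our reading of the description, and all names are illustrative:

```python
import numpy as np

def color_moments(values):
    """First three color moments: mean, standard deviation, and the
    signed cube root of the third central moment."""
    mean = values.mean()
    std = values.std()
    third = ((values - mean) ** 3).mean()
    skew = np.sign(third) * np.abs(third) ** (1.0 / 3.0)
    return [mean, std, skew]

def color_feature(segments):
    """Color feature of one slice, given its quantized segment image.

    segments: 2-D array of color-segment indices in [0, 31]. The image
    is split on a 4x4 grid; the top/left/bottom/right border regions
    give 3 color moments each, the middle 2x2 region a 32-bin
    histogram, and the whole image one further third moment.
    """
    seg = np.asarray(segments, dtype=float)
    h, w = seg.shape
    rows = [0, h // 4, h // 2, 3 * h // 4, h]
    cols = [0, w // 4, w // 2, 3 * w // 4, w]
    top    = seg[rows[0]:rows[1], :]
    bottom = seg[rows[3]:rows[4], :]
    left   = seg[rows[1]:rows[3], cols[0]:cols[1]]
    right  = seg[rows[1]:rows[3], cols[3]:cols[4]]
    centre = seg[rows[1]:rows[3], cols[1]:cols[3]]
    feat = []
    for region in (top, left, bottom, right):
        feat.extend(color_moments(region))
    hist, _ = np.histogram(centre, bins=32, range=(0, 32))
    feat.extend(hist / max(hist.sum(), 1))   # normalized 32-bin histogram
    feat.append(color_moments(seg)[2])       # global third moment
    return np.array(feat)                    # 4*3 + 32 + 1 = 45 dimensions
```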
Extracting the texture features of the image means: describing the texture by extracting an edge histogram from the entire image.
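A simplified edge-histogram sketch in the spirit of the local edge histogram descriptor cited later in the embodiment; the filter values, the 2x2 sub-block averaging, and the edge threshold are illustrative assumptions, not the patent's exact procedure:

```python
import numpy as np

def edge_histogram(gray, block=4):
    """Simplified edge-histogram texture descriptor.

    Each block x block cell of the grayscale image is assigned one of
    five edge types (vertical, horizontal, 45 deg, 135 deg,
    non-directional) by the strongest of four 2x2 directional filters.
    """
    filters = [
        np.array([[1, -1], [1, -1]]),                      # vertical edge
        np.array([[1, 1], [-1, -1]]),                      # horizontal edge
        np.array([[np.sqrt(2), 0], [0, -np.sqrt(2)]]),     # 45-degree edge
        np.array([[0, np.sqrt(2)], [-np.sqrt(2), 0]]),     # 135-degree edge
    ]
    hist = np.zeros(5)
    g = np.asarray(gray, dtype=float)
    for y in range(0, g.shape[0] - block + 1, block):
        for x in range(0, g.shape[1] - block + 1, block):
            cell = g[y:y + block, x:x + block]
            half = block // 2
            # average the cell down to 2x2 sub-blocks before filtering
            sub = cell.reshape(2, half, 2, half).mean(axis=(1, 3))
            strengths = [abs((sub * f).sum()) for f in filters]
            best = int(np.argmax(strengths))
            if strengths[best] > 1.0:   # illustrative edge threshold
                hist[best] += 1
            else:
                hist[4] += 1            # non-directional / flat cell
    return hist / max(hist.sum(), 1.0)
```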
In the above technical scheme, in step 30), the clustering method is the K-means clustering method.
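A minimal K-means sketch over the slice feature vectors; the number of classes k, the initialization, and the iteration budget are not fixed by the text and are chosen here for illustration:

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Minimal K-means over slice feature vectors.

    features: (n_slices, dim) array; returns one class label per slice.
    Initial centres are random samples; iteration count is fixed.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=float)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest centre
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its members
        for j in range(k):
            if (labels == j).any():
                centres[j] = x[labels == j].mean(axis=0)
    return labels
```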
In the above technical scheme, step 40) specifically comprises the following steps:
Step 41), calculate the similarities of corresponding shots between the class containing the most elements in the horizontal clustering result and the class containing the most elements in the vertical clustering result;
Step 42), according to the shot similarity results, determine whether the similarity of a shot from the horizontal cluster and a shot from the vertical cluster is greater than a preset first threshold; if so, the two shots are corresponding elements of the classes, otherwise they are non-corresponding elements;
Step 43), calculate the similarity between the class containing the most elements in the horizontal clustering result and the class containing the most elements in the vertical clustering result;
Step 44), according to the class similarity computed in step 43), judge whether the similarity of the two classes is greater than a preset second threshold; if so, fuse the two classes into the class corresponding to the news anchor shots; if the similarity of the two classes is less than or equal to the preset second threshold, the video has no obvious news-program structure and no news anchor shots need be extracted.
In step 41), calculating the similarity of corresponding shots means: calculating the ratio of the length of the intersection of the two shots' time spans to their total time length; if the computed similarity is less than zero, it is set to zero.
In step 43), calculating the similarity of the classes means: calculating the similarity of the two classes by summing the similarities between the elements of the two classes.
In step 44), fusing the two classes means: if an element of one class has no corresponding element (in the sense of step 42)) in the other class, that element becomes a new element of the fused class; if the two classes contain corresponding elements, the corresponding elements are merged, with the earliest start time and the latest end time of the two corresponding elements taken as the start and end times of the merged element.
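The fusion rule above can be sketched as follows, with shots represented as (start, end) time pairs and the correspondence test of step 42) passed in as a predicate; all names are illustrative:

```python
def fuse_classes(class_h, class_v, corresponds):
    """Fuse the largest horizontal and vertical classes into one.

    class_h, class_v: lists of (start, end) shots; corresponds(a, b)
    decides whether two shots are "corresponding" elements.
    Corresponding pairs are merged to (earliest start, latest end);
    unmatched shots from either class are kept as new elements.
    """
    fused, matched_v = [], set()
    for a in class_h:
        partner = None
        for j, b in enumerate(class_v):
            if j not in matched_v and corresponds(a, b):
                partner = j
                break
        if partner is None:
            fused.append(a)                 # no correspondence: keep as-is
        else:
            b = class_v[partner]
            matched_v.add(partner)
            fused.append((min(a[0], b[0]), max(a[1], b[1])))
    # vertical shots that matched nothing also become new elements
    fused.extend(b for j, b in enumerate(class_v) if j not in matched_v)
    return fused
```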
The value of the first threshold is 0.5, and the value of the second threshold is 0.5.
The advantage of the present invention is that it detects anchors in all kinds of news video with high accuracy, wide applicability, and low computational complexity, avoiding the shortcoming of existing methods of relying too heavily on accurate shot segmentation and on other modality information.
Embodiment
The present invention is described in further detail below in conjunction with the drawings and a specific embodiment:
As shown in Figure 1, the news anchor detection method based on spatio-temporal slice pattern analysis of the present invention comprises the following steps:
Step 10, take N consecutive frames from the edited news video as one group, and extract a horizontal spatio-temporal slice and a vertical spatio-temporal slice from each group. In this step, when extracting the horizontal spatio-temporal slice, as shown in Fig. 2(a) and Fig. 2(b), the same row of pixels is extracted from each consecutive frame in the group, and the extracted rows of pixels are spliced into a new image; the resulting image is the horizontal spatio-temporal slice. The length of the horizontal spatio-temporal slice is the number of frames in the group, and its width is the width of one video frame. When extracting the vertical spatio-temporal slice, as shown in Fig. 2(a) and Fig. 2(c), the same column of pixels is extracted from each consecutive frame in the group, and the extracted columns of pixels are spliced into a new image; the resulting image is the vertical spatio-temporal slice. Its length is the number of frames in the group, and its width is the height of one video frame.
When extracting the horizontal and vertical spatio-temporal slices described above, the pixels of the same row or the same column should be extracted from every frame in the group of consecutive frames, but the position of the extracted pixel row or column can be chosen as required; in this embodiment, the middle row or middle column of the frame is selected.
In this step, the value N of the intercepted consecutive frames can be set as required; in this embodiment, N is 50. By intercepting groups of consecutive frames, this step avoids depending on accurate shot segmentation. After the edited news video has passed through this step, a series of horizontal and vertical spatio-temporal slices is obtained.
Step 20, extract the image features of the horizontal and vertical spatio-temporal slices. In a news anchor shot, the anchor and the studio background change little compared with the foreground and background of a news story shot, so the color and texture of the spatio-temporal slices corresponding to anchor shots vary only slightly, while the color and texture of the slices corresponding to news story units vary markedly; characterizing the images by their color and texture features therefore aids the subsequent clustering. As is clear from step 10 above, the horizontal and vertical spatio-temporal slices are newly spliced images, and extracting image features from them specifically comprises the following steps:
Step 21, extract the color features of the image. Color features can be extracted in different color spaces, such as RGB, HSV, HSI, YUV, or Lab. In this embodiment, the HSV (Hue, Saturation, Value) space is taken as an example to realize the extraction of color features, which comprises:
Step 21-1, convert the RGB values of the image into hue, saturation, and value;
Step 21-2, quantize the hue, saturation, and value of the image into levels, respectively. In the HSV space each pixel corresponds to a three-dimensional vector representing the hue, saturation, and value of that pixel, but the units and ranges of the three values differ, so they must be quantized separately.
Step 21-3, give extra weight to hue, following the results of vision research, and convert the quantized three-dimensional vector into an integer between 0 and 31 by a linear combination, each value representing one color segment.
Step 21-4, after every pixel has been quantized, extract the color features of the image. The specific implementation is shown in Figure 3: the image is divided evenly into 4*4 small blocks, which are then combined into 5 large blocks as shown in Figure 3, corresponding to the top, bottom, left, right, and middle parts (A, B, C, and D correspond to the top, left, bottom, and right parts respectively, E to the middle part, and the thick lines mark the borders of the large blocks). The color features of the different blocks are extracted as follows:
each of the four surrounding blocks yields 3-dimensional color moments (the first, second, and third moments of the color);
the middle block yields a histogram quantized to 32 levels;
in addition, the third color moment of the entire image is extracted to describe its overall color characteristics.
Step 22, extract the texture features of the image. The texture is described by extracting an edge histogram from the entire image.
The specific implementations of the color feature extraction in step 21-4 and the texture feature extraction in step 22 are mature prior art; the extraction of texture features is described in detail in the reference "D. K. Park, Y. S. Jeon, C. S. Won, and S.-J. Park, Efficient use of local edge histogram descriptor, Proc. of the ACM Workshops on Multimedia, Los Angeles, CA, Nov. 2000".
Step 23, combine all the image features of each spatio-temporal slice into a feature vector that characterizes that slice.
After the above operations of this step, each spatio-temporal slice has formed a high-dimensional vector representing its features, and a news video forms one feature vector per spatio-temporal slice extracted in step 10.
Step 30, cluster the feature vectors of the horizontal and vertical spatio-temporal slices separately by a clustering method, and within each class merge the temporally continuous horizontal or vertical slices into new elements of the class.
Generally speaking, a news video has the following structural features:
1. news anchor shots usually appear periodically in the news video;
2. the news anchor shots usually have very high similarity to one another;
3. the class corresponding to the news anchor shots should contain the most elements, because anchor shots usually have very high visual similarity, whereas other shots are similar only to temporally adjacent shots within the same story unit.
According to the above structural features of news video, the present invention adopts the following rule: the news anchor shot is the only shot class in the video that has, throughout the whole program, many shots with similar visual content. From this rule it follows that the class containing the most elements in the horizontal and in the vertical clustering result is, in each case, the class that contains the news anchor shots, where each spatio-temporal slice corresponds to one shot of the video. Let Cluster_max^H and Cluster_max^V denote the classes containing the most elements in the horizontal and vertical clustering results respectively; they can be written as
Cluster_max^H = {Shot_1^H, Shot_2^H, ..., Shot_R^H},
Cluster_max^V = {Shot_1^V, Shot_2^V, ..., Shot_S^V},
where R and S are the numbers of elements of the largest classes in the horizontal and vertical clustering results respectively.
In addition, a news story shot sometimes contains a person who stays in the shot for a long time with little change; to prevent such cases from being confused with news anchor shots, this step merges the temporally continuous horizontal or vertical slices within each class into new elements of the class.
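The within-class merging of temporally continuous slices can be sketched as follows, with each cluster element as a (start_frame, end_frame) pair; treating ranges whose gap is at most one frame as continuous is our choice:

```python
def merge_continuous(elements, gap=1):
    """Merge time-continuous slices inside one cluster (step 30):
    sorted frame ranges that touch or overlap (within `gap` frames)
    become one element spanning the earliest start to the latest end."""
    merged = []
    for start, end in sorted(elements):
        if merged and start <= merged[-1][1] + gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```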
The clustering operation of this step can use the K-means clustering method.
Step 40, fuse the class containing the most elements in the horizontal clustering result obtained in step 30 with the class containing the most elements in the vertical clustering result, and detect the news anchor shots according to the fusion result. In the following description, Cluster_max^H denotes the class containing the most elements in the horizontal clustering result and Cluster_max^V the class containing the most elements in the vertical clustering result. The concrete operations of this step are as follows:
Step 41, calculate the similarities of corresponding shots in Cluster_max^H and Cluster_max^V. The similarity of two shots Shot_i^H and Shot_j^V is calculated as the ratio of the length of the intersection of their time spans to their total time length:
Sim(Shot_i^H, Shot_j^V) = [Min(T_End^H, T_End^V) - Max(T_Start^H, T_Start^V)] / [Max(T_End^H, T_End^V) - Min(T_Start^H, T_Start^V)],
where T_Start^H, T_End^H, T_Start^V, and T_End^V denote the start and end times of Shot_i^H and Shot_j^V respectively, and Min and Max denote the minimum and maximum operations. If the computed similarity is less than 0, it is set to 0.
Step 42, according to the shot similarity results obtained in step 41, judge whether the similarity Sim(Shot_i^H, Shot_j^V) of two shots Shot_i^H and Shot_j^V is greater than the preset threshold Th_1; if so, the two shots are called "corresponding" elements of the classes, otherwise "non-corresponding" elements.
Step 43, calculate the similarity of Cluster_max^H and Cluster_max^V; the similarity of the two classes is obtained by summing the similarities between the elements of the two classes.
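A sketch of the class similarity of step 43. The text states only that element similarities are summed; the normalization by the two element counts below is our assumption, made so that the result stays in [0, 1] and is comparable with a threshold of 0.5:

```python
def class_similarity(class_h, class_v, shot_sim):
    """Similarity of the two largest classes: sum of pairwise element
    similarities, normalized (our assumption) by the product of the
    element counts R*S so the value stays in [0, 1]."""
    total = sum(shot_sim(a, b) for a in class_h for b in class_v)
    return total / (len(class_h) * len(class_v))
```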
Step 44, according to the similarity of Cluster_max^H and Cluster_max^V obtained in step 43, judge whether Sim(Cluster_max^H, Cluster_max^V) is greater than the preset threshold Th_2. If so, the classes Cluster_max^H and Cluster_max^V are fused into the class corresponding to the final news anchor shots; if Sim(Cluster_max^H, Cluster_max^V) is less than or equal to the preset threshold Th_2, the video has no obvious news-program structure, and no news anchor shots need be extracted.
In the above fusion process, if an element of Cluster_max^H or Cluster_max^V has no "corresponding" element (as obtained in step 42) in the other class, that element becomes a new element of the final class; if the two classes contain "corresponding" elements, the corresponding elements are merged, with the earliest start time and the latest end time of the two corresponding elements taken as the start and end times of the new element.
The threshold value Th that in step 42, is adopted
1Value desirable 0.5, the value of the threshold value Th2 that is adopted in step 44 is also desirable 0.5, but can do suitable adjustment according to these two threshold values of actual conditions.
Finally, it should be noted that the above embodiment is intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to an embodiment, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical scheme of the present invention that do not depart from its spirit and scope should all be covered by the scope of the claims of the present invention.