CN108093314B - Video news splitting method and device - Google Patents
- Publication number
- CN108093314B (application CN201711371733.6A)
- Authority
- CN
- China
- Prior art keywords
- news
- time point
- video
- shot
- information
- Prior art date
- Legal status: Active (assumed status; not a legal conclusion)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8455—Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention discloses a video news splitting method and device. The method comprises: decomposing a news video to be processed into at least one shot; recording the start time point and end time point of each shot in the news video; extracting m key frames from each shot at a preset time interval; analyzing the m key frames of the shot to obtain the host category information of the shot; detecting news titles and recording the start and end time points of each news title; generating news title marking information for marking the shots; and splitting the news video into N pieces of news information according to a preset splitting rule, based on the start and end time points of each shot in the news video, the host category information of the shots, and the news title marking information of the shots. The method and device can automatically split video news based on the host information and news title information it contains, improving the efficiency of video news splitting.
Description
Technical Field
The invention relates to the technical field of video processing, in particular to a video news splitting method and device.
Background
Video news contains a large amount of up-to-date information and is of great value to video websites and news applications. These applications need to split each day's complete news broadcasts and publish the pieces online, so that users can click and watch the individual news items that interest them. Because the country has a large number of television stations, including various local stations in addition to the satellite stations, splitting all video news manually would consume a great deal of manpower. Moreover, the timeliness of news places strict demands on splitting speed, which puts further pressure on manual splitting: video news is broadcast in bulk within certain time slots (for example, 12:00 to 12:30 at noon), and to preserve timeliness each complete news program must be split into independent news items within a specified time; production cannot rely on post-processing a backlog of tasks.
In summary, the prior art offers no technical scheme that automatically splits video news; a large number of workers are typically required to split it manually, resulting in high labor cost. How to split video news quickly and effectively in an automated way is therefore an urgent problem to be solved.
Disclosure of Invention
In view of this, an object of the present invention is to provide a video news splitting method, which can automatically split video news based on host information and news headline information in the video news, and improve the efficiency of splitting the video news.
In order to achieve the purpose, the invention provides the following technical scheme: a video news splitting method, the method comprising:
clustering video frames in a news video to be processed, and decomposing the news video to be processed into at least one shot;
recording a starting time point and an ending time point of each shot in the news video;
extracting m frames of key frames of the shot according to a preset time interval based on the length of the shot calculated by the starting time point and the ending time point of the shot;
analyzing the m key frames of the shot to obtain the host category information of the shot;
detecting news titles of the news videos to be processed, and recording starting time points and ending time points of the news titles when the news videos to be processed contain the news titles;
generating news title marking information for marking the shots based on the starting time point and the ending time point of the news titles and the starting time point and the ending time point of the shots in the news video;
splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
Preferably, the performing news title detection on the to-be-processed news video, and when the to-be-processed news video includes a news title, after recording a start time point and an end time point of the news title, further includes:
carrying out duplicate removal operation on the detected news headlines of the news video to be processed, and recording the starting time points and the ending time points of the residual news headlines after duplicate removal;
correspondingly, the generating news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video comprises:
and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
Preferably, the analyzing the m key frames of the shot to obtain the host category information of the shot includes:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the host classification categories of all key frames of the shot, and determining the host classification category with the largest number as the host category information of the shot.
Preferably, the performing news title detection on the to-be-processed news video, and when the to-be-processed news video includes a news title, recording a start time point and an end time point of the news title includes:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
Preferably, the generating news headline marking information for marking the footage based on the start time point and the end time point of the news headline and the start time point and the end time point of the footage in the news video includes:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
Preferably, the splitting the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each of the shots in the news video, host category information of the shots, and news title marking information of the shots includes:
the news video is processed according to an information sequence V ═ SiResolution, where i is 0,1, …, M, Si={Ti,Ai,Ci,Csi},TiA start time point and an end time point in the video, A, representing a shotiRepresenting moderator category information contained in the shot, CiNews headline marking information representing a shot, CsiRepresenting whether it is a new title.
A video news splitting apparatus, comprising:
the decomposition module is used for decomposing the news video to be processed into at least one shot by clustering video frames in the news video to be processed;
the first recording module is used for recording the starting time point and the ending time point of each shot in the news video;
the extraction module is used for extracting m frames of key frames of the shot according to a preset time interval on the basis of the length of the shot calculated by the starting time point and the ending time point of the shot;
the analysis module is used for analyzing the m key frames of the shot to obtain the host category information of the shot;
the second recording module is used for detecting news titles of the news videos to be processed, and recording the starting time point and the ending time point of the news titles when the news videos to be processed contain the news titles;
a generating module, configured to generate news headline marking information for marking the shots based on a start time point and an end time point of the news headline and a start time point and an end time point of the shots in the news video;
the splitting module is used for splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title marking information of the shot, wherein N is larger than or equal to 1.
Preferably, the apparatus further comprises:
the duplication removing module is used for carrying out duplication removing operation on the detected news headlines of the news video to be processed and recording the starting time points and the ending time points of the rest news headlines after duplication removing;
accordingly, the generation module is configured to: and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
Preferably, the analysis module is specifically configured to:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the host classification categories of all key frames of the shot, and determining the host classification category with the largest number as the host category information of the shot.
Preferably, the second recording module is specifically configured to:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
Preferably, the generating module is specifically configured to:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
Preferably, the splitting module is specifically configured to:
the news video is split according to an information sequence V = {Si}, where i = 0, 1, …, M and Si = {Ti, Ai, Ci, Csi}; Ti denotes the start and end time points of the shot in the video, Ai denotes the host category information contained in the shot, Ci denotes the news title marking information of the shot, and Csi denotes whether the title is a new title.
From the above technical scheme, the invention discloses a video news splitting method. When splitting video news, the video frames in the news video to be processed are first clustered so that the video is decomposed into at least one shot, and the start and end time points of each shot in the news video are recorded. Based on the shot length calculated from those time points, m key frames are extracted from each shot at a preset time interval and analyzed to obtain the host category information of the shot. News title detection is performed on the news video to be processed; when it contains a news title, the start and end time points of the title are recorded, and news title marking information for marking the shots is generated from the start and end time points of the title and of the shots. Finally, the news video is split into N pieces of news information according to a preset splitting rule, based on the start and end time points of each shot in the news video, the host category information of the shots, and the news title marking information of the shots, wherein N is larger than or equal to 1. The method and device can automatically split video news based on the host information and news title information it contains, improving the efficiency of video news splitting.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a video news splitting method disclosed in embodiment 1 of the present invention;
fig. 2 is a flowchart of a video news splitting method disclosed in embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of a video news splitting apparatus disclosed in embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of a video news splitting apparatus disclosed in embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, which is a flowchart of an embodiment 1 of a video news splitting method disclosed in the present invention, the method includes the following steps:
s101, clustering video frames in a news video to be processed, and decomposing the news video to be processed into at least one shot;
when video news needs to be split, similar video frames in news videos to be processed are clustered and combined into a shot. When the video is decomposed into shots, calculating Euclidean distance between color histograms Hi of video frames adjacent to a time domain by calculating a color histogram Hi of an RGB space of each video frame of the news video to be processed, and if the Euclidean distance is larger than a preset threshold Th1, considering that the shots are sheared, and recording all the video frames between a starting position and an ending position as one shot; calculating the distance of a color histogram H [ i ] between the current video frame and the video frame of the previous n frames, if the distance is greater than a preset threshold Th2, determining that the shot gradual change occurs at the position, and recording all the video frames between the starting position and the ending position as a shot; a shot is considered still inside a shot if neither shearing nor fading occurs.
S102, recording the starting time point and the ending time point of each shot in a news video;
after the news video to be processed is decomposed into at least one shot, the start time point and the end time point of each shot in the news video to be processed are recorded.
S103, extracting m frames of key frames of the shot according to a preset time interval based on the length of the shot calculated by the starting time point and the ending time point of the shot;
setting the number m of key frames to be extracted according to the length of the shot calculated by the recorded starting time point and ending time point of the shot, wherein the set rule can be described as follows: when the lens length is less than 2s, m is 1, when the lens length is less than 4s, m is 2, when the lens length is less than 10s, m is 3, when the lens length is greater than 10s, m is 4 (the parameters can be adjusted). The m frames are extracted from the shot as representative frames, the gap of the extracted key frames is calculated as (end position-start position)/(m +1), and the video frames are extracted from the shot at the gap interval as key frames.
S104, analyzing the m key frames of the shot to obtain the host category information of the shot;
and then analyzing each key frame respectively to obtain the host type information of the shot.
S105, performing news title detection on the news video to be processed, and recording the starting time point and the ending time point of the news title when the news video to be processed contains the news title;
meanwhile, news headline detection and analysis are carried out on the news video to be processed, whether the news video to be processed contains the news headline or not is judged, and when the news video to be processed contains the news headline, the starting time point and the ending time point of the news headline are recorded.
S106, generating news title marking information for marking the shot based on the starting time point and the ending time point of the news title and the starting time point and the ending time point of the shot in the news video;
and then generating news title marking information for marking the shots according to the recorded start time point and end time point of the news titles and the start time point and end time point of the shots in the news video to be processed, namely marking whether the shots contain the news titles or not.
S107, splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
Finally, the news video to be processed is split into N pieces of news information according to a preset splitting rule, based on the acquired start and end time points of each shot in the news video, the host category information of the shots and the news title marking information of the shots, wherein N is larger than or equal to 1.
In summary, in the above embodiment, when video news needs to be split, the news video to be processed is first decomposed into at least one shot by clustering its video frames, and the start and end time points of each shot in the news video are recorded. Based on the shot length calculated from those time points, m key frames are extracted from each shot at a preset time interval and analyzed to obtain the host category information of the shot. Meanwhile, news title detection is performed on the news video to be processed; when it contains a news title, the start and end time points of the title are recorded, and news title marking information for marking the shots is generated from the start and end time points of the title and of the shots. Finally, the news video is split into N pieces of news information according to a preset splitting rule, based on the start and end time points of each shot, the host category information of the shots, and the news title marking information of the shots, wherein N is larger than or equal to 1. Video news can thus be split automatically based on the host information and news title information it contains, which effectively solves the problem in the prior art that video news must be split manually at high labor cost.
As shown in fig. 2, which is a flowchart of an embodiment 2 of a video news splitting method disclosed by the present invention, on the basis of the above embodiment 1, in this embodiment, news headline detection is performed on a news video to be processed, and when the news video to be processed includes a news headline, after recording a start time point and an end time point of the news headline, the method further includes:
s201, carrying out duplicate removal operation on the detected news headlines of the news video to be processed, and recording the starting time points and the ending time points of the residual news headlines after duplicate removal;
through observation of news data, it can be found that a news item often appears, and the situation of the same news title is repeatedly shown for many times. If the news is divided only by the news headlines appearing once, the news is over-divided, so that the detected news headlines of the news video to be processed can be further subjected to a deduplication operation, and the starting time point and the ending time point of the rest news headlines after deduplication are recorded.
When deduplicating the detected news titles of the news video to be processed, suppose the frame positions of the start and end of the n-th title are t1 and t2, its position in the video frame is CRn(x, y, w, h), and the title is denoted Cn[t1, t2]. The two titles before it are Cn-1[t3, t4] and Cn-2[t5, t6], with positions CRn-1 and CRn-2 in the video frame.
Step 1, comparing the current title Cn with the previous title Cn-1: calculate the overlap ratio of their regions in the video frame, i.e., the overlap ratio R1 of CRn and CRn-1. If R1 >= Thr, the two titles are considered to need a deduplication comparison; go to step 2. Otherwise, calculate the overlap ratio R2 of the regions of Cn and Cn-2; if R2 >= Thr, the two titles need a deduplication comparison, and go to step 2; otherwise Cn is considered not to be a repeated title.
Step 2, for the two input titles, select one frame representing the content of each title. For Cn, select the video frame at time (t1 + t2)/2 and, from CRn(x, y, w, h), set a comparison area rect:
rect.x=x+w*R1;
rect.y=y+h*R2;
rect.w=w*R3;
rect.h=h*R4;
where R1, R2, R3 and R4 are all preset parameters (these rect parameters are distinct from the overlap ratios R1 and R2 in step 1).
The image in the rect area of that video frame is recorded as IMG1. For Cn-1 (or Cn-2), select the video frame at time (t3 + t4)/2 (or (t5 + t6)/2) and record the image in the same rect area as IMG2.
Step 3, converting the two input images from the RGB color space into grayscale or into any color space with a separate luminance channel (such as YUV, HSV, HSL or Lab). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114;
For a luminance-separating color space, taking HSL as an example, the conversion formula for the lightness L (Lightness) is:
L=(max(R,G,B)+min(R,G,B))/2。
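The two conversion formulas above can be written directly in code. An illustrative scalar sketch (function names are hypothetical):

```python
def to_gray(r, g, b):
    # Gray = R*0.299 + G*0.587 + B*0.114
    return r * 0.299 + g * 0.587 + b * 0.114

def hsl_lightness(r, g, b):
    # HSL lightness: L = (max(R,G,B) + min(R,G,B)) / 2
    return (max(r, g, b) + min(r, g, b)) / 2
```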
Step 4, calculating a segmentation threshold. The grayscale segmentation threshold is computed from the grayscale or luminance image of IMG1 using the OTSU method, which is described as follows:
(1) Assume the grayscale image I can be quantized into N gray levels (N <= 256); for these N levels, extract the N-bin gray-level histogram H of the image.
(2) For each bin t (0 <= t < N) of the histogram, split the pixels into a class below bin t and a class from bin t upward, and compute the between-class variance:
σ²(t) = ω0(t) · ω1(t) · (μ0(t) - μ1(t))²
where ω0(t) and ω1(t) are the proportions of pixels in the two classes and μ0(t) and μ1(t) are their mean gray levels. The gray level corresponding to bin i is:
x(i) = i * 256 / N
(3) The segmentation threshold Th is x(t) for the t that maximizes the between-class variance σ²(t).
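The OTSU thresholding step can be sketched as follows. This is an illustrative sketch assuming the standard between-class-variance formulation of Otsu's method (the exact criterion is garbled in the source text) together with the x(i) = i * 256 / N mapping; numpy and the function name are not part of the patent.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    # Step (1): N-bin gray-level histogram of the image.
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    p = hist.astype(np.float64) / hist.sum()
    levels = np.arange(n_bins, dtype=np.float64)
    best_t, best_var = 0, -1.0
    # Step (2): between-class variance for each candidate bin t.
    for t in range(1, n_bins):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    # Step (3): map the winning bin back to a gray level, Th = x(t) = t * 256 / N.
    return best_t * 256.0 / n_bins
```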
Step 5, binarizing the images IMG1 and IMG2. For each pixel (x, y) with gray value I(x, y), the corresponding pixel of the binarized image B is: if I(x, y) < Th, then B(x, y) = 0; if I(x, y) >= Th, then B(x, y) = 255.
Step 6, taking the point-by-point difference of the binarized images B1 and B2 of IMG1 and IMG2, and calculating the mean value Diff of the absolute difference over all pixels: Diff = (1 / (w * h)) * Σ|B1(x, y) - B2(x, y)|.
Step 7, comparing Diff with a preset threshold. If Diff is smaller than the threshold, the two titles are considered the same title, and the shots associated with the time range [t1, t2] of Cn are marked as having the same title; otherwise they are marked as having different titles.
Correspondingly, S202, generating news title marking information for marking the shots based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shots in the news video;
and then generating news title marking information for marking the shots according to the recorded starting time point and ending time point of the rest news titles after the duplication removal and the starting time point and ending time point of the shots in the news video to be processed, namely whether the shots contain news titles or not is marked.
S203, splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
Finally, the news video to be processed is split into N pieces of news information according to a preset splitting rule, based on the acquired start and end time points of each shot in the news video, the host category information of the shots and the news title marking information of the shots, wherein N is larger than or equal to 1.
In summary, in the above embodiments, on the basis of embodiment 1, the detected news headline of the news video to be processed can be further subjected to a deduplication operation, so that the problem of over-segmentation of news is effectively avoided.
Specifically, in the foregoing embodiment, one implementation manner of analyzing m key frames of a shot to obtain host category information of the shot may be:
Each key frame of the shot is input into a pre-trained classifier to produce an anchor classification category for that frame; the categories of all key frames of the shot are then counted, and the category with the largest count is taken as the anchor category information of the shot.
That is, the key frames previously selected for each shot are input into the pre-trained classifier for anchor classification, the per-frame results are treated as votes, and the category receiving the most votes is selected as the category of the shot.
The classifier is trained as follows: a certain number of video frames are extracted from videos of different channels and different news programs and manually sorted into four categories (four categories are given as an example and do not limit the present invention): two-anchor sitting posture, single-anchor sitting posture, single-anchor standing posture, and non-anchor. A corresponding classifier is then trained with a deep learning method; here, training refers to the process of training a network model according to an open-source deep learning training method and model structure.
Training process: the model is retrained with the Caffe open-source deep learning framework (other open-source frameworks may also be used). The training procedure is the back-propagation (BP) algorithm: during forward propagation, the input is passed through the network layer by layer; if the result at the output layer differs from the expected value, the error is propagated backward, and the weights and biases of the model are updated by gradient descent according to the error. This is repeated until the error function converges to a minimum. As this is a standard, well-known procedure rather than an original algorithm, its details are not repeated here. The training process yields a network model for classification.
Classification process: each key frame obtained from each shot after shot segmentation is input into the trained model and passed through the same model structure with the trained parameters (image convolution, pooling, and ReLU operations in sequence) until confidence probabilities P1, P2, P3, and P4 are output for the four classes: two-anchor sitting posture, single-anchor sitting posture, single-anchor standing posture, and non-anchor. The class corresponding to the maximum value is selected as the class of the image; for example, if P1 is the maximum of (P1, P2, P3, P4), the image belongs to the two-anchor sitting posture class. For a shot, the number of key frames belonging to each class is counted, and the class with the most key frames is selected as the class of the shot.
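The per-frame classification and per-shot voting just described can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class names and the confidence-tuple format are assumptions, and in the real system the confidences would come from the trained network rather than being passed in directly.

```python
from collections import Counter

# Hypothetical labels for the four anchor classes described in the text.
CLASSES = ["two_anchor_sitting", "single_anchor_sitting",
           "single_anchor_standing", "non_anchor"]

def classify_frame(confidences):
    """Pick the class with the highest confidence (P1..P4) for one key frame."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return CLASSES[best]

def classify_shot(per_frame_confidences):
    """Majority vote over the frame-level classes of one shot."""
    votes = Counter(classify_frame(c) for c in per_frame_confidences)
    return votes.most_common(1)[0][0]
```

A shot whose key frames mostly score highest on one class is assigned that class, exactly as the voting rule above states.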
Specifically, in the above embodiment, the detection of the news headline is performed on the news video to be processed, and when the news headline is included in the news video to be processed, one implementation manner of recording the start time point and the end time point of the news headline may be:
A preset area of a video frame of the news video to be processed is determined as a candidate area, and the image in the candidate area is tracked to generate a tracking result; whether the candidate area is a news title area is then judged based on the tracking result. If so, the time point at which the news title area appears is determined as the start time point of the news title, and the time point at which it disappears as the end time point.
That is, the idea of the title detection algorithm is to perform, for each video frame of the input news video, news title detection based on time-domain stability, and to obtain the frame numbers of the start and end frames of each news title appearing in the whole program. The time position of each shot obtained in module A is then compared with the time range in which a news title appears: if the shot lies within the range of a title, the shot is considered to have a title; otherwise it is considered to have none.
The reason for judging in this way, rather than searching for titles in a single image, is to distinguish possible rolling captions: rolling captions appearing in news are generally displayed in a style extremely similar to news titles, and judging from only one image could mistake them for news titles, which would affect the quality of the generated poster image.
The specific algorithm is as follows:
1. selecting potential candidate regions:
(1) The image in the bottom area of the key frame (the bottom area being the position where most news titles appear) is selected as the image to be detected. The purpose of region selection is to reduce the amount of computation and improve detection precision. The bottom area is selected as follows:
Assuming the width and height of the key frame are W and H, the position of the bottom region Rect(rect.x, rect.y, rect.w, rect.h) (the coordinates of the top-left corner of the rectangular region in the key frame, and the width and height of the region) is:
rect.x=0;
rect.y=H*cut_ratio;
rect.w=W;
rect.h=H*(1-cut_ratio);
where cut _ ratio is a preset coefficient.
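The bottom-region formulas above can be sketched directly; the function name and the default value of cut_ratio are illustrative assumptions, since the patent only states that cut_ratio is a preset coefficient.

```python
def bottom_region(W, H, cut_ratio=0.75):  # cut_ratio default is an assumption
    """Return (x, y, w, h) of the bottom strip of a W×H key frame,
    per rect.x=0, rect.y=H*cut_ratio, rect.w=W, rect.h=H*(1-cut_ratio)."""
    x = 0
    y = int(H * cut_ratio)
    w = W
    h = int(H * (1 - cut_ratio))
    return (x, y, w, h)
```

For a 1280×720 frame with cut_ratio = 0.75 this keeps the bottom quarter of the frame, where titles usually sit.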
(2) The selected image to be detected is converted from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For luminance color spaces, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2
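The two conversion formulas above, applied to a single RGB pixel, are straightforward; this small sketch simply restates them (the function names are illustrative):

```python
def rgb_to_gray(r, g, b):
    """BT.601 luma weighting, as in the text: Gray = 0.299R + 0.587G + 0.114B."""
    return r * 0.299 + g * 0.587 + b * 0.114

def rgb_to_hsl_lightness(r, g, b):
    """HSL lightness: L = (max(R,G,B) + min(R,G,B)) / 2."""
    return (max(r, g, b) + min(r, g, b)) / 2
```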
(3) For grayscale or luminance images there are various methods for extracting edge features, such as the Sobel operator and the Canny operator; this embodiment takes the Sobel operator as an example:
The grayscale/luminance image is convolved with a horizontal-direction edge gradient operator and a vertical-direction edge gradient operator to obtain a horizontal edge map Eh and a vertical edge map Ev, and finally the edge strength map Eall is calculated: for any point (x, y), Eall(x, y) = sqrt(Ev(x, y)² + Eh(x, y)²).
The Sobel operator is taken as an example of the horizontal and vertical edge gradient operators; other operators are equally applicable.
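Step (3) can be sketched in numpy as below. This is a minimal correlation-based implementation for illustration (a real system would use an optimized library routine); the kernel layout is the standard Sobel pair, and the 'valid' border handling is an assumption.

```python
import numpy as np

# Standard 3x3 Sobel kernels: horizontal-gradient and its transpose.
SOBEL_H = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_V = SOBEL_H.T

def edge_strength(gray):
    """Eall(x, y) = sqrt(Ev² + Eh²) over a 2-D grayscale array ('valid' region)."""
    h, w = gray.shape
    eh = np.zeros((h - 2, w - 2))
    ev = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = gray[y:y + 3, x:x + 3]
            eh[y, x] = np.sum(patch * SOBEL_H)
            ev[y, x] = np.sum(patch * SOBEL_V)
    return np.sqrt(eh ** 2 + ev ** 2)
```

A vertical step edge produces a strong response along the step and none in flat areas, which is what the subsequent binarization at The1 relies on.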
(4) for all, The edge map is compared with a preset threshold value The1, and The edge map is binarized, that is, ifall (x, y) > The 1E (x, y) ═ 1, and else E (x, y) > 0.
(5) The operation of step (3) is performed on each of the R, G, and B channels of the image to be detected to obtain the edge strength maps Er, Eg, and Eb of the three channels.
(6) Er, Eg, and Eb are compared with a preset threshold The2 and binarized, i.e. (taking one channel as an example) if Er(x, y) > The2, Er(x, y) = 1; otherwise Er(x, y) = 0. The2 and The1 may be the same or different: if the background of a news title is a gradient, its edges cannot be detected with a higher threshold, and the edges detected with a lower threshold need to be added in, so generally The2 < The1.
(7) The edge image E obtained in step (4) is enhanced as E(x, y) = E(x, y) | Er(x, y) | Eg(x, y) | Eb(x, y) to obtain the final edge map. The enhancement steps (5)-(7) are optional and may be used or omitted as required; one channel or all three channels may be enhanced. Their purpose is to prevent detection failure caused by gradient backgrounds in the caption area.
(8) The final edge map is projected horizontally: for each row i, the number Numedge of pixels satisfying the following condition is counted; if Numedge > Thnum, the histogram bin H[i] is set to 1, otherwise H[i] is set to 0. The condition is: a pixel's edge value is considered 1 if at least one of the pixel and its upper and lower neighbors has the value 1, and the count covers horizontal runs of such pixels whose length exceeds the threshold Thlen (the purpose is to guarantee a continuous straight line).
(9) The histogram H[i] is traversed for the spacing between rows with H[i] = 1. If the spacing is greater than a threshold Threw, the edge-map region between the two rows is taken as a first-stage candidate region; otherwise the next key frame is processed.
(10) For each first-stage candidate region, the vertical edge projection histogram V is computed: for any column i, if the number of edge pixels equal to 1 in that column is greater than Thv, V[i] = 1, otherwise V[i] = 0, and V[0] and V[W-1] are forcibly set to 1. In V, the region satisfying V[i] = V[j] = 1 with V[k] = 0 for all k ∈ (i, j) and maximizing j − i is found and taken as the left and right boundaries of the caption region. The original image in this region is selected as the second-stage candidate region. Column edge pixels are determined the same way as row edge pixels.
(11) The left and right boundaries of the second-stage candidate region are refined: the original image of the region is scanned with a sliding window of a certain size (for example 32 × 32), the color histogram within each window is computed, and the number numcolor of non-zero bins in the window's histogram is counted. Positions of monochrome areas or of background areas with complex colors are found, i.e. windows satisfying numcolor < Thcolor1 || numcolor > Thcolor2, and the center of such a window is used as a new vertical boundary.
(12) The rectangular region candidateRect determined above is checked against constraint conditions, including but not limited to: the starting point of candidateRect must lie within a certain image range, the height of candidateRect must be within a certain range, and so on. If the constraints are satisfied, candidateRect is considered a candidate news title region. If this candidate region is not currently being tracked, processing passes to module B for tracking; otherwise detection continues in module A.
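The horizontal-projection rule of step (8) above can be sketched as follows. This is one plausible reading of the run-length condition, with illustrative threshold values; the patent leaves the exact run accounting loosely specified.

```python
def row_projection(edge, th_len=3, th_num=4):
    """Build H[i] from a binary edge map (list of 0/1 rows): H[i]=1 when the
    row contains more than th_num pixels lying in horizontal runs longer than
    th_len, where a pixel counts if it or a vertical neighbour is 1."""
    h, w = len(edge), len(edge[0])
    H = []
    for i in range(h):
        row = [1 if any(edge[j][x] for j in range(max(0, i - 1), min(h, i + 2)))
               else 0 for x in range(w)]
        num_edge, run = 0, 0
        for x in range(w + 1):          # sentinel pass to flush the last run
            if x < w and row[x]:
                run += 1
            else:
                if run > th_len:
                    num_edge += run
                run = 0
        H.append(1 if num_edge > th_num else 0)
    return H
```

Rows belonging to a title box produce long continuous edge runs and get H[i] = 1, while scattered noise pixels do not, which is exactly the "continuous straight line" guarantee the text mentions.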
2. Tracking the found candidate regions:
(1) Determine whether the region is tracked for the first time. After the previous round of processing, it is known whether no region or several regions are being tracked, have finished tracking, or have failed tracking. If a region is already being tracked, it is compared with the current candidate region; if the two overlap strongly in position, the region is known to be under tracking, otherwise it is determined to be tracked for the first time. "Tracked for the first time" may mean tracked for the very first time, or tracked again after a previous tracking has ended. If it is a first tracking, go to step (2); if not, exit the method steps of this embodiment.
(2) For a first-tracked region, a tracking range within the key frame is set (since the candidate region of the input key frame may contain an extra background region, i.e. a region not containing the news title, a tracking region needs to be set to improve tracking accuracy). The setting method is as follows: let the candidate news title region of the key frame be CandidateRect(x, y, w, h) (starting point x, y in the key frame and corresponding width and height w, h); the tracking region track(x, y, w, h) is set as:
track.x=CandidateRect.x+CandidateRect.w*Xratio1;
track.y=CandidateRect.y+CandidateRect.h*Yratio1;
track.w=CandidateRect.w*Xratio2;
track.h=CandidateRect.h*Yratio2;
Xratio1, Xratio2, Yratio1, and Yratio2 are all preset parameters.
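The tracking-region formulas above can be sketched directly; the default ratio values here are illustrative assumptions, since the patent only states that the four ratios are preset parameters.

```python
def tracking_region(cand, xr1=0.125, yr1=0.125, xr2=0.75, yr2=0.75):
    """Shrink CandidateRect(x, y, w, h) into track(x, y, w, h) per
    track.x = x + w*Xratio1, track.y = y + h*Yratio1,
    track.w = w*Xratio2, track.h = h*Yratio2. Ratio defaults are assumptions."""
    x, y, w, h = cand
    return (x + w * xr1, y + h * yr1, w * xr2, h * yr2)
```

Shrinking inward like this drops the border pixels most likely to belong to the moving background rather than the static title.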
(3) The image within the tracking area of the key frame is selected and converted from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For luminance color spaces, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2
(4) A segmentation threshold is calculated: for the grayscale or luminance image, the grayscale segmentation threshold is computed with Otsu's method, described as follows. Assume the grayscale image I can be divided into N gray levels (N <= 256), for which the N-bin grayscale histogram H of the image can be extracted. For each bin t (0 <= t < N) of the histogram, the between-class variance σ²(t) = ω0(t)·ω1(t)·(μ0(t) − μ1(t))² is computed (ω0 and ω1 being the pixel proportions of the two classes split at t, and μ0 and μ1 their mean gray values), with the gray value of bin i given by:
x(i)=i*256/N
The segmentation threshold Thtrack is taken as the x(t) corresponding to the t that maximizes the between-class variance.
(5) The image is binarized: for each pixel (x, y) of image I, the corresponding pixel of the reference binarized image Bref is Bref(x, y) = 0 if I(x, y) < Thtrack, and Bref(x, y) = 255 if I(x, y) >= Thtrack.
(6) A color histogram Href of the image in the tracking area is calculated.
(7) The input key frame is converted from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). For the grayscale space the conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For luminance color spaces, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2
(8) The grayscale image within the tracking area of the key frame is selected and binarized: for each pixel (x, y) of image I, the corresponding pixel of the binarized image Bcur is Bcur(x, y) = 0 if I(x, y) < Thtrack, and Bcur(x, y) = 255 if I(x, y) >= Thtrack. Thtrack is the result obtained in step (4) at the first tracking.
(9) The point-by-point difference between the binarized image Bcur of the current frame and the reference binarized image Bref is computed, and the average difference is calculated as Diffbinary = (1/(W×H)) Σ |Bcur(x, y) − Bref(x, y)|, where W and H are the width and height of the tracking-area image.
(10) And calculating a color histogram Hcur of the current image in the tracking area, and calculating a distance Diffcolor with Href.
(11) The obtained Diffbinary and Diffcolor are compared with preset thresholds: if Diffbinary < Thbinary && Diffcolor < Thcolor, the state "tracking" is returned and the tracking counter tracking_num is incremented (tracking_num++); otherwise lost_num++. Note that the binarization-based and the color-histogram-based tracking criteria may be used individually or in combination.
(12) If lost_num > Thlost, the state "tracking ended" is returned together with the frame number of the current key frame (recorded as the time point at which the news title disappears); otherwise "tracking" is returned. The purpose of lost_num is to tolerate occasional interference in the video signal, which distorts images and causes matching to fail; through lost_num the algorithm is allowed a certain number of key-frame tracking failures.
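Steps (11) and (12) amount to a small counter state machine, sketched below. The threshold values are illustrative assumptions; the patent only names them as preset thresholds.

```python
class TitleTracker:
    """Per-key-frame update of tracking_num / lost_num as in steps (11)-(12)."""

    def __init__(self, th_binary=10.0, th_color=0.2, th_lost=3):
        self.th_binary, self.th_color, self.th_lost = th_binary, th_color, th_lost
        self.tracking_num = 0
        self.lost_num = 0

    def update(self, diff_binary, diff_color):
        """Return 'tracking' or 'ended' for the current key frame."""
        if diff_binary < self.th_binary and diff_color < self.th_color:
            self.tracking_num += 1      # frame still matches the title
        else:
            self.lost_num += 1          # transient mismatch tolerated up to th_lost
        return "ended" if self.lost_num > self.th_lost else "tracking"
```

Allowing lost_num to accumulate before ending tracking is what lets the algorithm ride out a few distorted frames, as the text explains.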
3. Determining whether the tracking area is a title area:
If tracking of the candidate area has ended, tracking_num is compared with a preset threshold Thtracking_num: if tracking_num >= Thtracking_num, the area is judged to be a news title area; otherwise it is judged to be a non-news-title area.
Specifically, in the above embodiment, one implementation manner of generating the news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video may be:
The start and end time points of the news title are compared with the start and end time points of each shot in the news video. When the start and end time points of the news title are contained in the time period formed by the start and end time points of a shot in the news video, first news title marking information is generated; when they are not contained in that time period, second news title marking information is generated.
Specifically, in the above embodiment, based on the start time point and the end time point of each shot in the news video, the host category information of the shot, and the news title mark information of the shot, one implementation manner of splitting the news video into N pieces of news information according to a preset splitting rule may be:
the news video is processed according to an information sequence V ═ SiResolution, where i is 0,1, …, M, Si={Ti,Ai,Ci,Csi},TiA start time point and an end time point in the video, A, representing a shotiRepresenting moderator category information contained in the shot, CiNews headline marking information representing a shot, CsiRepresenting whether it is a new title.
That is, step 1: for each shot, if the news starting point is empty, the shot's starting point is set as the news starting point and processing moves to the next shot; if the news starting point is already set, go to step 2.
Step 2: if Ai in Si belongs to the two-anchor category, the end time in Ti-1 of shot Si-1 is used as the end point of the current news item; at the same time, Si itself is taken as an independent news item whose start and end points are those of Ti. Two split results are returned, the news starting point is set to empty, and processing moves to the next shot.
Step 3: if Ai in Si belongs to the single-anchor sitting or standing category, the end time in Ti-1 of shot Si-1 is used as the end point of the current news item, and Si becomes the starting point of a new news item; one split result is returned and processing moves to the next shot.
Step 4: if Ai in Si belongs to the non-anchor category, Ci indicates a title, and Csi indicates a new title, then the end time in Ti-1 of shot Si-1 is used as the end point of the current news item and Si becomes the starting point of a new news item; one split result is returned and processing moves to the next shot.
Step 5: if none of the above conditions is met, shot Si is added to the current news item and processing moves to the next shot.
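Steps 1-5 above can be sketched as a small pass over the shot sequence. The dict keys, category labels, and the final flush of an open news item at the end of the video are illustrative assumptions not spelled out in the patent.

```python
def split_news(shots):
    """Each shot is a dict: T=(start, end), A=anchor class, C=has-title flag,
    Cs=new-title flag. Returns a list of (start, end) news items."""
    news, start = [], None
    for i, s in enumerate(shots):
        if start is None:
            start = s["T"][0]        # step 1: open a news item at this shot
        prev_end = shots[i - 1]["T"][1] if i > 0 else None
        just_opened = start == s["T"][0]
        if s["A"] == "two_anchor" and not just_opened:
            news.append((start, prev_end))   # step 2: close running item...
            news.append(s["T"])              # ...and emit the shot on its own
            start = None
        elif s["A"] in ("single_sitting", "single_standing") and not just_opened:
            news.append((start, prev_end))   # step 3: anchor shot starts new item
            start = s["T"][0]
        elif s["A"] == "non_anchor" and s["C"] and s["Cs"] and not just_opened:
            news.append((start, prev_end))   # step 4: a new title starts new item
            start = s["T"][0]
        # step 5: otherwise the shot simply joins the running news item
    if start is not None:                    # flush the last open item (assumption)
        news.append((start, shots[-1]["T"][1]))
    return news
```

Note how a repeated (non-new) title in step 4 does not split, which is exactly what the deduplication of titles is protecting.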
As shown in fig. 3, which is a schematic structural diagram of an embodiment 1 of a video news splitting apparatus disclosed in the present invention, the apparatus includes:
the decomposition module 301 is configured to decompose a news video to be processed into at least one shot by clustering video frames in the news video to be processed;
When video news needs to be split, similar video frames in the news video to be processed are clustered and merged into shots. When decomposing the video into shots, the color histogram H[i] of the RGB space of each video frame is computed, and the Euclidean distance between the color histograms of temporally adjacent video frames is calculated; if the distance is larger than a preset threshold Th1, a shot cut is considered to have occurred, and all video frames between the start and end positions are recorded as one shot. The color histogram distance between the current video frame and the frame n frames earlier is also calculated; if it is greater than a preset threshold Th2, a gradual shot transition is considered to occur at that position, and all video frames between the start and end positions are recorded as one shot. If neither a cut nor a gradual transition occurs, the position is considered to be still inside a shot.
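The cut-detection rule just described can be sketched as follows; the histograms are passed in as plain lists standing in for per-frame RGB histograms, and the function names are illustrative.

```python
import math

def hist_distance(h1, h2):
    """Euclidean distance between two color histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def cut_points(histograms, th1):
    """Indices i where a shot cut occurs between frame i-1 and frame i,
    i.e. where the adjacent-frame histogram distance exceeds Th1."""
    return [i for i in range(1, len(histograms))
            if hist_distance(histograms[i - 1], histograms[i]) > th1]
```

Gradual-transition detection works the same way but compares frame i against frame i−n with the second threshold Th2.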
A first recording module 302, configured to record a start time point and an end time point of each shot in a news video;
after the news video to be processed is decomposed into at least one shot, the start time point and the end time point of each shot in the news video to be processed are recorded.
An extracting module 303, configured to extract m key frames of a shot at preset time intervals based on the length of the shot calculated by the start time point and the end time point of the shot;
The number m of key frames to extract is set according to the shot length calculated from the recorded start and end time points of the shot; the rule can be described as: when the shot length is less than 2 s, m = 1; when less than 4 s, m = 2; when less than 10 s, m = 3; when 10 s or more, m = 4 (the parameters can be adjusted). The m frames are extracted from the shot as representative frames: the gap between extracted key frames is calculated as (end position − start position)/(m + 1), and video frames are sampled from the shot at this interval as key frames.
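The rule above maps shot length to m and then samples at equal gaps; a small sketch (the breakpoint values are the adjustable parameters from the text, the function name is hypothetical):

```python
def key_frame_times(start, end):
    """Pick m from the shot length, then sample m times at
    gap = (end - start) / (m + 1), skipping the shot boundaries."""
    length = end - start
    if length < 2:
        m = 1
    elif length < 4:
        m = 2
    elif length < 10:
        m = 3
    else:
        m = 4
    gap = (end - start) / (m + 1)
    return [start + gap * (k + 1) for k in range(m)]
```

Dividing by m + 1 spaces the samples strictly inside the shot, so no key frame lands on a transition frame at either boundary.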
The analysis module 304 is configured to analyze the m key frames of the shot to obtain the anchor category information of the shot;
each key frame is then analyzed to obtain the anchor category information of the shot.
The second recording module 305 is configured to perform news headline detection on the news video to be processed, and record a start time point and an end time point of a news headline when the news headline is included in the news video to be processed;
meanwhile, news headline detection and analysis are carried out on the news video to be processed, whether the news video to be processed contains the news headline or not is judged, and when the news video to be processed contains the news headline, the starting time point and the ending time point of the news headline are recorded.
A generating module 306, configured to generate news headline marking information for marking the shots based on a start time point and an end time point of the news headline and a start time point and an end time point of the shots in the news video;
and then generating news title marking information for marking the shots according to the recorded start time point and end time point of the news titles and the start time point and end time point of the shots in the news video to be processed, namely marking whether the shots contain the news titles or not.
The splitting module 307 is configured to split the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each shot in the news video, anchor category information of the shot, and news title mark information of the shot, where N is greater than or equal to 1.
And finally, splitting the news video to be processed into N pieces of news information according to the acquired start time point and end time point of each shot in the news video, the anchor category information of the shot and the news title mark information of the shot, wherein N is more than or equal to 1.
In summary, in the above embodiment, when a poster image of a news video needs to be generated, the target news video is first decomposed into at least one shot by clustering its video frames, and the start and end time points of each shot in the target news video are recorded. Based on the shot length calculated from those time points, m key frames are extracted from each shot at a preset interval, and the start and end time points of each key frame in the target news video are recorded. Each key frame is processed to generate its anchor marking information; at the same time, news title detection is performed on the target news video, and when the target news video contains a news title, the start and end time points of the title are recorded. News title marking information for marking the key frames is then generated based on the start and end time points of the news title and of the key frames in the target news video. Finally, a poster image of the target video is generated based on the anchor marking information and the news title marking information of all key frames. In this way a poster image representing the video news content can be generated automatically from the anchor information and news title information in the video news, effectively solving the prior-art problems that video news poster images are generated in a single fixed form and provide poor user experience.
As shown in fig. 4, which is a schematic structural diagram of embodiment 2 of the video news splitting apparatus disclosed in the present invention: on the basis of embodiment 1, in this embodiment, after news title detection is performed on the news video to be processed and, when the news video contains a news title, the start and end time points of the news title are recorded, the apparatus further includes:
a deduplication module 401, configured to perform deduplication operation on detected news titles of the news video to be processed, and record start time points and end time points of remaining news titles after deduplication;
through observation of news data, it can be found that a news item often appears, and the situation of the same news title is repeatedly shown for many times. If the news is divided only by the news headlines appearing once, the news is over-divided, so that the detected news headlines of the news video to be processed can be further subjected to a deduplication operation, and the starting time point and the ending time point of the rest news headlines after deduplication are recorded.
When the deduplication operation is performed on the detected news titles of the news video to be processed, assume the start and end frame positions of the n-th title are t1 and t2 and its position in the video frame is CRn(x, y, w, h); this title is denoted Cn[t1, t2]. The two titles before it are Cn-1[t3, t4] and Cn-2[t5, t6], with positions CRn-1 and CRn-2 in the video frame.
Step 1: the current title Cn is compared with the previous title Cn-1 by calculating the proportion R1 of the overlapping area of CRn and CRn-1 in the video frame; if R1 >= Thr, the two titles are considered to need a deduplication comparison, and processing goes to step 2. Otherwise, the area overlap R2 between Cn and Cn-2 is calculated; if R2 >= Thr, the two titles need a deduplication comparison and processing goes to step 2; otherwise Cn is considered not to be a repeated title.
Step 2: for the two input titles, a frame representing each title's content is selected. For Cn, the video frame at time (t1 + t2)/2 is selected, and for CRn(x, y, w, h) a comparison area rect is set:
rect.x=x+w*R1;
rect.y=y+h*R2;
rect.w=w*R3;
rect.h=h*R4;
r1, R2, R3 and R4 are all preset parameters.
The image within rect of that video frame is selected as IMG1. For Cn-1 (or Cn-2), the video frame at time (t3 + t4)/2 (or (t5 + t6)/2) is selected, and the image in the same area rect is selected and recorded as IMG2.
Step 3, convert the two input images from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114;
For luminance color spaces, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2。
Step 4, calculate a segmentation threshold by applying Otsu's method to the grayscale or luminance image of IMG1. Otsu's method is described as follows:
(1) Assume the grayscale image I can be divided into N gray levels (N <= 256), for which the N-bin grayscale histogram H of the image can be extracted.
(2) For each bin t (0 <= t < N) of the histogram, the between-class variance σ²(t) = ω0(t)·ω1(t)·(μ0(t) − μ1(t))² is computed (ω0 and ω1 being the pixel proportions of the two classes split at t, and μ0 and μ1 their mean gray values), with the gray value of bin i given by:
x(i)=i*256/N
(3) The segmentation threshold Th is taken as the x(t) corresponding to the t that maximizes the between-class variance.
Step 5, binarize the images IMG1 and IMG2: for each pixel (x, y) of IMG1 or IMG2, the corresponding pixel of the binarized image B is B(x, y) = 0 if I(x, y) < Th, and B(x, y) = 255 if I(x, y) >= Th.
Step 6, compute the point-by-point difference of the binarized images B1 and B2 of IMG1 and IMG2, and calculate the average difference Diff = (1/(W×H)) Σ |B1(x, y) − B2(x, y)|, where W and H are the width and height of the compared area.
Step 7, compare Diff with a preset threshold: if Diff is smaller than the threshold, the two titles are considered the same, and the shots associated with the time range [t1, t2] of Cn are marked as bearing the same title; otherwise they are marked as bearing different titles.
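Steps 4-7 (Otsu threshold on IMG1, binarization of both images, and mean binary difference) can be sketched in numpy as follows. The diff_threshold value is an illustrative assumption, and the "class 0 is I < t" convention matches the binarization rule in the text.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Otsu's method: pick the t maximising the between-class variance
    w0*w1*(m0 - m1)^2, where class 0 is I < t; returns x(t) = t*256/N."""
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    total = hist.sum()
    mean_all = (hist * np.arange(n_bins)).sum() / total
    best_t, best_var = 1, -1.0
    cum, cum_mean = 0, 0.0
    for t in range(1, n_bins):
        cum += hist[t - 1]
        cum_mean += (t - 1) * hist[t - 1]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total
        m0 = cum_mean / cum
        m1 = (mean_all * total - cum_mean) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t * 256 // n_bins

def same_title(img1, img2, diff_threshold=16.0):
    """Binarize both title crops at IMG1's Otsu threshold and compare the
    average point-by-point difference of the binary maps."""
    th = otsu_threshold(img1)
    b1 = np.where(img1 < th, 0, 255)
    b2 = np.where(img2 < th, 0, 255)
    diff = np.abs(b1 - b2).mean()
    return diff < diff_threshold
```

Because the comparison happens on binary maps, small compression noise in the title crop barely moves Diff, while genuinely different title text flips many pixels.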
A generating module 402, configured to generate news headline marking information for marking the shots based on the start time point and the end time point of the rest news headlines after the duplication removal and the start time point and the end time point of the shots in the news video;
That is, news title marking information for marking the shots is generated from the recorded start and end time points of the deduplicated news titles and the start and end time points of the shots in the news video to be processed; in other words, each shot is marked according to whether it contains a news title.
The splitting module 403 is configured to split the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each shot in the news video, anchor category information of the shot, and news title mark information of the shot, where N is greater than or equal to 1.
And finally, splitting the news video to be processed into N pieces of news information according to the acquired start time point and end time point of each shot in the news video, the anchor category information of the shot and the news title mark information of the shot, wherein N is more than or equal to 1.
In summary, in the above embodiments, on the basis of embodiment 1, the detected news headline of the news video to be processed can be further subjected to a deduplication operation, so that the problem of over-segmentation of news is effectively avoided.
Specifically, in the foregoing embodiment, the analysis module is specifically configured to:
Each key frame of the shot is input into a pre-trained classifier to generate the moderator classification category of that key frame; the moderator classification categories of all key frames of the shot are then counted, and the category with the largest count is determined as the moderator category information of the shot.
That is, the key frames previously selected for each shot are fed into the pre-trained classifier for moderator classification, the per-frame results are treated as votes, and the category receiving the most votes is selected as the category of the shot.
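The per-shot voting described above can be sketched as follows (a minimal illustration; the category labels are hypothetical placeholders, not names from the patent):

```python
from collections import Counter

def shot_category(keyframe_categories):
    """Majority vote: each key frame's classifier output is one
    ballot; the most frequent category becomes the shot's category."""
    return Counter(keyframe_categories).most_common(1)[0][0]

votes = ["single_sitting", "single_sitting", "non_anchor"]
print(shot_category(votes))  # single_sitting
```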
The training process of the classifier is as follows: a certain number of video frames are extracted from videos of different channels and different news programs and manually labeled into four categories: double-anchor, single-anchor sitting, single-anchor standing, and non-anchor (four categories are given as an example and are not a limitation of the present invention). A corresponding classifier is then trained with a deep learning method, where training refers to the process of training a network model according to an open-source deep learning training method and model structure.
Training process: the model is retrained with the Caffe open-source deep learning framework (other open-source deep learning frameworks could equally be used). The underlying algorithm is back-propagation (BP): during forward propagation, the input is passed through the network layer by layer to the output layer; if the output differs from the expected value, the error is propagated backward and the weights and thresholds of the model are updated by gradient descent. This is repeated until the error function converges to a minimum. The algorithm is standard rather than original to this invention, so its details are not repeated here. Through this training process, a network model for classification is obtained.
Classification process: each key frame obtained from each shot after shot segmentation is input into the trained model and passed sequentially through image convolution, pooling, and ReLU operations, using the same model structure and trained parameters, until the confidence probabilities P1, P2, P3, and P4 that the image belongs to the double-anchor, single-anchor sitting, single-anchor standing, and non-anchor classes are output; the class with the maximum probability is selected as the classification of the image. For example, if P1 is the maximum of (P1, P2, P3, P4), the image belongs to the double-anchor class. For a shot, the number of key frames in each category is counted, and the category with the most key frames is selected as the category of the shot.
Specifically, in the above embodiment, the second recording module is specifically configured to:
determining a preset area of a video frame of a news video to be processed as a candidate area, and tracking images in the candidate area to generate a tracking result; and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
That is, the idea of the headline detection algorithm is to perform news headline detection based on time domain stability for each video frame of an input news video, and acquire frame numbers of a start frame and an end frame of a news headline appearing in the whole news. And comparing the time position of each shot in the video obtained in the module A with the appearance position of the news title, if the time position of each shot in the video is within the range of the appearance of the title, the shot is considered to have the title, and otherwise, the shot is considered to have no title.
The reason for judging in this way, rather than detecting titles in a single image, is to distinguish possible rolling captions: rolling captions that appear in news are generally displayed in a style extremely similar to news titles, and judging from a single image would misclassify them as news titles, which affects the quality of the generated poster image.
The specific algorithm is as follows:
1. selecting potential candidate regions:
(1) The image in the bottom area of the key frame (where most news titles appear) is selected as the image to be detected; the purpose of region selection is to reduce computation and improve detection precision. The bottom area is selected as follows:
Assuming the width and height of the key frame are W and H, the position of the bottom region Rect(rect.x, rect.y, rect.w, rect.h) (the start coordinates of the rectangular region in the key frame, and the region's width and height) is:
rect.x=0;
rect.y=H*cut_ratio;
rect.w=W;
rect.h=H*(1-cut_ratio);
where cut_ratio is a preset coefficient.
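The bottom-region formulas translate directly to code (a sketch; the value 0.75 for cut_ratio is an assumed example, since the patent only calls it a preset coefficient):

```python
def bottom_region(W, H, cut_ratio=0.75):
    """Position of the bottom strip Rect in a W x H key frame,
    following the rect.x/y/w/h formulas; coords are ints (pixels)."""
    return (0,                          # rect.x
            int(H * cut_ratio),         # rect.y
            W,                          # rect.w
            int(H * (1 - cut_ratio)))   # rect.h

print(bottom_region(1920, 1080))  # (0, 810, 1920, 270)
```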
(2) The selected image to be detected is converted from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For a luminance-separated color space, taking HSL as an example, the conversion formula for the luminance L is:
L=(max(R,G,B)+min(R,G,B))/2
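Both conversion formulas can be checked per pixel with a few lines of code:

```python
def gray(r, g, b):
    """BT.601 luma: Gray = R*0.299 + G*0.587 + B*0.114."""
    return r * 0.299 + g * 0.587 + b * 0.114

def hsl_luminance(r, g, b):
    """HSL lightness: L = (max(R,G,B) + min(R,G,B)) / 2."""
    return (max(r, g, b) + min(r, g, b)) / 2

print(round(gray(255, 255, 255)))   # 255 (white stays white)
print(hsl_luminance(200, 100, 50))  # 125.0
```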
(3) For the grayscale or luminance image, various methods exist for extracting edge features, such as the Sobel operator or the Canny operator; this embodiment takes the Sobel operator as an example:
The horizontal and vertical edge gradient operators are convolved with the grayscale/luminance image to obtain the horizontal edge map Eh and the vertical edge map Ev, and finally the edge intensity map Eall is computed: for any point (x, y) on the edge map, Eall(x, y) = sqrt(Ev(x, y)² + Eh(x, y)²).
The horizontal and vertical edge gradient operators are exemplified here by the Sobel operator; other operators are equally applicable.
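A minimal sketch of the edge-strength computation, assuming the standard 3x3 Sobel kernels (the patent shows the operators only as figures, so the kernel values are an assumption):

```python
import numpy as np

# Standard 3x3 Sobel kernels (assumed; not reproduced in the text).
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def edge_strength(gray):
    """Eall(x, y) = sqrt(Ev(x, y)^2 + Eh(x, y)^2), computed over
    'valid' 3x3 neighborhoods (image borders dropped for brevity)."""
    H, W = gray.shape
    eall = np.zeros((H - 2, W - 2))
    for y in range(H - 2):
        for x in range(W - 2):
            patch = gray[y:y + 3, x:x + 3]
            eh = (patch * KX).sum()  # horizontal gradient response
            ev = (patch * KY).sum()  # vertical gradient response
            eall[y, x] = np.sqrt(eh ** 2 + ev ** 2)
    return eall

img = np.zeros((5, 6)); img[:, 3:] = 255.0  # vertical step edge
e = edge_strength(img)
print(e.max() > 0)  # True: the step edge gives a strong response
```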
(4) Eall is compared with a preset threshold The1 and the edge map is binarized: if Eall(x, y) > The1 then E(x, y) = 1, else E(x, y) = 0.
(5) The operations of step (3) are performed on each RGB channel of the image to be detected, yielding the edge intensity maps Er, Eg, and Eb of the three channels.
(6) Er, Eg, and Eb are compared with a preset threshold The2 and binarized: taking one channel as an example, if Er(x, y) > The2 then Er(x, y) = 1, else Er(x, y) = 0. The2 and The1 may be the same or different; if the news title background is a gradient, its edges cannot be detected with a higher threshold, and the edges detected with a lower threshold need to be reinforced, so generally The2 < The1.
(7) The obtained edge map E is edge-reinforced: E(x, y) = E(x, y) | Er(x, y) | Eg(x, y) | Eb(x, y), giving the final edge map. The reinforcement steps (5)-(7) are optional and may be used or skipped as required; one channel or all three channels may be reinforced, to prevent detection failure caused by gradients in the subtitle area.
(8) The final edge map is projected horizontally: for each row i, the number Numedge of pixels satisfying the condition below is counted; if Numedge > Thnum, the histogram bin H[i] is set to 1, otherwise H[i] is set to 0. The condition: a pixel's edge value is taken as 1 if the pixel itself or one of its upper/lower neighbors has value 1, and the count covers pixels lying in horizontal runs of such pixels whose length exceeds the threshold Thlen (this guarantees the presence of a continuous straight line).
(9) The histogram H[i] is traversed, measuring the spacing between rows with H[i] = 1; if a spacing is larger than a threshold Throw, the edge-map area between the two rows is taken as the first-stage candidate region; otherwise processing continues with the next key frame.
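Steps (8)-(9) reduce to a row projection plus a spacing test. The sketch below omits the run-length continuity condition of step (8) for brevity:

```python
import numpy as np

def row_histogram(E, th_num):
    """Step (8), simplified: H[i] = 1 when row i of the binary
    edge map E holds more than th_num edge pixels."""
    return (E.sum(axis=1) > th_num).astype(int)

def first_candidate_band(H_hist, th_row):
    """Step (9): the first pair of marked rows spaced more than
    th_row apart bounds the first-stage candidate region."""
    rows = np.flatnonzero(H_hist)
    for top, bottom in zip(rows, rows[1:]):
        if bottom - top > th_row:
            return top, bottom
    return None  # no candidate; continue with the next key frame

E = np.zeros((10, 20), dtype=int)
E[2, :] = 1  # top border of a title bar
E[8, :] = 1  # bottom border of a title bar
print(first_candidate_band(row_histogram(E, th_num=10), th_row=3))  # (2, 8)
```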
(10) For each first-stage candidate region, the vertical edge projection histogram V is computed: for any column i, if the number of edge pixels in the column exceeds Thv then V[i] = 1, otherwise V[i] = 0, and V[0] and V[W-1] are forcibly set to 1. In V, the widest span with V[i] = V[j] = 1 and V[k] = 0 for all k ∈ (i, j) is found and taken as the left and right boundaries of the subtitle region; the original image within this span is selected as the second-stage candidate region. Column edge pixels are determined in the same way as row edge pixels.
(11) The left and right boundaries of the second-stage candidate region are refined: the original image of the region is scanned with a sliding window of a certain size (e.g., 32 x 32), the color histogram within each window is computed, and the number numcolor of non-zero bins in the window histogram is counted. Monochrome areas and color-complex background areas are located via the condition numcolor < Thcolor1 || numcolor > Thcolor2, and the center of a window satisfying this condition is used as a new vertical boundary.
(12) The rectangular region candidateRect determined above is checked against constraint conditions, including but not limited to: the start position of candidateRect must lie within a certain image range, the height of candidateRect must be within a certain range, and so on. If the constraints are satisfied, candidateRect is considered a candidate news title region. If the candidate region is not currently being tracked, processing moves to the tracking stage (module B); otherwise detection continues in module A.
2. Tracking the found candidate regions:
(1) Judge whether this region is tracked for the first time. From the previous run of this processing it is known whether no region, or one or more regions, are currently in tracking, have finished tracking, or have failed tracking. If a region is already in tracking, it is compared with the current candidate region: if the two overlap substantially in position, the region is known to be in tracking; otherwise it is being tracked for the first time. "Tracked for the first time" can mean tracked for the very first time, or tracked again after a previous tracking has finished. If this is the first tracking, go to step (2); otherwise exit the method steps of this embodiment.
(2) For the first tracked region, a tracking range in the key frame is set (since the candidate region of the input key frame may contain an additional background region, i.e. a region not containing news headlines, a tracking region needs to be set to improve the tracking accuracy). The setting method comprises the following steps: let the candidate region position of the news headline of the key frame be CandidateRect (x, y, w, h) (the starting point x, y in the key frame and the corresponding width and height w, h), and set the tracking region track (x, y, w, h) as:
track.x=CandidateRect.x+CandidateRect.w*Xratio1;
track.y=CandidateRect.y+CandidateRect.h*Yratio1;
track.w=CandidateRect.w*Xratio2;
track.h=CandidateRect.h*Yratio2;
Xratio1, Xratio2, Yratio1, and Yratio2 are all preset parameters.
(3) The image in the key-frame tracking area is selected and converted from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For a luminance-separated color space, taking HSL as an example, the conversion formula for the luminance L is:
L=(max(R,G,B)+min(R,G,B))/2
(4) A segmentation threshold is computed: for the grayscale or luminance image, the threshold is obtained with Otsu's method (OTSU), described as follows. Assume the grayscale image I is quantized into N gray levels (N ≤ 256), from which the N-bin gray histogram H of the image is extracted. For each bin t (0 ≤ t < N), the class probabilities w0(t) and w1(t) and class means μ0(t) and μ1(t) of the pixels below and above t are computed, along with the between-class variance σ²(t) = w0(t)·w1(t)·(μ0(t) − μ1(t))²; a bin index maps back to a gray level by x(i) = i*256/N.
The t maximizing σ²(t) is found, and the corresponding x(t) is taken as the segmentation threshold Thtrack.
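A standard Otsu implementation can serve as a concrete reference for step (4); it maximizes the between-class variance over the N-bin histogram and maps the winning bin back to a gray level with x(t) = t*256/N:

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Otsu's method: pick the bin t maximizing the between-class
    variance w0*w1*(mu0 - mu1)^2, then map t to a gray level."""
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    p = hist / hist.sum()          # normalized N-bin histogram
    levels = np.arange(n_bins)
    best_t, best_var = 0, -1.0
    for t in range(1, n_bins):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t * 256 // n_bins  # x(t) = t * 256 / N

img = np.array([10] * 50 + [200] * 50)  # clean bimodal image
t = otsu_threshold(img)
print(10 < t <= 200)  # True: threshold falls between the two modes
```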
(5) The image is binarized: for each pixel (x, y) in image I and the corresponding pixel of the reference binarized image Bref, if I(x, y) < Thtrack then Bref(x, y) = 0; if I(x, y) ≥ Thtrack then Bref(x, y) = 255.
(6) A color histogram Href of the image in the tracking area is calculated.
(7) The input key frame is converted from the RGB color space into grayscale or any color space with a separate luminance channel (such as YUV, HSV, HSL, or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For a luminance-separated color space, taking HSL as an example, the conversion formula for the luminance L is:
L=(max(R,G,B)+min(R,G,B))/2
(8) The grayscale image within the tracking area of the key frame is selected and binarized: for each pixel (x, y) in image I and the corresponding pixel of the binarized image Bcur, if I(x, y) < Thtrack then Bcur(x, y) = 0; if I(x, y) ≥ Thtrack then Bcur(x, y) = 255. Thtrack is the threshold obtained in step (4) during the first tracking.
(9) The point-by-point difference between the binarized image Bcur of the current frame and the reference binarized image Bref is computed, and the average difference is calculated as Diffbinary = Σ|Bcur(x, y) − Bref(x, y)| / (W·H), where W and H are the width and height of the tracking-area image.
(10) The color histogram Hcur of the current image in the tracking area is computed, along with its distance Diffcolor to Href.
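Step (10) can be sketched with a normalized histogram and an L1 distance (the patent does not fix the distance metric, so the L1 choice is an assumption; a gray histogram stands in for the color histogram):

```python
import numpy as np

def norm_hist(img, n_bins=64):
    """Normalized histogram of the tracking-area image."""
    h, _ = np.histogram(img, bins=n_bins, range=(0, 256))
    return h / max(h.sum(), 1)

def hist_distance(href, hcur):
    """Diffcolor: L1 distance between reference and current hists."""
    return np.abs(href - hcur).sum()

href = norm_hist(np.full((8, 8), 100))
print(hist_distance(href, norm_hist(np.full((8, 8), 100))))  # 0.0
print(hist_distance(href, norm_hist(np.full((8, 8), 200))))  # 2.0
```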
(11) The obtained Diffbinary and Diffcolor are compared with preset thresholds: if Diffbinary < Thbinary && Diffcolor < Thcolor, the in-tracking state is returned and the tracking counter is incremented (tracking_num++); otherwise lost_num++. Note that the color-histogram and binarization tracking criteria may be used individually or in combination.
(12) If lost_num > Thlost, the tracking-end state is returned along with the frame number of the current key frame (recorded as the time point at which the news title disappears); otherwise the in-tracking state is returned. The purpose of lost_num is to tolerate occasional signal interference that distorts images and causes matching to fail: it allows the algorithm a certain number of key-frame tracking failures.
3. Determining whether the tracking area is a title area:
If tracking of the candidate region has finished, tracking_num is compared with a preset threshold Thtracking_num: if tracking_num ≥ Thtracking_num, the region is judged to be a news title area; otherwise it is judged to be a non-news-title area.
Specifically, in the foregoing embodiment, the generating module is specifically configured to:
comparing the start time point and the end time point of the news title with the start time point and the end time point of the shot in the news video, generating first news title marking information when the start time point and the end time point of the news title are contained in a time period formed by the start time point and the end time point of the shot in the news video, and generating second news title marking information when the start time point and the end time point of the news title are not contained in the time period formed by the start time point and the end time point of the shot in the news video.
Specifically, in the above embodiment, the splitting module is specifically configured to:
The news video is split according to an information sequence V = {Si}, where i = 0, 1, …, M, and Si = {Ti, Ai, Ci, Csi}: Ti denotes the start and end time points of a shot in the video, Ai the anchor category information contained in the shot, Ci the news title marking information of the shot, and Csi whether the title is a new title.
That is:
Step 1, for each shot: if the news start point is empty, set the shot's start point as the news start point and go to the next shot; if the news start point is already set, go to step 2.
Step 2, if Ai in Si belongs to the double-anchor class, take the end point of shot Si-1 (from Ti-1) as the end point of the news being split; at the same time, treat Si as an independent news item whose start and end points are those of Ti. Return the two splitting results, set the news start point to empty, and go on to process the next shot.
Step 3, if Ai in Si belongs to the single-anchor sitting or standing class, take the end point of shot Si-1 (from Ti-1) as the end point of the news being split, take Si as the start of a new news item, return one splitting result, and go on to process the next shot.
Step 4, if Ai in Si belongs to the non-anchor class, Ci indicates a title is present, and Csi indicates a new title, take the end point of shot Si-1 (from Ti-1) as the end point of the news being split, take Si as the start of a new news item, return one splitting result, and go on to process the next shot.
Step 5, if none of the above conditions is met, add shot Si to the current news item and go on to the next shot.
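Steps 1-5 form a small state machine over the shot sequence. The sketch below simplifies step 2 (the double-anchor shot is treated as an ordinary boundary rather than an independent one-shot news item) and uses hypothetical field names:

```python
DOUBLE, SINGLE_SIT, SINGLE_STAND, NONE = range(4)  # anchor categories

def split_news(shots):
    """shots: list of dicts with t=(start, end), a=anchor category,
    c=has-title flag, cs=is-new-title flag. Returns (start, end)
    pairs, one per split news item."""
    items, start = [], None
    for s in shots:
        if start is None:                 # step 1: open the first item
            start = s["t"][0]
            continue
        is_boundary = (
            s["a"] in (DOUBLE, SINGLE_SIT, SINGLE_STAND)  # steps 2-3
            or (s["a"] == NONE and s["c"] and s["cs"])    # step 4
        )
        if is_boundary:                   # close item, open the next
            items.append((start, s["t"][0]))
            start = s["t"][0]
        # step 5: otherwise the shot simply joins the current item
    if start is not None and shots:
        items.append((start, shots[-1]["t"][1]))
    return items

demo = [
    {"t": (0, 10),  "a": SINGLE_SIT, "c": True, "cs": True},
    {"t": (10, 20), "a": NONE,       "c": True, "cs": False},
    {"t": (20, 30), "a": NONE,       "c": True, "cs": True},   # new title
    {"t": (30, 40), "a": NONE,       "c": True, "cs": False},
]
print(split_news(demo))  # [(0, 20), (20, 40)]
```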
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A video news splitting method is characterized by comprising the following steps:
clustering video frames in a news video to be processed, and decomposing the news video to be processed into at least one shot;
recording a starting time point and an ending time point of each shot in the news video;
extracting m frames of key frames of the shot according to a preset time interval based on the length of the shot calculated by the starting time point and the ending time point of the shot;
analyzing the m frames of key frames of the shot to obtain the moderator category information of the shot;
detecting news titles of the news videos to be processed, and recording starting time points and ending time points of the news titles when the news videos to be processed contain the news titles;
generating news title marking information for marking the shots based on the starting time point and the ending time point of the news titles and the starting time point and the ending time point of the shots in the news video;
splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
2. The method according to claim 1, wherein the performing news headline detection on the to-be-processed news video, and when a news headline is included in the to-be-processed news video, after recording a start time point and an end time point of the news headline, further comprises:
carrying out duplicate removal operation on the detected news headlines of the news video to be processed, and recording the starting time points and the ending time points of the residual news headlines after duplicate removal;
correspondingly, the generating news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video comprises:
and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
3. The method of claim 1 or 2, wherein the analyzing the m-frame key frames of the shot to obtain the moderator category information of the shot comprises:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the moderator classification categories of all key frames of the shot, and determining the moderator classification category with the largest number as the moderator category information of the shot.
4. The method according to claim 1 or 2, wherein the performing news headline detection on the to-be-processed news video, and when a news headline is included in the to-be-processed news video, recording a start time point and an end time point of the news headline comprises:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
5. The method according to claim 1 or 2, wherein the generating news headline marking information marking the footage based on the start time point and the end time point of the news headline and the start time point and the end time point of the footage in the news video comprises:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
6. The method of claim 1 or 2, wherein the splitting the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each of the shots in the news video, moderator category information of the shots, and news headline marking information of the shots comprises:
splitting the news video according to an information sequence V = {Si}, where i = 0, 1, …, M, and Si = {Ti, Ai, Ci, Csi}: Ti denotes the start and end time points of a shot in the video, Ai the moderator category information contained in the shot, Ci the news headline marking information of the shot, and Csi whether the title is a new title.
7. A video news splitting apparatus, comprising:
the decomposition module is used for decomposing the news video to be processed into at least one shot by clustering video frames in the news video to be processed;
the first recording module is used for recording the starting time point and the ending time point of each shot in the news video;
the extraction module is used for extracting m frames of key frames of the shot according to a preset time interval on the basis of the length of the shot calculated by the starting time point and the ending time point of the shot;
the analysis module is used for analyzing the m frames of key frames of the shot to obtain the moderator category information of the shot;
the second recording module is used for detecting news titles of the news videos to be processed, and recording the starting time point and the ending time point of the news titles when the news videos to be processed contain the news titles;
a generating module, configured to generate news headline marking information for marking the shots based on a start time point and an end time point of the news headline and a start time point and an end time point of the shots in the news video;
the splitting module is used for splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the moderator category information of the shot and the news title mark information of the shot, wherein N is greater than or equal to 1.
8. The apparatus of claim 7, further comprising:
the duplication removing module is used for carrying out duplication removing operation on the detected news headlines of the news video to be processed and recording the starting time points and the ending time points of the rest news headlines after duplication removing;
accordingly, the generation module is configured to: and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
9. The apparatus according to claim 7 or 8, wherein the analysis module is specifically configured to:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the moderator classification categories of all key frames of the shot, and determining the moderator classification category with the largest number as the moderator category information of the shot.
10. The apparatus according to claim 7 or 8, wherein the second recording module is specifically configured to:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
11. The apparatus according to claim 7 or 8, wherein the generating module is specifically configured to:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
12. The apparatus according to claim 7 or 8, wherein the splitting module is specifically configured to:
splitting the news video according to an information sequence V = {Si}, where i = 0, 1, …, M, and Si = {Ti, Ai, Ci, Csi}: Ti denotes the start and end time points of a shot in the video, Ai the moderator category information contained in the shot, Ci the news headline marking information of the shot, and Csi whether the title is a new title.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711371733.6A CN108093314B (en) | 2017-12-19 | 2017-12-19 | Video news splitting method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711371733.6A CN108093314B (en) | 2017-12-19 | 2017-12-19 | Video news splitting method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108093314A CN108093314A (en) | 2018-05-29 |
CN108093314B true CN108093314B (en) | 2020-09-01 |
Family
ID=62177211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711371733.6A Active CN108093314B (en) | 2017-12-19 | 2017-12-19 | Video news splitting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108093314B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111314775B (en) * | 2018-12-12 | 2021-09-07 | 华为终端有限公司 | Video splitting method and electronic equipment |
CN110267061B (en) * | 2019-04-30 | 2021-07-27 | 新华智云科技有限公司 | News splitting method and system |
CN110610500A (en) * | 2019-09-06 | 2019-12-24 | 北京信息科技大学 | News video self-adaptive strip splitting method based on dynamic semantic features |
CN110941594B (en) * | 2019-12-16 | 2023-04-18 | 北京奇艺世纪科技有限公司 | Splitting method and device of video file, electronic equipment and storage medium |
CN111277859B (en) * | 2020-01-15 | 2021-12-14 | 腾讯科技(深圳)有限公司 | Method and device for acquiring score, computer equipment and storage medium |
CN113810782B (en) * | 2020-06-12 | 2022-09-27 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN113807085B (en) * | 2021-11-19 | 2022-03-04 | 成都索贝数码科技股份有限公司 | Method for extracting title and subtitle aiming at news scene |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | Institute of Automation, Chinese Academy of Sciences | News video classification method and system |
CN102547139A (en) * | 2010-12-30 | 2012-07-04 | Beijing Nufront Network Technology Co., Ltd. | Method for splitting news video program, and method and system for cataloging news videos |
CN103546667A (en) * | 2013-10-24 | 2014-01-29 | Institute of Automation, Chinese Academy of Sciences | Automatic news splitting method for large-scale broadcast television supervision |
CN104778230A (en) * | 2015-03-31 | 2015-07-15 | Beijing QIYI Century Science & Technology Co., Ltd. | Video data segmentation model training method and device, and video data segmentation method and device |
CN104780388A (en) * | 2015-03-31 | 2015-07-15 | Beijing QIYI Century Science & Technology Co., Ltd. | Video data partitioning method and device |
CN107087211A (en) * | 2017-03-30 | 2017-08-22 | Beijing QIYI Century Science & Technology Co., Ltd. | Anchor shot detection method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100411340B1 (en) * | 2001-03-09 | 2003-12-18 | LG Electronics Inc. | Article-based video browsing system for news video content |
- 2017-12-19: CN application CN201711371733.6A filed; granted as patent CN108093314B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN108093314A (en) | 2018-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108093314B (en) | Video news splitting method and device | |
CN107977645B (en) | Method and device for generating video news poster graph | |
CN104063883B (en) | Surveillance video summary generation method combining objects and key frames | |
Diem et al. | cBAD: ICDAR2017 competition on baseline detection | |
CN107087211B (en) | Method and device for detecting lens of host | |
US10304458B1 (en) | Systems and methods for transcribing videos using speaker identification | |
CN102332096B (en) | Video caption text extraction and identification method | |
RU2637989C2 (en) | Method and device for identifying target object in image | |
Yang et al. | Lecture video indexing and analysis using video OCR technology | |
CN110267061B (en) | News splitting method and system | |
CN102915544B (en) | Moving object extraction method for video images based on pattern detection and color segmentation | |
JP5067310B2 (en) | Subtitle area extraction apparatus, subtitle area extraction method, and subtitle area extraction program | |
CN103336954A (en) | Identification method and device of station caption in video | |
CN106845513B (en) | Human hand detector and method based on conditional random forest | |
CN102542268A (en) | Method for detecting and positioning text area in video | |
CN106792005B (en) | Content detection method based on audio and video combination | |
CN103714314B (en) | Television video station caption identification method combining edge and color information | |
CN108256508B (en) | News main and auxiliary title detection method and device | |
CN113435443B (en) | Method for automatically identifying landmark from video | |
CN106570885A (en) | Background modeling method based on brightness and texture fusion threshold value | |
CN112102250A (en) | Method for building and applying a pathological image detection model whose training data has missing labels | |
CN113485615B (en) | Method and system for producing intelligent image-text tutorials for typical applications based on computer vision | |
CN108446603B (en) | News title detection method and device | |
CN114529894A (en) | Fast scene text detection method incorporating dilated (atrous) convolution | |
CN101827224A (en) | Detection method of anchor shot in news video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||