CN108093314B - Video news splitting method and device - Google Patents

Video news splitting method and device

Info

Publication number
CN108093314B
Authority
CN
China
Prior art keywords
news
time point
video
shot
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711371733.6A
Other languages
Chinese (zh)
Other versions
CN108093314A (en)
Inventor
刘楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201711371733.6A priority Critical patent/CN108093314B/en
Publication of CN108093314A publication Critical patent/CN108093314A/en
Application granted granted Critical
Publication of CN108093314B publication Critical patent/CN108093314B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G06F16/739 Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455 Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video news splitting method and device. The method comprises: decomposing a news video to be processed into at least one shot, and recording the start time point and end time point of each shot in the news video; extracting m key frames from each shot at a preset time interval, and analyzing the m key frames to obtain the host category information of the shot; detecting news titles and recording the start time point and end time point of each news title; generating news title marking information for marking the shots; and splitting the news video into N pieces of news information according to a preset splitting rule, based on the start and end time points of each shot in the news video, the host category information of the shots, and the news title marking information of the shots. The video news splitting method and device can automatically split video news based on the host information and news title information in the video news, improving video news splitting efficiency.

Description

Video news splitting method and device
Technical Field
The invention relates to the technical field of video processing, in particular to a video news splitting method and device.
Background
Video news carries a large amount of up-to-date information and is of great value to video websites and news applications. Video websites and news applications need to split the full video news broadcast each day and put the pieces online, so that users can click and watch each news item that interests them. Because the country has a large number of television stations, including many local stations in addition to the satellite stations, splitting all video news manually would consume a great deal of manpower. At the same time, the timeliness of news imposes strict requirements on splitting speed, which puts further pressure on manual splitting: video news is broadcast in bulk within certain time slots (for example from 12:00 to 12:30 at noon), and to guarantee timeliness an entire news program must be split into independent news items within a specified time; production cannot rely on backlogging tasks for later processing.
In summary, the prior art has no technical scheme that can automatically split video news; a large number of workers are usually required to split it manually, which results in high labor cost. How to split video news automatically, quickly and effectively is therefore an urgent problem to be solved.
Disclosure of Invention
In view of this, an object of the present invention is to provide a video news splitting method, which can automatically split video news based on host information and news headline information in the video news, and improve the efficiency of splitting the video news.
In order to achieve the purpose, the invention provides the following technical scheme: a video news splitting method, the method comprising:
clustering video frames in a news video to be processed, and decomposing the news video to be processed into at least one shot;
recording a starting time point and an ending time point of each shot in the news video;
extracting m frames of key frames of the shot according to a preset time interval based on the length of the shot calculated by the starting time point and the ending time point of the shot;
analyzing the m key frames of the shot to obtain the host category information of the shot;
detecting news titles of the news videos to be processed, and recording starting time points and ending time points of the news titles when the news videos to be processed contain the news titles;
generating news title marking information for marking the shots based on the starting time point and the ending time point of the news titles and the starting time point and the ending time point of the shots in the news video;
splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
Preferably, the performing news title detection on the to-be-processed news video, and when the to-be-processed news video includes a news title, after recording a start time point and an end time point of the news title, further includes:
carrying out duplicate removal operation on the detected news headlines of the news video to be processed, and recording the starting time points and the ending time points of the residual news headlines after duplicate removal;
correspondingly, the generating news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video comprises:
and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
Preferably, the analyzing the m key frames of the shot to obtain the host category information of the shot includes:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the host classification categories of all key frames of the shot, and determining the host classification category with the largest number as the host category information of the shot.
Preferably, the performing news title detection on the to-be-processed news video, and when the to-be-processed news video includes a news title, recording a start time point and an end time point of the news title includes:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
Preferably, the generating news headline marking information for marking the footage based on the start time point and the end time point of the news headline and the start time point and the end time point of the footage in the news video includes:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
Preferably, the splitting the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each of the shots in the news video, host category information of the shots, and news title mark information of the shots includes:
splitting the news video according to an information sequence V = {S_i}, i = 0, 1, …, M, where S_i = {T_i, A_i, C_i, Cs_i}; T_i represents the start time point and end time point of the shot in the video, A_i represents the host category information contained in the shot, C_i represents the news title marking information of the shot, and Cs_i represents whether the title is a new title.
A video news splitting apparatus, comprising:
the decomposition module is used for decomposing the news video to be processed into at least one shot by clustering video frames in the news video to be processed;
the first recording module is used for recording the starting time point and the ending time point of each shot in the news video;
the extraction module is used for extracting m frames of key frames of the shot according to a preset time interval on the basis of the length of the shot calculated by the starting time point and the ending time point of the shot;
the analysis module is used for analyzing the m key frames of the shot to obtain the host category information of the shot;
the second recording module is used for detecting news titles of the news videos to be processed, and recording the starting time point and the ending time point of the news titles when the news videos to be processed contain the news titles;
a generating module, configured to generate news headline marking information for marking the shots based on a start time point and an end time point of the news headline and a start time point and an end time point of the shots in the news video;
the splitting module is used for splitting the news video into N pieces of news information according to a preset splitting rule based on the start time point and end time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, where N is greater than or equal to 1.
Preferably, the apparatus further comprises:
the duplication removing module is used for carrying out duplication removing operation on the detected news headlines of the news video to be processed and recording the starting time points and the ending time points of the rest news headlines after duplication removing;
accordingly, the generation module is configured to: and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
Preferably, the analysis module is specifically configured to:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the host classification categories of all key frames of the shot, and determining the host classification category with the largest number as the host category information of the shot.
Preferably, the second recording module is specifically configured to:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
Preferably, the generating module is specifically configured to:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
Preferably, the splitting module is specifically configured to:
splitting the news video according to an information sequence V = {S_i}, i = 0, 1, …, M, where S_i = {T_i, A_i, C_i, Cs_i}; T_i represents the start time point and end time point of the shot in the video, A_i represents the host category information contained in the shot, C_i represents the news title marking information of the shot, and Cs_i represents whether the title is a new title.
From the above technical scheme, the invention discloses a video news splitting method. When splitting video news, video frames in the news video to be processed are first clustered, decomposing the news video into at least one shot, and the start time point and end time point of each shot in the news video are recorded. Based on the shot length calculated from the start and end time points, m key frames are extracted from each shot at a preset time interval and analyzed to obtain the host category information of the shot. News title detection is performed on the news video to be processed; when it contains a news title, the start time point and end time point of the news title are recorded, and news title marking information for marking the shots is generated based on the start and end time points of the news title and of the shots in the news video. Finally, the news video is split into N pieces of news information, N >= 1, according to a preset splitting rule based on the start and end time points of each shot in the news video, the host category information of the shots, and the news title marking information of the shots. The video news splitting method and device can automatically split video news based on the host information and news title information in the video news, improving video news splitting efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a video news splitting method disclosed in embodiment 1 of the present invention;
fig. 2 is a flowchart of a video news splitting method disclosed in embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of a video news splitting apparatus disclosed in embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of a video news splitting apparatus disclosed in embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, which is a flowchart of an embodiment 1 of a video news splitting method disclosed in the present invention, the method includes the following steps:
s101, clustering video frames in a news video to be processed, and decomposing the news video to be processed into at least one shot;
When video news needs to be split, similar video frames in the news video to be processed are clustered and merged into shots. To decompose the video into shots, a color histogram H[i] of the RGB space is calculated for each video frame of the news video to be processed, and the Euclidean distance between the color histograms of temporally adjacent video frames is computed; if the distance is greater than a preset threshold Th1, a shot cut is considered to have occurred, and all video frames between the start position and the end position are recorded as one shot. The distance between the color histogram H[i] of the current video frame and that of the video frame n frames earlier is also calculated; if it is greater than a preset threshold Th2, a gradual shot transition is considered to occur at that position, and all video frames between the start position and the end position are recorded as one shot. If neither a cut nor a gradual transition occurs, the frames are considered to still lie inside the same shot.
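A minimal sketch of this boundary test follows, assuming frames are RGB numpy arrays; the histogram size, th1, th2 and the look-back window n stand in for the preset parameters named above and are illustrative values, not values from the patent.

```python
import numpy as np

def rgb_histogram(frame, bins=16):
    """Concatenated, normalized per-channel histogram of an RGB frame."""
    hist = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    h = np.concatenate(hist).astype(np.float64)
    return h / h.sum()

def split_into_shots(frames, th1=0.25, th2=0.4, n=10):
    """Return (start, end) frame indices of shots via the cut/gradual tests."""
    hists = [rgb_histogram(f) for f in frames]
    shots, start = [], 0
    for i in range(1, len(frames)):
        cut = np.linalg.norm(hists[i] - hists[i - 1]) > th1        # hard cut
        gradual = (i - n >= start and
                   np.linalg.norm(hists[i] - hists[i - n]) > th2)  # gradual change
        if cut or gradual:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frames) - 1))
    return shots
```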
S102, recording the starting time point and the ending time point of each shot in a news video;
after the news video to be processed is decomposed into at least one shot, the start time point and the end time point of each shot in the news video to be processed are recorded.
S103, extracting m frames of key frames of the shot according to a preset time interval based on the length of the shot calculated by the starting time point and the ending time point of the shot;
The number m of key frames to be extracted is set according to the shot length calculated from the recorded start time point and end time point of the shot. The rule can be described as follows: when the shot length is less than 2 s, m = 1; when less than 4 s, m = 2; when less than 10 s, m = 3; when greater than 10 s, m = 4 (these parameters can be adjusted). The m frames are extracted from the shot as representative frames: the gap between extracted key frames is calculated as (end position - start position)/(m + 1), and video frames are extracted from the shot at this interval as key frames.
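A sketch of this key-frame sampling rule, assuming shot boundaries are given as timestamps in seconds; the length thresholds mirror the adjustable parameters above.

```python
def keyframe_times(start, end):
    """Pick m evenly spaced key-frame timestamps inside a shot [start, end]."""
    length = end - start
    if length < 2:
        m = 1
    elif length < 4:
        m = 2
    elif length < 10:
        m = 3
    else:
        m = 4
    gap = length / (m + 1)   # gap = (end position - start position)/(m + 1)
    return [start + gap * (k + 1) for k in range(m)]
```

For a 7 s shot, for example, this yields m = 3 key frames spaced 1.75 s apart from the shot start.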
S104, analyzing the m key frames of the shot to obtain the host category information of the shot;
Each key frame is then analyzed separately to obtain the host category information of the shot.
S105, performing news title detection on the news video to be processed, and recording the starting time point and the ending time point of the news title when the news video to be processed contains the news title;
meanwhile, news headline detection and analysis are carried out on the news video to be processed, whether the news video to be processed contains the news headline or not is judged, and when the news video to be processed contains the news headline, the starting time point and the ending time point of the news headline are recorded.
S106, generating news title marking information for marking the lens based on the starting time point and the ending time point of the news title and the starting time point and the ending time point of the shot in the news video;
and then generating news title marking information for marking the shots according to the recorded start time point and end time point of the news titles and the start time point and end time point of the shots in the news video to be processed, namely marking whether the shots contain the news titles or not.
S107, splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
And finally, the news video to be processed is split into N pieces of news information according to the acquired start time point and end time point of each shot in the news video, the host category information of the shot and the news title marking information of the shot, where N is greater than or equal to 1.
In summary, in the above embodiment, when video news needs to be split, the news video to be processed is first decomposed into at least one shot by clustering its video frames, and the start time point and end time point of each shot in the news video are recorded. Based on the shot length calculated from these time points, m key frames are extracted from each shot at a preset time interval and analyzed to obtain the host category information of the shot. News title detection is performed on the news video; when it contains a news title, the start time point and end time point of the news title are recorded, and news title marking information for marking the shots is generated from the title and shot time points. Finally, the news video is split into N pieces of news information, N >= 1, according to a preset splitting rule based on the start and end time points of each shot, the host category information of the shots, and the news title marking information of the shots. Video news can thus be split automatically based on the host information and news title information it contains, which effectively solves the problem in the prior art that video news must be split manually at high labor cost.
As shown in fig. 2, which is a flowchart of an embodiment 2 of a video news splitting method disclosed by the present invention, on the basis of the above embodiment 1, in this embodiment, news headline detection is performed on a news video to be processed, and when the news video to be processed includes a news headline, after recording a start time point and an end time point of the news headline, the method further includes:
s201, carrying out duplicate removal operation on the detected news headlines of the news video to be processed, and recording the starting time points and the ending time points of the residual news headlines after duplicate removal;
Observation of news data shows that the same news title of one news item is often shown repeatedly several times. If the news were split at every single appearance of a title, it would be over-segmented; therefore a deduplication operation can further be performed on the detected news titles of the news video to be processed, and the start time points and end time points of the remaining news titles after deduplication are recorded.
When performing the deduplication operation on the detected news titles of the news video to be processed, suppose the start and end frame positions of the n-th title Cn[t1, t2] are t1 and t2, and its position in the video frame is CRn(x, y, w, h). The two titles before it are Cn-1[t3, t4] and Cn-2[t5, t6], with positions CRn-1 and CRn-2 in the video frame.
Step 1: compare the current title Cn with the previous title Cn-1 by calculating the overlap ratio of their regions in the video, i.e. the overlap ratio R1 of CRn and CRn-1. If R1 >= Thr, the two titles are considered to need a deduplication comparison; go to step 2. Otherwise, calculate the region overlap ratio R2 of Cn and Cn-2; if R2 >= Thr, the two titles need a deduplication comparison and go to step 2, otherwise Cn is considered not to be a repeated title.
Step 2: for the two input titles, select one frame representing the content of each title. For Cn, select the video frame at (t1+t2)/2, and for CRn(x, y, w, h) set a comparison area rect:
rect.x=x+w*R1;
rect.y=y+h*R2;
rect.w=w*R3;
rect.h=h*R4;
R1, R2, R3 and R4 here are all preset parameters.
The image within rect of that video frame is selected as IMG1. For Cn-1 (or Cn-2), select the video frame at (t3+t4)/2 (or (t5+t6)/2), and select the image within the same area rect, recorded as IMG2.
Step 3: convert the two input images from the RGB color space into grayscale or any luminance-separated color space (such as YUV, HSV, HSL, LAB). The grayscale conversion formula is:
Gray = R*0.299 + G*0.587 + B*0.114
For a luminance-separated space, taking HSL as an example, the conversion formula for lightness L is:
L = (max(R,G,B) + min(R,G,B))/2
Step 4: calculate a segmentation threshold. For the grayscale or luminance image of IMG1, calculate the grayscale segmentation threshold using the OTSU method, described as follows:
(1) Assume the grayscale image I can be divided into N gray levels (N <= 256); for these N gray levels, the N-bin grayscale histogram H of the image can be extracted.
(2) For each bin t (0 <= t < N) in the histogram, calculate:
w0(t) = sum_{i=0..t} H[i],  w1(t) = sum_{i=t+1..N-1} H[i]
u0(t) = sum_{i=0..t} x(i)*H[i] / w0(t),  u1(t) = sum_{i=t+1..N-1} x(i)*H[i] / w1(t)
x(i) = i*256/N
(3) Find the t that maximizes the between-class variance
sigma2(t) = w0(t)*w1(t)*(u0(t) - u1(t))^2
The division threshold Th is the x(t) corresponding to this maximum t. (A code sketch of steps 4 to 7 follows step 7.)
Step 5: binarize the images IMG1 and IMG2. For pixel (x, y) of image IMG1 or IMG2, the corresponding pixel of the binarized image B is: if I(x, y) < Th, B(x, y) = 0; if I(x, y) >= Th, B(x, y) = 255.
Step 6: take the point-by-point difference of the binarized images B1 and B2 of IMG1 and IMG2, and calculate the average difference Diff:
Diff = ( sum_{x=0..W-1} sum_{y=0..H-1} |B1(x, y) - B2(x, y)| ) / (W*H)
where W and H are the width and height of the rect region.
Step 7: compare Diff with a preset threshold. If Diff is smaller than the threshold, the two titles are considered the same title, and the shots associated with the Cn time range [t1, t2] are marked as having the same subtitle; otherwise they are marked as having different subtitles.
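A minimal sketch of steps 4 to 7 (threshold, binarize, difference), assuming IMG1 and IMG2 are grayscale numpy crops of the same rect region; the decision threshold diff_thresh is an illustrative stand-in for the preset threshold.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Step 4: the gray value x(t) that maximizes the between-class variance."""
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    hist = hist.astype(np.float64) / hist.sum()
    x = np.arange(n_bins) * 256.0 / n_bins            # x(i) = i*256/N
    best_t, best_var = 0, -1.0
    for t in range(n_bins - 1):
        w0, w1 = hist[:t + 1].sum(), hist[t + 1:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        u0 = (x[:t + 1] * hist[:t + 1]).sum() / w0
        u1 = (x[t + 1:] * hist[t + 1:]).sum() / w1
        var = w0 * w1 * (u0 - u1) ** 2                # sigma2(t)
        if var > best_var:
            best_t, best_var = t, var
    return x[best_t]

def same_title(img1, img2, diff_thresh=12.0):
    """Steps 5-7: binarize both crops with IMG1's threshold and compare."""
    th = otsu_threshold(img1)
    b1 = np.where(img1 >= th, 255, 0).astype(np.int16)
    b2 = np.where(img2 >= th, 255, 0).astype(np.int16)
    diff = np.abs(b1 - b2).mean()     # average point-by-point difference
    return diff < diff_thresh         # True -> treat as the same title
```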
Correspondingly, S202, generating news title marking information for marking the shots based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shots in the news video;
and then generating news title marking information for marking the shots according to the recorded starting time point and ending time point of the rest news titles after the duplication removal and the starting time point and ending time point of the shots in the news video to be processed, namely whether the shots contain news titles or not is marked.
S203, splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the host category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
And finally, the news video to be processed is split into N pieces of news information according to the acquired start time point and end time point of each shot in the news video, the host category information of the shot and the news title marking information of the shot, where N is greater than or equal to 1.
In summary, in the above embodiments, on the basis of embodiment 1, the detected news headline of the news video to be processed can be further subjected to a deduplication operation, so that the problem of over-segmentation of news is effectively avoided.
Specifically, in the foregoing embodiment, one implementation manner of analyzing m key frames of a shot to obtain host category information of the shot may be:
and respectively inputting each frame of key frame of the lens into a classifier formed by pre-training, generating a host classification category corresponding to each frame of key frame, counting the host classification categories of all key frames of the lens, and determining the host classification category with the largest number as host classification information of the lens.
In other words, the key frames previously selected for each shot are input into the pre-trained classifier to classify the host; the per-frame results are voted on, and the category with the most votes is selected as the category of the shot.
The training process of the classifier is as follows: extract a certain number of video frames from videos of different channels and different news programs, and manually sort them into four categories (four categories are given as an example and do not limit the present invention): the double-host sitting posture category, the single-host sitting posture category, the single-host standing posture category, and the non-host category. A corresponding classifier is then trained with a deep learning method, where the training module refers to the process of training a network model according to an open-source deep learning network training method and model structure.
Training process: the model is retrained using the Caffe open-source deep learning framework (other open-source deep learning frameworks may also be used). The specific training procedure is the BP (backpropagation) neural-network algorithm: in forward propagation, the input is passed through the model layer by layer to the output; if the result obtained at the output layer differs from the expected value, the error is propagated backwards and the weights and thresholds of the model are updated by gradient descent. This is repeated many times until the error function reaches a minimum. The algorithm is a standard, general method rather than an original one, so its details are not repeated here. Through this training process, a network model for classification is obtained.
Classification process: input each key frame obtained from each shot after shot segmentation into the trained model, and perform image convolution, pooling and ReLU operations in sequence according to the same model structure and the trained parameters, until the confidence probabilities P1, P2, P3 and P4 that the image belongs to the double-host sitting posture, single-host sitting posture, single-host standing posture and non-host categories are output; select the category corresponding to the maximum value as the classification of the image. For example: if P1 is the maximum among (P1, P2, P3, P4), the image belongs to the double-host sitting posture category. For a shot, count the number of key frames belonging to each category and select the category with the most key frames as the category of the shot.
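A sketch of the per-shot vote, assuming a classify(frame) function that returns the four confidences (P1, P2, P3, P4); the category names are placeholders for the four classes above.

```python
from collections import Counter

CATEGORIES = ("double_host_sitting", "single_host_sitting",
              "single_host_standing", "no_host")

def shot_category(keyframes, classify):
    """Majority vote of per-key-frame classifier outputs for one shot."""
    votes = Counter()
    for frame in keyframes:
        probs = classify(frame)                    # (P1, P2, P3, P4)
        votes[CATEGORIES[probs.index(max(probs))]] += 1
    return votes.most_common(1)[0][0]              # category with most key frames
```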
Specifically, in the above embodiment, the detection of the news headline is performed on the news video to be processed, and when the news headline is included in the news video to be processed, one implementation manner of recording the start time point and the end time point of the news headline may be:
determining a preset area of a video frame of a news video to be processed as a candidate area, and tracking images in the candidate area to generate a tracking result; and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
That is, the idea of the title detection algorithm is to perform news title detection based on temporal stability on each video frame of the input news video, and to obtain the frame numbers of the start frame and end frame at which a news title appears in the whole program. The time position of each shot in the video obtained in module A is then compared with the interval in which the news title appears; if the shot falls within the interval in which the title appears, the shot is considered to have a title, otherwise it is considered to have no title.
The reason for judging in this way, rather than looking for titles in a single image, is to distinguish possible rolling captions. Rolling captions appearing in news are generally displayed in a style extremely similar to news titles, and judging a news title from only one image would produce errors that affect the quality of the result.
The specific algorithm is as follows:
1. selecting potential candidate regions:
(1) Select the image in the bottom area of the key frame (the bottom area is where most news titles appear) as the image to be detected. The purpose of region selection is to reduce the amount of computation and improve detection precision. The bottom area is selected as follows:
assuming that the width and height of the key frame is W, H, the position of the bottom region Rect (rect.x, rect.y, rect.w, rect.h) (the coordinates of the start of the rectangular region in the key frame and the width and height of the region) in the image of the key frame is:
rect.x=0;
rect.y=H*cut_ratio;
rect.w=W;
rect.h=H*(1-cut_ratio);
where cut _ ratio is a preset coefficient.
(2) Converting the selected image to be detected into a gray/any brightness color separation space (such as YUV, HSV, HSL and LAB) from an RGB color space, wherein the gray space conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114
for the luminance color separation space, taking HSL as an example, the conversion formula of luminance l (luminance) is:
L=(max(R,G,B)+min(R,G,B))/2
(3) For grayscale or luminance images there are various methods of extracting edge features, such as the Sobel operator, the Canny operator, etc.; in this embodiment the Sobel operator is taken as an example:
Convolve the grayscale/luminance image with the horizontal and vertical edge gradient operators to obtain the horizontal edge map Eh and the vertical edge map Ev, and finally calculate the edge intensity map Eall: for any point (x, y) of the edge map, Eall(x, y) = sqrt(Ev(x, y)^2 + Eh(x, y)^2). (A code sketch of steps (1) and (3) follows step (12).)
The horizontal and vertical edge gradient operators here take the Sobel operator as an example; other operators are also applicable:
Sh = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
Sv = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
(4) Compare the edge map Eall with a preset threshold The1 and binarize it: if Eall(x, y) > The1, E(x, y) = 1, else E(x, y) = 0.
(5) Perform operation (3) on each of the RGB channels of the image to be detected to obtain the edge intensity maps Er, Eg and Eb of the three channels.
(6) Compare Er, Eg and Eb with a preset threshold The2 and binarize them, i.e. if Er(x, y) > The2, Er(x, y) = 1, else Er(x, y) = 0 (taking one channel as an example). The2 and The1 may be the same or different: if the news title background is of a gradually changing type, the title edges cannot be detected with a higher threshold, and the edges detected with a lower threshold need to be used for reinforcement, so generally The2 < The1.
(7) Edge-enhance the obtained edge map E as E(x, y) = E(x, y) | Er(x, y) | Eg(x, y) | Eb(x, y) to obtain the final edge map. The reinforcing steps (5) to (7) are optional and may be used or not as required; one channel or all three channels can be enhanced, which prevents detection failure caused by gradually changing subtitle regions.
(8) Project the final edge map horizontally: for each row i, count the number Numedge of pixels meeting the following condition; if Numedge > Thnum, set the histogram H[i] = 1, otherwise H[i] = 0. The condition is: if a pixel or at least one of its upper and lower neighbors has edge value 1, the pixel's edge value is considered to be 1; count the total number of pixels lying in horizontal runs of such pixels whose continuous length is greater than the threshold Thlen (the purpose is to guarantee a continuous straight line).
(9) Traverse the histogram H[i] and examine the spacing between rows with H[i] = 1. If the spacing is larger than a threshold Threw, take the edge map area between the two rows as a first-stage candidate region; otherwise continue to process the next key frame.
(10) For each first-stage candidate region, compute the vertical edge projection histogram V: for any column i, if the number of edge pixels equal to 1 in the column is greater than Thv, set V[i] = 1, otherwise V[i] = 0, and forcibly set V[0] = V[W-1] = 1. In V, find the region satisfying V[i] = V[j] = 1 and V[k] = 0 for all k in (i, j), maximizing (j - i), as the left and right boundaries of the subtitle region. The original image in this region is selected as the second-stage candidate region. Column edge pixels are found in the same way as row edge pixels.
(11) Refine the left and right boundaries of the second-stage candidate region: scan the original image of the second-stage candidate region with a sliding window of a certain size (for example 32 x 32), calculate the color histogram within each window, and count the number numcolor of non-zero bins in the window's histogram. Find monochrome areas or background areas with complex color, i.e. windows with numcolor < Thcolor1 || numcolor > Thcolor2, and use the center position of a window meeting this condition as the new vertical boundary.
(12) Judge the rectangular region candidateRect determined above using constraint conditions, including but not limited to: the position of the starting point of candidateRect must lie within a certain image range, the height of candidateRect must be within a certain range, and so on. If the constraints are met, candidateRect is considered a candidate region of a news title. If the candidate region is not currently being tracked, go to module B for tracking; otherwise continue detecting in module A.
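A sketch of the bottom-region crop from step (1) and the Sobel edge-intensity map from step (3), assuming grayscale numpy images; cut_ratio and the kernels follow the text, while SciPy's convolution stands in for the convolution the patent leaves unspecified.

```python
import numpy as np
from scipy.signal import convolve2d

SOBEL_H = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)
SOBEL_V = SOBEL_H.T                         # vertical-edge kernel

def bottom_region(gray, cut_ratio=0.7):
    """Step (1): crop the bottom strip, where most news titles appear."""
    H = gray.shape[0]
    return gray[int(H * cut_ratio):, :]     # rect.h = H * (1 - cut_ratio)

def edge_intensity(gray):
    """Step (3): Eall(x, y) = sqrt(Eh(x, y)^2 + Ev(x, y)^2)."""
    eh = convolve2d(gray, SOBEL_H, mode="same", boundary="symm")
    ev = convolve2d(gray, SOBEL_V, mode="same", boundary="symm")
    return np.sqrt(eh ** 2 + ev ** 2)
```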
2. Tracking the found candidate regions:
(1) Judge whether the region is being tracked for the first time. From the previous run of this processing it is known whether no region or several regions are being tracked, or whether tracking has finished or failed. If there is a region under tracking, compare it with the current candidate region; if the two regions coincide closely in position, the region is known to be under tracking, otherwise it is determined to be tracked for the first time ("tracked for the first time" may mean tracked for the very first time, or tracked again after the previous tracking has finished). If it is the first tracking, go to step (2); if not, exit the method steps of this embodiment.
(2) For the first tracked region, a tracking range in the key frame is set (since the candidate region of the input key frame may contain an additional background region, i.e. a region not containing news headlines, a tracking region needs to be set to improve the tracking accuracy). The setting method comprises the following steps: let the candidate region position of the news headline of the key frame be CandidateRect (x, y, w, h) (the starting point x, y in the key frame and the corresponding width and height w, h), and set the tracking region track (x, y, w, h) as:
track.x=CandidateRect.x+CandidateRect.w*Xratio1;
track.y=CandidateRect.y+CandidateRect.h*Yratio1;
track.w=CandidateRect.w*Xratio2;
track.h=CandidateRect.h*Yratio2;
Xratio1, Xratio2, Yratio1 and Yratio2 are all preset parameters.
(3) Selecting an image in the key frame tracking area, converting the image from an RGB color space into a gray/any brightness color separation space (such as YUV, HSV, HSL and LAB), and converting a formula for the gray space into the following steps:
Gray=R*0.299+G*0.587+B*0.114
for the luminance color separation space, taking HSL as an example, the conversion formula of luminance l (luminance) is:
L=(max(R,G,B)+min(R,G,B))/2
(4) Calculate a segmentation threshold: for the grayscale or luminance image, calculate the grayscale segmentation threshold using the OTSU method, described as follows: assume the grayscale image I can be divided into N gray levels (N <= 256), for which the N-bin grayscale histogram H of the image can be extracted. For each bin t (0 <= t < N) in the histogram, calculate:
w0(t) = sum_{i=0..t} H[i],  w1(t) = sum_{i=t+1..N-1} H[i]
u0(t) = sum_{i=0..t} x(i)*H[i] / w0(t),  u1(t) = sum_{i=t+1..N-1} x(i)*H[i] / w1(t)
x(i) = i*256/N
Find the t that maximizes the between-class variance
sigma2(t) = w0(t)*w1(t)*(u0(t) - u1(t))^2
The x(t) corresponding to this maximum t is the segmentation threshold Thtrack.
(5) Binarize the image: for pixel (x, y) in image I, the corresponding pixel of the reference binarized image Bref is: if I(x, y) < Thtrack, Bref(x, y) = 0; if I(x, y) >= Thtrack, Bref(x, y) = 255.
(6) A color histogram Href of the image in the tracking area is calculated.
(7) For an input key frame, converting the input key frame from an RGB color space into a gray/or any luminance color separation space (such as YUV, HSV, HSL, LAB), and for the gray space, converting the formula as:
Gray=R*0.299+G*0.587+B*0.114
for the luminance color separation space, taking HSL as an example, the conversion formula of luminance l (luminance) is:
L=(max(R,G,B)+min(R,G,B))/2
(8) Select the grayscale image within the tracking area of the key frame and binarize it: for pixel (x, y) in image I, the corresponding pixel of the binarized image Bcur is: if I(x, y) < Thtrack, Bcur(x, y) = 0; if I(x, y) >= Thtrack, Bcur(x, y) = 255. Thtrack is the result obtained in step (4) when the region was first tracked.
(9) Take the point-by-point difference between the binarized image Bcur of the current frame and the reference binarized image Bref, and calculate the average difference Diffbinary:
Diffbinary = ( sum_{x=0..W-1} sum_{y=0..H-1} |Bcur(x, y) - Bref(x, y)| ) / (W*H)
where W and H are the width and height of the tracking area image.
(10) And calculating a color histogram Hcur of the current image in the tracking area, and calculating a distance Diffcolor with Href.
(11) Compare the obtained Diffbinary and Diffcolor with preset thresholds. If Diffbinary < Thbinary && Diffcolor < Thcolor, return the "tracking" state and increment the tracking counter tracking_num++; otherwise increment lost_num++. It should be noted that the color-histogram-based and binarization-based tracking methods may each be used alone, or be used in combination.
(12) If lost_num > Thlost, return the "tracking ended" state together with the frame number of the current key frame (recorded as the time point at which the news title disappears); otherwise return the "tracking" state. The purpose of setting lost_num is to prevent interference in individual video signals, which distorts images and makes matching fail; through lost_num the algorithm is allowed to fail to track a certain number of key frames.
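A sketch of the per-key-frame tracking update from steps (9) to (12), assuming Diffbinary and Diffcolor are computed as described above; the threshold values are illustrative stand-ins for Thbinary, Thcolor and Thlost.

```python
def update_tracking(state, diff_binary, diff_color,
                    th_binary=8.0, th_color=0.2, th_lost=5):
    """state: {'tracking_num': int, 'lost_num': int}; returns the new status."""
    if diff_binary < th_binary and diff_color < th_color:
        state["tracking_num"] += 1   # frame still matches the reference title
        return "tracking"
    state["lost_num"] += 1           # tolerate a few distorted key frames
    return "ended" if state["lost_num"] > th_lost else "tracking"
```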
3. Determining whether the tracking area is a title area:
If tracking of the candidate area has ended, compare tracking_num with a preset threshold Thtracking_num. If tracking_num >= Thtracking_num, the region is judged to be a news title area; otherwise it is judged to be a non-news-title area.
Specifically, in the above embodiment, one implementation manner of generating the news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video may be:
comparing the start time point and the end time point of the news title with the start time point and the end time point of the shot in the news video, generating first news title marking information when the start time point and the end time point of the news title are contained in a time period formed by the start time point and the end time point of the shot in the news video, and generating second news title marking information when the start time point and the end time point of the news title are not contained in the time period formed by the start time point and the end time point of the shot in the news video.
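A sketch of this containment test, treating each interval as a (start, end) pair; returning True corresponds to the first news title marking information, False to the second.

```python
def title_mark(shot_span, title_span):
    """True iff the title interval lies inside the shot interval."""
    return shot_span[0] <= title_span[0] and title_span[1] <= shot_span[1]
```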
Specifically, in the above embodiment, based on the start time point and the end time point of each shot in the news video, the host category information of the shot, and the news title mark information of the shot, one implementation manner of splitting the news video into N pieces of news information according to a preset splitting rule may be:
splitting the news video according to an information sequence V = {S_i}, i = 0, 1, …, M, where S_i = {T_i, A_i, C_i, Cs_i}; T_i represents the start time point and end time point of the shot in the video, A_i represents the host category information contained in the shot, C_i represents the news title marking information of the shot, and Cs_i represents whether the title is a new title.
That is, step 1: for each shot S_i, if the news starting point is empty, set the shot's starting point as the news starting point and go to the next shot; if the news starting point is already set, go to step 2.
Step 2: if A_i in S_i belongs to the double-host category, take the end point of T_{i-1} of shot S_{i-1} as the end point of the news item being split, and at the same time take S_i as an independent news item whose start point and end point are those of T_i; return the two splitting results, set the news starting point to empty, and go to the next shot.
Step 3: if A_i in S_i belongs to the single-host sitting posture or standing posture category, take the end point of T_{i-1} of shot S_{i-1} as the end point of the news item being split, take S_i as the starting point of a new piece of news, return one splitting result, and go to the next shot.
Step 4: if A_i in S_i belongs to the non-host category, and C_i indicates a subtitle is present and Cs_i indicates a new subtitle, take the end point of T_{i-1} of shot S_{i-1} as the end point of the news item being split, take S_i as the starting point of a new piece of news, return one splitting result, and go to the next shot.
Step 5: if none of the above conditions is met, add shot S_i to the current news item and go to the next shot.
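A sketch of this splitting loop over the information sequence V = {S_i}, assuming each shot record is a dictionary with the fields named in the text and the category labels from the voting sketch above.

```python
def split_news(shots):
    """shots: [{'T': (start, end), 'A': category, 'C': has_title, 'Cs': is_new_title}]"""
    items, start = [], None
    for i, s in enumerate(shots):
        if start is None:                          # step 1: open a news item
            start = s["T"][0]
            continue
        prev_end = shots[i - 1]["T"][1]
        if s["A"] == "double_host_sitting":        # step 2: standalone item
            items.append((start, prev_end))
            items.append(s["T"])
            start = None
        elif s["A"] in ("single_host_sitting", "single_host_standing"):
            items.append((start, prev_end))        # step 3: a new item starts here
            start = s["T"][0]
        elif s["A"] == "no_host" and s["C"] and s["Cs"]:
            items.append((start, prev_end))        # step 4: new title => new item
            start = s["T"][0]
        # step 5: otherwise the shot simply joins the current item
    if start is not None:
        items.append((start, shots[-1]["T"][1]))   # close the final item
    return items
```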
As shown in fig. 3, which is a schematic structural diagram of an embodiment 1 of a video news splitting apparatus disclosed in the present invention, the apparatus includes:
the decomposition module 301 is configured to decompose a news video to be processed into at least one shot by clustering video frames in the news video to be processed;
When video news needs to be split, similar video frames in the news video to be processed are clustered and merged into shots. To decompose the video into shots, a color histogram H[i] of the RGB space is calculated for each video frame of the news video to be processed, and the Euclidean distance between the color histograms of temporally adjacent video frames is computed; if the distance is greater than a preset threshold Th1, a shot cut is considered to have occurred, and all video frames between the start position and the end position are recorded as one shot. The distance between the color histogram H[i] of the current video frame and that of the video frame n frames earlier is also calculated; if it is greater than a preset threshold Th2, a gradual shot transition is considered to occur at that position, and all video frames between the start position and the end position are recorded as one shot. If neither a cut nor a gradual transition occurs, the frames are considered to still lie inside the same shot.
A first recording module 302, configured to record a start time point and an end time point of each shot in a news video;
after the news video to be processed is decomposed into at least one shot, the start time point and the end time point of each shot in the news video to be processed are recorded.
An extracting module 303, configured to extract m key frames of a shot at preset time intervals based on the length of the shot calculated by the start time point and the end time point of the shot;
The number m of key frames to be extracted is set according to the shot length calculated from the recorded start time point and end time point of the shot. The rule can be described as follows: when the shot length is less than 2 s, m = 1; when less than 4 s, m = 2; when less than 10 s, m = 3; when greater than 10 s, m = 4 (these parameters can be adjusted). The m frames are extracted from the shot as representative frames: the gap between extracted key frames is calculated as (end position - start position)/(m + 1), and video frames are extracted from the shot at this interval as key frames.
The analysis module 304 is configured to analyze the m key frames of the shot to obtain the host category information of the shot;
and then analyzing each key frame respectively to obtain the host type information of the shot.
The second recording module 305 is configured to perform news headline detection on the news video to be processed, and record a start time point and an end time point of a news headline when the news headline is included in the news video to be processed;
meanwhile, news headline detection and analysis are carried out on the news video to be processed, whether the news video to be processed contains the news headline or not is judged, and when the news video to be processed contains the news headline, the starting time point and the ending time point of the news headline are recorded.
A generating module 306, configured to generate news headline marking information for marking the shots based on a start time point and an end time point of the news headline and a start time point and an end time point of the shots in the news video;
News headline marking information for marking the shots is then generated from the recorded start and end time points of the news headlines and the start and end time points of the shots in the news video to be processed; that is, each shot is marked as containing a news headline or not.
The splitting module 307 is configured to split the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each shot in the news video, anchor category information of the shot, and news title mark information of the shot, where N is greater than or equal to 1.
And finally, splitting the news video to be processed into N pieces of news information according to the acquired start time point and end time point of each shot in the news video, the anchor category information of the shot and the news title mark information of the shot, wherein N is more than or equal to 1.
In summary, in the above embodiment, when a news video needs to be split, the news video to be processed is first decomposed into at least one shot by clustering its video frames, and the start and end time points of each shot in the news video are recorded; m key frames are extracted from each shot at a preset interval based on the shot length calculated from those time points, and the key frames are analyzed to obtain the anchor category information of the shot; meanwhile, news headline detection is performed on the news video, the start and end time points of any news headline are recorded, and news headline marking information for marking the shots is generated from those time points together with the start and end time points of the shots; finally, the news video is split into N pieces of news information (N ≥ 1) according to a preset splitting rule based on the start and end time points of each shot, the anchor category information of the shot, and the news headline marking information of the shot. A news program can thus be split into individual news items automatically.
As shown in fig. 4, which is a schematic structural diagram of embodiment 2 of the video news splitting apparatus disclosed in the present invention: on the basis of embodiment 1, the apparatus of this embodiment further includes, for the case where news headline detection is performed on the news video to be processed and the start and end time points of a news headline have been recorded:
a deduplication module 401, configured to perform deduplication operation on detected news titles of the news video to be processed, and record start time points and end time points of remaining news titles after deduplication;
Observation of news data shows that within a single news item the same news headline is often displayed repeatedly. If the video were split at every appearance of a headline, the news would be over-segmented; therefore the detected news headlines of the news video to be processed are further deduplicated, and the start and end time points of the remaining headlines after deduplication are recorded.
When deduplicating the detected news headlines of the news video to be processed, suppose the start and end frame positions of the n-th headline are t1 and t2 and its position in the video frame is CR_n(x, y, w, h); denote this headline C_n[t1, t2]. The two headlines before it are C_{n-1}[t3, t4] and C_{n-2}[t5, t6], with positions CR_{n-1} and CR_{n-2} in the video frame.
Step 1: compare the current headline C_n with the previous headline C_{n-1} by computing the overlap ratio R1 of regions CR_n and CR_{n-1} in the video frame; if R1 >= Thr, the two headlines need a deduplication comparison, go to step 2. Otherwise compute the overlap ratio R2 of C_n against the region of C_{n-2}; if R2 >= Thr, the two headlines need a deduplication comparison, go to step 2; otherwise C_n is considered not to be a repeated headline.
Step 2: for the two headlines being compared, select one frame representing the content of each. For C_n, select the video frame at time (t1 + t2)/2 and, from CR_n(x, y, w, h), set a comparison region rect:
rect.x=x+w*R1;
rect.y=y+h*R2;
rect.w=w*R3;
rect.h=h*R4;
R1, R2, R3 and R4 here are all preset parameters (distinct from the overlap ratios R1, R2 of step 1).
The image inside rect in this video frame is recorded as IMG1. For C_{n-1} (or C_{n-2}), the video frame at time (t3 + t4)/2 (or (t5 + t6)/2) is selected, and the image inside the same region rect is recorded as IMG2.
Step 3: convert the two input images from the RGB color space to grayscale or to any luminance-chrominance color space (such as YUV, HSV, HSL or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114;
For a luminance-chrominance space, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2。
Step 4: calculate a segmentation threshold. For the grayscale or luminance image of IMG1, the threshold is calculated with the OTSU method, which can be described as follows:
(1) Assume the grayscale image I can be quantized into N gray levels (N <= 256); for these N levels, extract the N-bin grayscale histogram H of the image.
(2) For each bin t (0 <= t < N) of the normalized histogram, the following quantities are calculated:
w0(t) = Σ_{i=0..t} H(i),  w1(t) = Σ_{i=t+1..N-1} H(i)
u0(t) = Σ_{i=0..t} x(i)·H(i) / w0(t),  u1(t) = Σ_{i=t+1..N-1} x(i)·H(i) / w1(t)
x(i) = i · 256 / N
(3) Find the t that maximizes the between-class variance
σ²(t) = w0(t) · w1(t) · (u0(t) − u1(t))²
The segmentation threshold Th is x(t) for that maximizing t.
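A minimal Python sketch of this OTSU computation follows; NumPy and the function name are assumptions for illustration.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Return the threshold x(t) maximizing the between-class variance."""
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    hist = hist / hist.sum()                 # normalized histogram H
    x = np.arange(n_bins) * 256.0 / n_bins   # x(i) = i * 256 / N
    best_t, best_var = 0, -1.0
    for t in range(n_bins - 1):
        w0 = hist[:t + 1].sum()              # class-0 weight w0(t)
        w1 = 1.0 - w0                        # class-1 weight w1(t)
        if w0 == 0.0 or w1 == 0.0:
            continue
        u0 = (x[:t + 1] * hist[:t + 1]).sum() / w0   # class-0 mean u0(t)
        u1 = (x[t + 1:] * hist[t + 1:]).sum() / w1   # class-1 mean u1(t)
        var = w0 * w1 * (u0 - u1) ** 2       # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return x[best_t]
```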
Step 5: binarize the images IMG1 and IMG2. For each pixel (x, y) of IMG1 or IMG2, the corresponding pixel of the binarized image B is: if I(x, y) < Th then B(x, y) = 0; if I(x, y) >= Th then B(x, y) = 255.
Step 6: take the point-by-point difference of the binarized images B1 and B2 of IMG1 and IMG2, and calculate the mean difference:
Diff = (1 / (W · H)) · Σ_{x,y} |B1(x, y) − B2(x, y)|
where W and H are the width and height of the rect region.
Step 7: compare Diff with a preset threshold. If Diff is smaller than the threshold, the two headlines are considered the same, and the shots associated with the time range [t1, t2] of C_n are marked as carrying the same headline; otherwise they are marked as carrying different headlines.
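The region-overlap test of step 1 and the binarized comparison of steps 5–7 can be sketched as follows; the names and the diff threshold are illustrative assumptions.

```python
import numpy as np

def overlap_ratio(a, b):
    """Overlap of rects a, b = (x, y, w, h), relative to the area of a.

    The reference area is an assumption; the text leaves it unspecified."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return ix * iy / float(a[2] * a[3])

def same_headline(img1, img2, th, diff_thresh=8.0):
    """Binarize two grayscale headline crops with threshold th (e.g. from
    otsu_threshold above) and compare them point by point."""
    b1 = np.where(img1 < th, 0.0, 255.0)
    b2 = np.where(img2 < th, 0.0, 255.0)
    return np.abs(b1 - b2).mean() < diff_thresh   # mean difference Diff
```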
A generating module 402, configured to generate news headline marking information for marking the shots based on the start time point and the end time point of the rest news headlines after the duplication removal and the start time point and the end time point of the shots in the news video;
News headline marking information for marking the shots is then generated from the recorded start and end time points of the deduplicated headlines and the start and end time points of the shots in the news video to be processed; that is, each shot is marked as containing a news headline or not.
The splitting module 403 is configured to split the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each shot in the news video, anchor category information of the shot, and news title mark information of the shot, where N is greater than or equal to 1.
And finally, splitting the news video to be processed into N pieces of news information according to the acquired start time point and end time point of each shot in the news video, the anchor category information of the shot and the news title mark information of the shot, wherein N is more than or equal to 1.
In summary, in the above embodiments, on the basis of embodiment 1, the detected news headline of the news video to be processed can be further subjected to a deduplication operation, so that the problem of over-segmentation of news is effectively avoided.
Specifically, in the foregoing embodiment, the analysis module is specifically configured to:
Each key frame of the shot is input into a pre-trained classifier to generate an anchor classification category for that frame; the categories of all key frames of the shot are then counted, and the category with the largest count is determined as the anchor category information of the shot.
That is, the several key frames previously selected for each shot are input into the pre-trained classifier for anchor classification, the per-frame results are treated as votes, and the category receiving the most votes is selected as the category of the shot.
The classifier is trained as follows: a certain number of video frames are extracted from videos of different channels and different news programs and manually labeled into four categories (four categories are given as an example and do not limit the invention): double-anchor, single-anchor sitting, single-anchor standing, and non-anchor. A corresponding classifier is then trained with a deep learning method, where the training module refers to training a network model according to an open-source deep learning training method and model structure.
Training process: the model is retrained with the Caffe open-source deep learning framework (other open-source frameworks may equally be used). The training follows the standard back-propagation algorithm: in the forward pass the input is propagated layer by layer to the output; if the output differs from the expected value, the error is propagated backwards and the weights and thresholds of the model are updated by gradient descent; this is repeated until the error function reaches a minimum. The algorithm is a general, well-known method, so its detailed steps are not repeated here. Through this training process, a network model for classification is obtained.
Classification process: each key frame obtained for each shot after shot segmentation is input into the trained model, and image convolution, pooling and ReLU operations are applied in sequence according to the same model structure and trained parameters, until confidence probabilities P1, P2, P3 and P4 are output for the double-anchor, single-anchor sitting, single-anchor standing and non-anchor categories; the category corresponding to the maximum value is selected as the classification of the image. For example, if P1 is the maximum of (P1, P2, P3, P4), the image belongs to the double-anchor category. For a shot, the number of key frames belonging to each category is counted, and the category with the most key frames is selected as the category of the shot.
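The per-shot voting can be sketched as follows; the category names and their ordering are assumptions matching the four example categories above.

```python
from collections import Counter

# assumed order of the classifier's confidence outputs (P1..P4)
CATEGORIES = ["double_anchor", "single_sitting", "single_standing", "non_anchor"]

def classify_shot(key_frame_probs):
    """key_frame_probs: one (P1, P2, P3, P4) tuple per key frame.

    Each frame votes for its argmax category; the shot takes the majority."""
    votes = [CATEGORIES[max(range(len(CATEGORIES)), key=lambda k: p[k])]
             for p in key_frame_probs]
    return Counter(votes).most_common(1)[0][0]

# e.g. classify_shot([(0.7, 0.1, 0.1, 0.1), (0.6, 0.2, 0.1, 0.1)])
# returns "double_anchor"
```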
Specifically, in the above embodiment, the second recording module is specifically configured to:
determining a preset area of a video frame of a news video to be processed as a candidate area, and tracking images in the candidate area to generate a tracking result; and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
That is, the idea of the headline detection algorithm is to perform news headline detection based on time-domain stability on each video frame of the input news video, obtaining the frame numbers of the start frame and end frame of every news headline in the whole program. The time position of each shot obtained in module A is then compared with the appearance interval of each news headline: if the shot falls within the interval in which a headline is shown, the shot is considered to carry that headline; otherwise it is considered to carry no headline.
The reason for judging in this way, rather than detecting headlines in single images, is to distinguish possible rolling captions: rolling captions appearing in news are generally displayed in a style extremely similar to news headlines, and judging a single image in isolation would mistake them for news headlines, degrading the quality of the splitting result.
The specific algorithm is as follows:
1. selecting potential candidate regions:
(1) The image in the bottom area of the key frame (the position where most news headlines appear) is selected as the image to be detected; the purpose of region selection is to reduce the amount of computation and improve detection precision. The bottom area is selected as follows:
assuming that the width and height of the key frame is W, H, the position of the bottom region Rect (rect.x, rect.y, rect.w, rect.h) (the coordinates of the start of the rectangular region in the key frame and the width and height of the region) in the image of the key frame is:
rect.x=0;
rect.y=H*cut_ratio;
rect.w=W;
rect.h=H*(1-cut_ratio);
where cut _ ratio is a preset coefficient.
(2) Convert the selected image to be detected from the RGB color space to grayscale or to any luminance-chrominance color space (such as YUV, HSV, HSL or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For a luminance-chrominance space, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2
(3) For grayscale or luminance images, various methods exist for extracting edge features, such as the Sobel and Canny operators; this embodiment takes the Sobel operator as an example:
The horizontal and vertical edge gradient operators are convolved with the grayscale/luminance image to obtain a horizontal edge map Eh and a vertical edge map Ev, and finally the edge intensity map Eall is calculated: for any point (x, y) of the edge map, Eall(x, y) = sqrt(Ev(x, y)² + Eh(x, y)²).
The horizontal and vertical edge gradient operators take the Sobel operator as an example (other operators are also applicable); in standard form the two 3 × 3 kernels are:
Sh = [ −1 −2 −1; 0 0 0; 1 2 1 ],  Sv = [ −1 0 1; −2 0 2; −1 0 1 ]
(4) The edge intensity map Eall is compared with a preset threshold The1 and binarized: if Eall(x, y) > The1 then E(x, y) = 1, else E(x, y) = 0.
(5) Operation (3) is performed on each RGB channel of the image to be detected separately, yielding edge intensity maps Er, Eg and Eb for the three channels.
(6) Er, Eg and Eb are compared with a preset threshold The2 and binarized; for one channel, for example: if Er(x, y) > The2 then Er(x, y) = 1, else Er(x, y) = 0. The2 and The1 may be the same or different; if the news headline background is of a gradient type, the headline edges cannot be detected with a higher threshold, and edges detected with a lower threshold need to be reinforced, so generally The2 < The1.
(7) The obtained edge map E is reinforced as E(x, y) = E(x, y) | Er(x, y) | Eg(x, y) | Eb(x, y), giving the final edge map. The reinforcement steps (5)–(7) are optional and may be used or omitted as required; one channel or all three channels can be reinforced, to prevent detection failure caused by a gradient caption background.
(8) The final edge map is projected horizontally: for each row i, the number Numedge of pixels satisfying the following condition is counted, and if Numedge > Thnum the histogram entry H[i] is set to 1, otherwise to 0. The condition is: a pixel's edge value is taken as 1 if the pixel itself or one of its upper/lower neighbors has the value 1, and the count covers pixels lying in horizontal runs of such pixels whose run length exceeds the threshold Thlen (the purpose is to guarantee a continuous straight line).
(9) The histogram H[i] is traversed and the spacing between rows with H[i] = 1 is examined; if the spacing is larger than a threshold Throw, the edge-map area between the two rows is taken as a first-stage candidate region, otherwise processing continues with the next key frame.
(10) For each first-stage candidate region, the edge projection histogram V in the vertical direction is computed: for any column i, V[i] = 1 if the number of edge pixels in that column exceeds Thv, otherwise V[i] = 0, and V[0] and V[W−1] are forcibly set to 1. In V, the widest span with V[i] = V[j] = 1 and V[k] = 0 for all k in (i, j) — that is, maximizing j − i — is taken as the left and right boundaries of the caption region. The original image inside this region is selected as the second-stage candidate region. Column edge pixels are found in the same way as row edge pixels.
(11) The left and right boundaries of the second-stage candidate region are then refined: the original image of the region is scanned with a sliding window of a certain size (for example 32 × 32), the color histogram within each window is calculated, and the number numcolor of non-zero bins in the window's histogram is counted; a monochrome area or a background area with complex colors is identified by numcolor < Thcolor1 || numcolor > Thcolor2, and the center of a window satisfying this condition is used as a new vertical boundary.
(12) The rectangular region candidateRect determined above is checked against constraint conditions, including but not limited to: the start position of candidateRect must lie within a certain image range, the height of candidateRect must lie within a certain range, and so on. If the constraints are met, candidateRect is considered a candidate region of a news headline. If this candidate region is not currently being tracked, processing moves to module B for tracking; otherwise detection continues in module A.
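A compressed sketch of steps (1)–(9) — bottom-strip selection, Sobel edge intensity, binarization and horizontal projection — is given below. OpenCV usage and all parameter values are illustrative assumptions, and the run-length check, per-channel reinforcement and column projection are omitted.

```python
import cv2
import numpy as np

def headline_candidate_rows(frame, cut_ratio=0.75, the1=100.0, thnum=80):
    """Return indices of candidate headline rows inside the bottom strip."""
    h = frame.shape[0]
    bottom = frame[int(h * cut_ratio):, :]          # rect.y = H * cut_ratio
    gray = cv2.cvtColor(bottom, cv2.COLOR_BGR2GRAY)
    eh = cv2.Sobel(gray, cv2.CV_32F, 0, 1)          # horizontal edge map Eh
    ev = cv2.Sobel(gray, cv2.CV_32F, 1, 0)          # vertical edge map Ev
    eall = np.sqrt(eh ** 2 + ev ** 2)               # edge intensity Eall
    edges = (eall > the1).astype(np.uint8)          # binarized edge map E
    proj = edges.sum(axis=1)                        # horizontal projection
    return [i for i, count in enumerate(proj) if count > thnum]
```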
2. Tracking the found candidate regions:
(1) Judge whether the region is tracked for the first time. From the previous round of processing it is known whether zero or more regions are in tracking, have finished tracking, or have failed tracking. If a region is currently in tracking, it is compared with the current candidate region: if the two regions have a high positional overlap, the region is known to be in tracking; otherwise it is treated as tracked for the first time ('first' here meaning either the very first time the region is tracked, or tracking it anew after a previous tracking ended). If it is tracked for the first time, go to step (2); otherwise the method steps of this embodiment are exited.
(2) For the first tracked region, a tracking range in the key frame is set (since the candidate region of the input key frame may contain an additional background region, i.e. a region not containing news headlines, a tracking region needs to be set to improve the tracking accuracy). The setting method comprises the following steps: let the candidate region position of the news headline of the key frame be CandidateRect (x, y, w, h) (the starting point x, y in the key frame and the corresponding width and height w, h), and set the tracking region track (x, y, w, h) as:
track.x=CandidateRect.x+CandidateRect.w*Xratio1;
track.y=CandidateRect.y+CandidateRect.h*Yratio1;
track.w=CandidateRect.w*Xratio2;
track.h=CandidateRect.h*Yratio2;
Xratio1, Xratio2, Yratio1 and Yratio2 are all preset parameters (see the sketch below).
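A one-function sketch of this tracking-area computation, with illustrative placeholder ratio values:

```python
def tracking_area(cand, xratio1=0.1, yratio1=0.1, xratio2=0.8, yratio2=0.8):
    """Shrink a candidate rect (x, y, w, h) to the inner tracking region;
    the ratio values are placeholder assumptions."""
    x, y, w, h = cand
    return (x + w * xratio1, y + h * yratio1, w * xratio2, h * yratio2)
```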
(3) Select the image inside the tracking area of the key frame and convert it from the RGB color space to grayscale or to any luminance-chrominance color space (such as YUV, HSV, HSL or LAB). The grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For a luminance-chrominance space, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2
(4) Calculate a segmentation threshold: for the grayscale or luminance image, the OTSU method is applied exactly as described in step 4 of the deduplication procedure above — the N-bin grayscale histogram is extracted (N <= 256), w0(t), w1(t), u0(t), u1(t) and the between-class variance σ²(t) = w0(t)·w1(t)·(u0(t) − u1(t))² are computed for each bin t with x(i) = i·256/N, and the segmentation threshold Thtrack is x(t) for the t that maximizes σ²(t).
(5) Binarize the image: for pixel (x, y) of image I and the corresponding pixel of the reference binarized image Bref, if I(x, y) < Thtrack then Bref(x, y) = 0; if I(x, y) >= Thtrack then Bref(x, y) = 255.
(6) A color histogram Href of the image in the tracking area is calculated.
(7) The input key frame is converted from the RGB color space to grayscale or to any luminance-chrominance color space (such as YUV, HSV, HSL or LAB); the grayscale conversion formula is:
Gray=R*0.299+G*0.587+B*0.114
For a luminance-chrominance space, taking HSL as an example, the conversion formula for the lightness L is:
L=(max(R,G,B)+min(R,G,B))/2
(8) Select the grayscale image inside the tracking area of the key frame and binarize it: for pixel (x, y) of image I and the corresponding pixel of the binarized image Bcur, if I(x, y) < Thtrack then Bcur(x, y) = 0; if I(x, y) >= Thtrack then Bcur(x, y) = 255. Thtrack is the threshold obtained in step (4) when the region was first tracked.
(9) Take the point-by-point difference of the current frame's binarized image Bcur and the reference binarized image Bref, and calculate the mean difference:
Diffbinary = (1 / (W · H)) · Σ_{x,y} |Bcur(x, y) − Bref(x, y)|
where W and H are the width and height of the tracking-area image.
(10) Calculate the color histogram Hcur of the current image in the tracking area, and its distance Diffcolor from Href.
(11) Compare the obtained Diffbinary and Diffcolor with preset thresholds: if Diffbinary < Thbinary && Diffcolor < Thcolor, the 'tracking' state is returned and the tracking counter tracking_num is incremented; otherwise lost_num is incremented. It should be noted that the binarization-based and color-histogram-based tracking criteria may be used individually or in combination.
(12) If lost_num > Thlost, the 'tracking ended' state is returned together with the frame number of the current key frame (recorded as the time point at which the news headline disappears); otherwise 'tracking' is returned. The purpose of lost_num is to tolerate occasional interference in the video signal that distorts images and causes matching to fail: through lost_num the algorithm is allowed a certain number of key-frame tracking failures.
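Steps (9)–(12) can be condensed into the following sketch; the threshold values and the simple loop structure are assumptions for illustration.

```python
import numpy as np

def track_region(ref_bin, ref_hist, frames_bin, frames_hist,
                 th_binary=10.0, th_color=0.3, th_lost=5):
    """Track a headline region across key frames.

    ref_bin / ref_hist: reference binarized crop and color histogram;
    frames_bin / frames_hist: the same data per subsequent key frame.
    Returns tracking_num, used later to accept or reject the region."""
    tracking_num = lost_num = 0
    for bcur, hcur in zip(frames_bin, frames_hist):
        diff_binary = np.abs(bcur.astype(np.float32)
                             - ref_bin.astype(np.float32)).mean()
        diff_color = np.linalg.norm(hcur - ref_hist)
        if diff_binary < th_binary and diff_color < th_color:
            tracking_num += 1      # region still matches the reference
            lost_num = 0           # assumption: reset after a successful match
        else:
            lost_num += 1          # tolerate transient interference
            if lost_num > th_lost:
                break              # headline considered to have disappeared
    return tracking_num
```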
3. Determining whether the tracking area is a title area:
If tracking of the candidate area has ended, tracking_num is compared with a preset threshold Thtracking_num; if tracking_num >= Thtracking_num, the region is judged to be a news headline area, otherwise a non-headline area.
Specifically, in the foregoing embodiment, the generating module is specifically configured to:
The start and end time points of the news headline are compared with the start and end time points of the shot in the news video: when the start and end time points of the news headline are contained in the time period formed by the start and end time points of the shot, first news headline marking information is generated; when they are not so contained, second news headline marking information is generated.
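A minimal sketch of this containment test, with assumed names and (start, end) tuples:

```python
def headline_mark(shot_span, title_span):
    """Return 'first' marking info if the headline interval lies inside the
    shot interval, and 'second' marking info otherwise."""
    (shot_start, shot_end), (title_start, title_end) = shot_span, title_span
    contained = shot_start <= title_start and title_end <= shot_end
    return "first" if contained else "second"
```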
Specifically, in the above embodiment, the splitting module is specifically configured to:
the news video is split according to an information sequence V = {S_i}, where i = 0, 1, …, M and S_i = {T_i, A_i, C_i, Cs_i}: T_i denotes the start and end time points of the shot in the video, A_i the anchor category information contained in the shot, C_i the news headline marking information of the shot, and Cs_i whether the headline is a newly appearing (non-repeated) one.
That is, step 1: for each shot, if the news start point is empty, the start point of the shot is set as the news start point and processing moves to the next shot; if the news start point is already set, go to step 2.
Step 2: if A_i of S_i belongs to the double-anchor category, the end time in T_{i-1} of shot S_{i-1} is taken as the end point of the current news item; at the same time S_i is taken as an independent news item whose start and end points are those of T_i. Both split results are returned, the news start point is set to empty, and processing moves to the next shot.
Step 3: if A_i of S_i belongs to the single-anchor sitting or standing category, the end time in T_{i-1} of shot S_{i-1} is taken as the end point of the current news item; at the same time S_i is taken as the starting point of a new news item, the split result is returned, and processing moves to the next shot.
Step 4: if A_i of S_i belongs to the non-anchor category, C_i indicates that the shot carries a headline, and Cs_i indicates that the headline is new, then the end time in T_{i-1} of shot S_{i-1} is taken as the end point of the current news item; at the same time S_i is taken as the starting point of a new news item, the split result is returned, and processing moves to the next shot.
Step 5: if none of the above conditions is met, shot S_i is appended to the current news item and processing moves to the next shot.
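A sketch of this five-step splitting rule follows; the dictionary keys and category names are assumptions mirroring S_i = {T_i, A_i, C_i, Cs_i}.

```python
def split_news(shots):
    """shots: list of dicts with keys 't' = (start, end), 'a' = anchor
    category, 'c' = carries a headline, 'cs' = headline is new.

    Returns (start, end) spans of the split news items (steps 1-5)."""
    items, news_start, prev_end = [], None, None
    for s in shots:
        start, end = s["t"]
        if news_start is None:                                # step 1
            news_start = start
        elif s["a"] == "double_anchor":                       # step 2
            items.append((news_start, prev_end))              # close item
            items.append((start, end))                        # anchor shot alone
            news_start = None
        elif s["a"] in ("single_sitting", "single_standing"): # step 3
            items.append((news_start, prev_end))
            news_start = start
        elif s["a"] == "non_anchor" and s["c"] and s["cs"]:   # step 4
            items.append((news_start, prev_end))
            news_start = start
        # step 5: otherwise the shot simply extends the current item
        prev_end = end
    if news_start is not None:
        items.append((news_start, prev_end))                  # flush the tail
    return items
```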
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A video news splitting method is characterized by comprising the following steps:
clustering video frames in a news video to be processed, and decomposing the news video to be processed into at least one shot;
recording a starting time point and an ending time point of each shot in the news video;
extracting m frames of key frames of the shot according to a preset time interval based on the length of the shot calculated by the starting time point and the ending time point of the shot;
analyzing m frames of key frames of the shot to obtain the anchor category information of the shot;
detecting news titles of the news videos to be processed, and recording starting time points and ending time points of the news titles when the news videos to be processed contain the news titles;
generating news title marking information for marking the shots based on the starting time point and the ending time point of the news titles and the starting time point and the ending time point of the shots in the news video;
splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the anchor category information of the shot and the news title mark information of the shot, wherein N is larger than or equal to 1.
2. The method according to claim 1, wherein the performing news headline detection on the to-be-processed news video, and when a news headline is included in the to-be-processed news video, after recording a start time point and an end time point of the news headline, further comprises:
carrying out duplicate removal operation on the detected news headlines of the news video to be processed, and recording the starting time points and the ending time points of the residual news headlines after duplicate removal;
correspondingly, the generating news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video comprises:
and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
3. The method of claim 1 or 2, wherein the analyzing the m frames of key frames of the shot to obtain the anchor category information of the shot comprises:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the host classification categories of all key frames of the lens, and determining the host classification category with the largest number as host classification information of the lens.
4. The method according to claim 1 or 2, wherein the performing news headline detection on the to-be-processed news video, and when a news headline is included in the to-be-processed news video, recording a start time point and an end time point of the news headline comprises:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
5. The method according to claim 1 or 2, wherein the generating news headline marking information for marking the shots based on the start time point and the end time point of the news headline and the start time point and the end time point of the shots in the news video comprises:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
6. The method of claim 1 or 2, wherein the splitting the news video into N pieces of news information according to a preset splitting rule based on a start time point and an end time point of each of the shots in the news video, anchor category information of the shots, and news headline marking information of the shots comprises:
splitting the news video according to an information sequence V = {S_i}, where i = 0, 1, …, M and S_i = {T_i, A_i, C_i, Cs_i}: T_i denotes the start and end time points of the shot in the video, A_i the anchor category information contained in the shot, C_i the news headline marking information of the shot, and Cs_i whether the headline is a newly appearing (non-repeated) one.
7. A video news splitting apparatus, comprising:
the decomposition module is used for decomposing the news video to be processed into at least one shot by clustering video frames in the news video to be processed;
the first recording module is used for recording the starting time point and the ending time point of each shot in the news video;
the extraction module is used for extracting m frames of key frames of the shot according to a preset time interval on the basis of the length of the shot calculated by the starting time point and the ending time point of the shot;
the analysis module is used for analyzing the m frames of key frames of the shot to obtain the anchor category information of the shot;
the second recording module is used for detecting news titles of the news videos to be processed, and recording the starting time point and the ending time point of the news titles when the news videos to be processed contain the news titles;
a generating module, configured to generate news headline marking information for marking the shots based on a start time point and an end time point of the news headline and a start time point and an end time point of the shots in the news video;
the splitting module is used for splitting the news video into N pieces of news information according to a preset splitting rule based on the starting time point and the ending time point of each shot in the news video, the anchor category information of the shot and the news title mark information of the shot, wherein N is more than or equal to 1.
8. The apparatus of claim 7, further comprising:
the duplication removing module is used for carrying out duplication removing operation on the detected news headlines of the news video to be processed and recording the starting time points and the ending time points of the rest news headlines after duplication removing;
accordingly, the generation module is configured to: and generating news title marking information for marking the shot based on the starting time point and the ending time point of the rest news titles after the duplication removal and the starting time point and the ending time point of the shot in the news video.
9. The apparatus according to claim 7 or 8, wherein the analysis module is specifically configured to:
inputting each frame of key frame of the shot into a classifier formed by pre-training respectively, and generating a host classification category corresponding to each frame of key frame;
and counting the host classification categories of all key frames of the lens, and determining the host classification category with the largest number as host classification information of the lens.
10. The apparatus according to claim 7 or 8, wherein the second recording module is specifically configured to:
determining a preset area of a video frame of the news video to be processed as a candidate area;
tracking the images in the candidate areas to generate tracking processing results;
and judging whether the candidate area is a news title area or not based on the tracking processing result, if so, determining the appearance time point of the news title area as the starting time point of the news title, and determining the disappearance time point of the news title area as the ending time point of the news title.
11. The apparatus according to claim 7 or 8, wherein the generating module is specifically configured to:
comparing the starting time point and the ending time point of the news title with the starting time point and the ending time point of the shot in the news video;
generating first news headline marking information when the starting time point and the ending time point of the news headline are contained in a time period formed by the starting time point and the ending time point of the shot in the news video;
and when the starting time point and the ending time point of the news headline are not contained in the time period formed by the starting time point and the ending time point of the shot in the news video, generating second news headline marking information.
12. The apparatus according to claim 7 or 8, wherein the splitting module is specifically configured to:
the news video is split according to an information sequence V = {S_i}, where i = 0, 1, …, M and S_i = {T_i, A_i, C_i, Cs_i}: T_i denotes the start and end time points of the shot in the video, A_i the anchor category information contained in the shot, C_i the news headline marking information of the shot, and Cs_i whether the headline is a newly appearing (non-repeated) one.
CN201711371733.6A 2017-12-19 2017-12-19 Video news splitting method and device Active CN108093314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711371733.6A CN108093314B (en) 2017-12-19 2017-12-19 Video news splitting method and device


Publications (2)

Publication Number Publication Date
CN108093314A CN108093314A (en) 2018-05-29
CN108093314B true CN108093314B (en) 2020-09-01

Family

ID=62177211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711371733.6A Active CN108093314B (en) 2017-12-19 2017-12-19 Video news splitting method and device

Country Status (1)

Country Link
CN (1) CN108093314B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314775B (en) * 2018-12-12 2021-09-07 华为终端有限公司 Video splitting method and electronic equipment
CN110267061B (en) * 2019-04-30 2021-07-27 新华智云科技有限公司 News splitting method and system
CN110610500A (en) * 2019-09-06 2019-12-24 北京信息科技大学 News video self-adaptive strip splitting method based on dynamic semantic features
CN110941594B (en) * 2019-12-16 2023-04-18 北京奇艺世纪科技有限公司 Splitting method and device of video file, electronic equipment and storage medium
CN111277859B (en) * 2020-01-15 2021-12-14 腾讯科技(深圳)有限公司 Method and device for acquiring score, computer equipment and storage medium
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN113807085B (en) * 2021-11-19 2022-03-04 成都索贝数码科技股份有限公司 Method for extracting title and subtitle aiming at news scene

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization and system
CN102547139A (en) * 2010-12-30 2012-07-04 北京新岸线网络技术有限公司 Method for splitting news video program, and method and system for cataloging news videos
CN103546667A (en) * 2013-10-24 2014-01-29 中国科学院自动化研究所 Automatic news splitting method for volume broadcast television supervision
CN104778230A (en) * 2015-03-31 2015-07-15 北京奇艺世纪科技有限公司 Video data segmentation model training method, video data segmenting method, video data segmentation model training device and video data segmenting device
CN104780388A (en) * 2015-03-31 2015-07-15 北京奇艺世纪科技有限公司 Video data partitioning method and device
CN107087211A (en) * 2017-03-30 2017-08-22 北京奇艺世纪科技有限公司 A kind of anchor shots detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100411340B1 (en) * 2001-03-09 2003-12-18 엘지전자 주식회사 Video browsing system based on article of news video content


Also Published As

Publication number Publication date
CN108093314A (en) 2018-05-29

Similar Documents

Publication Publication Date Title
CN108093314B (en) Video news splitting method and device
CN107977645B (en) Method and device for generating video news poster graph
CN104063883B (en) A kind of monitor video abstraction generating method being combined based on object and key frame
Diem et al. cBAD: ICDAR2017 competition on baseline detection
CN107087211B (en) Method and device for detecting lens of host
US10304458B1 (en) Systems and methods for transcribing videos using speaker identification
CN102332096B (en) Video caption text extraction and identification method
RU2637989C2 (en) Method and device for identifying target object in image
Yang et al. Lecture video indexing and analysis using video ocr technology
CN110267061B (en) News splitting method and system
CN102915544B (en) Video image motion target extracting method based on pattern detection and color segmentation
JP5067310B2 (en) Subtitle area extraction apparatus, subtitle area extraction method, and subtitle area extraction program
CN103336954A (en) Identification method and device of station caption in video
CN106845513B (en) Manpower detector and method based on condition random forest
CN102542268A (en) Method for detecting and positioning text area in video
CN106792005B (en) Content detection method based on audio and video combination
CN103714314B (en) Television video station caption identification method combining edge and color information
CN108256508B (en) News main and auxiliary title detection method and device
CN113435443B (en) Method for automatically identifying landmark from video
CN106570885A (en) Background modeling method based on brightness and texture fusion threshold value
CN112102250A (en) Method for establishing and detecting pathological image detection model with training data as missing label
CN113485615B (en) Method and system for manufacturing typical application intelligent image-text course based on computer vision
CN108446603B (en) News title detection method and device
CN114529894A (en) Rapid scene text detection method fusing hole convolution
CN101827224A (en) Detection method of anchor shot in news video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant