CN111709324A - News video strip splitting method based on space-time consistency - Google Patents
News video strip splitting method based on space-time consistency
- Publication number
- CN111709324A CN202010473634.4A
- Authority
- CN
- China
- Prior art keywords
- news video
- news
- video
- cutting
- cut
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Abstract
The invention discloses a news video strip splitting method based on space-time consistency. First, a news video is annotated as the reference news video. Then, space-time consistency correspondence is performed between the news video to be split and the reference news video, and the resulting pre-cut points and pre-cut frames are saved. Next, face detection is used to delete the pre-cut points and frames that belong to the opening and ending parts of the news video, yielding accurate cut points and the corresponding cut frames. Finally, the news video is cut at the accurate cut points and split into individual stories. Because the method splits news videos with a space-time consistency algorithm, it simplifies the current splitting workflow and alleviates the shortage of annotated news video data. Since only a single video needs to be annotated manually, repeated labor is reduced, and both the accuracy and the efficiency of news video splitting are improved.
Description
Technical Field
The invention relates to the technical fields of video processing, video content structuring, and video retrieval, and in particular to a news video strip splitting method based on space-time consistency.
Background
With the vigorous development of multimedia technology, video has become the main form of news media. In today's fast-paced life, quickly acquiring news has become a habit, so for traditional news video, how to let people quickly obtain the key information in the news is an urgent problem for the news industry.
News splitting technology therefore emerged and has taken various forms. Summarizing the means of the prior art, current splitting methods fall into two main categories. Most conventional methods are rule-based: the rule-based news splitting method is a bottom-up approach in which features of the news video are selected, a classifier is trained, and the corresponding splitting result is generated. The semantics-based news splitting method, by contrast, is a top-down approach that analyzes the high-level semantic features of the news video and splits it according to their underlying low-level features. The common disadvantage of both is that they require annotated data, which is also a key problem that current video processing technology urgently needs to solve.
One existing technology is a news splitting algorithm based on audio and video features. It extracts and analyzes the basic visual and audio features of a news video, namely the anchor feature and the silent audio segment feature: the anchor feature is extracted through face recognition, the silence feature is extracted using short-time energy and zero-crossing rate and then filtered by conditions, and splitting is completed by combining the two features. The disadvantage of this method is that it extracts only visual and audio feature data and does not consider text features, which affects splitting accuracy; its procedure is also complex.
The second prior art is an automatic news splitting method for monitoring massive broadcast television content. It automatically obtains the audio waveforms and video images of news programs by initializing broadcast television data; extracts audio and video features of the news data, including anchor detection, caption detection and tracking, and speech detection; obtains visual and speech candidate points for news item boundaries through heuristic rules; locates news item boundaries through audio-video fusion; and, after the processing result is manually checked, enters it into a knowledge base as a knowledge resource supporting supervision requirements. The disadvantage of this method is that its splitting procedure is overly complicated, which greatly reduces the efficiency of video splitting.
The third prior art discloses a news video program segmentation method, a news video cataloging method, and a system. Feature information of the news video, such as the opening sequence, news titles, presenter features, shot changes, audio silence points, switching points, and pitch-period mutation points, is detected, and the detection results are arranged in chronological order to obtain an event sequence. A preset symbol set and production rules are used to reduce the event sequence and thereby estimate the rough position of the start and end points of each news segment; then, the joint posterior probability of news segment start positions near each rough position is calculated from the event sequence, the moment with the maximum posterior probability is selected as the accurate start position, and the news video is segmented into individual news clips. Although this method jointly detects many kinds of feature information, the complex process of extracting so many features greatly reduces the efficiency of video splitting.
Disclosure of Invention
The invention aims to overcome the shortcomings of the existing methods and provides a news video strip splitting method based on space-time consistency. The method addresses two main problems: (1) in video splitting technology, feature extraction is difficult and feature combination is cumbersome, so the splitting result is inaccurate and the splitting process is time-consuming; (2) video splitting technology lacks large amounts of annotated data, and alleviating this shortage is one of the problems that the video processing field urgently needs to solve.
In order to solve the above problems, the present invention provides a news video strip splitting method based on space-time consistency, the method comprising:
annotating a randomly selected news video to obtain a reference news video;
performing space-time consistency correspondence between the news video to be split and the reference news video, i.e., frame-by-frame similarity matching with double-threshold detection, where two frames are considered similar when their similarity exceeds a set threshold A, and the resulting pre-cut points and pre-cut frames are saved when the similarity exceeds a set threshold B (B > A);
deleting, by means of face detection, the pre-cut points and frames that belong to the opening and ending parts of the news video, so as to obtain accurate cut points and the corresponding cut frames;
and cutting the news video at the accurate cut points, performing speech recognition, processing the keywords with a word segmentation tool, and saving the cut points and the corresponding speech transcripts, thereby completing the splitting of the news video.
Preferably, the pre-cut points and frames that belong to the opening and ending parts of the news video are deleted by face detection to obtain accurate cut points and the corresponding cut frames, specifically:
exploiting the property that the opening and ending shots of a news video are highly similar, face detection is used for filtering: when a frame containing exactly two faces appears for the first time, the data saved before that point are cleared; when a frame containing exactly two faces appears for the second time, the data saved after that point are discarded, thereby obtaining the accurate cut points and the corresponding cut frames.
The news video strip splitting method based on space-time consistency described above splits news videos with a space-time consistency algorithm, which simplifies the current news video splitting workflow and alleviates the shortage of annotated news video data. Since only a single video needs to be annotated manually, repeated labor is reduced, and both the accuracy and the efficiency of news video splitting are improved.
Drawings
FIG. 1 is a general flow chart of a news video stripping method based on spatiotemporal consistency according to an embodiment of the present invention;
FIG. 2 is a flow chart of a spatiotemporal consistency algorithm of an embodiment of the present invention;
fig. 3 is a flowchart of obtaining accurate cutting points and cutting maps by using face detection according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a general flowchart of a news video striping method based on spatiotemporal consistency according to an embodiment of the present invention, as shown in fig. 1, the method includes:
S1, annotating a randomly selected news video to obtain a reference news video;
S2, performing space-time consistency correspondence between the news video to be split and the reference news video, i.e., frame-by-frame similarity matching with double-threshold detection, where two frames are considered similar when their similarity exceeds a set threshold A, and the resulting pre-cut points and pre-cut frames are saved when the similarity exceeds a set threshold B (B > A);
S3, deleting, by means of face detection, the pre-cut points and frames that belong to the opening and ending parts of the news video, so as to obtain accurate cut points and the corresponding cut frames;
and S4, cutting the news video at the accurate cut points, performing speech recognition, processing the keywords with a word segmentation tool, and saving the cut points and the corresponding speech transcripts, thereby completing the splitting of the news video.
Step S2, as shown in fig. 2, is as follows:
the reference system news video output by the S1 is recorded as v0, then the news video to be split is selected and recorded as v1, then space-time consistency correspondence is carried out, namely v0 and v1 are synchronously read frame by frame, similarity matching and double-threshold detection are carried out by adopting a difference value hash algorithm, similarity is similar when the similarity is greater than a set threshold value of 0.6, and a pre-cut graph and a cut point are stored when the similarity is greater than a set threshold value of 0.8.
Step S3, as shown in fig. 3, is as follows:
Exploiting the property that the opening and ending shots of a news video are highly similar, the dlib face detection algorithm is used for filtering: when a frame containing exactly two faces appears for the first time, the data saved before that point are cleared; when a frame containing exactly two faces appears for the second time, the data saved after that point are discarded, thereby obtaining the accurate cut points and the corresponding cut frames.
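One reading of the two-face filtering rule in step S3 can be sketched as follows. The patent names dlib's face detector but gives no code, so the detector is injected here as a `count_faces` callback; the function name `filter_pre_cuts` and the interpretation that the first two-face frame closes the opening and the second one begins the ending are assumptions.

```python
def filter_pre_cuts(pre_cuts, count_faces):
    """Keep only pre-cut points between the first and second frames that
    contain exactly two faces, i.e. drop the opening and ending parts.

    pre_cuts    -- list of (cut_point, frame) pairs from step S2
    count_faces -- callback returning the number of faces in a frame
                   (e.g. len(dlib.get_frontal_face_detector()(frame)))
    """
    kept = []
    two_face_seen = 0
    for point, frame in pre_cuts:
        if count_faces(frame) == 2:
            two_face_seen += 1
            if two_face_seen == 1:
                kept.clear()   # clear data saved before this point (opening)
                continue
            if two_face_seen == 2:
                break          # discard data saved after this point (ending)
        kept.append((point, frame))
    return kept
```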
Step S4 is specifically as follows:
FFmpeg is used to cut the news video at the cut points; speech recognition is performed through the iFLYTEK speech recognition API, the keywords are processed with the jieba word segmentation tool, and the cut points and the corresponding speech transcripts are saved, completing the splitting of the news video.
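The cutting in step S4 can be sketched as building one FFmpeg stream-copy command per story between consecutive cut points. The command layout and the output naming are assumptions consistent with common FFmpeg usage rather than text from the patent; the iFLYTEK speech recognition call and jieba keyword processing are omitted.

```python
import subprocess

def ffmpeg_cut_commands(video, cut_points, out_pattern="story_{:02d}.mp4"):
    """Build one ffmpeg command per news story, cutting between
    consecutive cut points given as timestamps in seconds.
    -c copy avoids re-encoding, so each story is extracted quickly."""
    cmds = []
    for i, (start, end) in enumerate(zip(cut_points, cut_points[1:])):
        cmds.append([
            "ffmpeg", "-i", video,
            "-ss", str(start), "-to", str(end),
            "-c", "copy", out_pattern.format(i),
        ])
    return cmds

cmds = ffmpeg_cut_commands("news.mp4", [0.0, 95.5, 203.0])
# for cmd in cmds:
#     subprocess.run(cmd, check=True)   # requires FFmpeg on PATH
```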
The news video strip splitting method based on space-time consistency described above splits news videos with a space-time consistency algorithm, which simplifies the current news video splitting workflow and alleviates the shortage of annotated news video data. Since only a single video needs to be annotated manually, repeated labor is reduced, and both the accuracy and the efficiency of news video splitting are improved.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The above has described in detail a news video strip splitting method based on space-time consistency. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the invention.
Claims (2)
1. A news video strip splitting method based on space-time consistency, characterized by comprising:
annotating a randomly selected news video to obtain a reference news video;
performing space-time consistency correspondence between the news video to be split and the reference news video, i.e., frame-by-frame similarity matching with double-threshold detection, where two frames are considered similar when their similarity exceeds a set threshold A, and the resulting pre-cut points and pre-cut frames are saved when the similarity exceeds a set threshold B (B > A);
deleting, by means of face detection, the pre-cut points and frames that belong to the opening and ending parts of the news video, so as to obtain accurate cut points and the corresponding cut frames;
and cutting the news video at the accurate cut points, performing speech recognition, processing the keywords with a word segmentation tool, and saving the cut points and the corresponding speech transcripts, thereby completing the splitting of the news video.
2. The news video strip splitting method based on space-time consistency according to claim 1, characterized in that the pre-cut points and frames that belong to the opening and ending parts of the news video are deleted by face detection to obtain accurate cut points and the corresponding cut frames, specifically:
exploiting the property that the opening and ending shots of a news video are highly similar, face detection is used for filtering: when a frame containing exactly two faces appears for the first time, the data saved before that point are cleared; when a frame containing exactly two faces appears for the second time, the data saved after that point are discarded, thereby obtaining the accurate cut points and the corresponding cut frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473634.4A CN111709324A (en) | 2020-05-29 | 2020-05-29 | News video strip splitting method based on space-time consistency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473634.4A CN111709324A (en) | 2020-05-29 | 2020-05-29 | News video strip splitting method based on space-time consistency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111709324A true CN111709324A (en) | 2020-09-25 |
Family
ID=72538247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010473634.4A Pending CN111709324A (en) | 2020-05-29 | 2020-05-29 | News video strip splitting method based on space-time consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111709324A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN113807085A (en) * | 2021-11-19 | 2021-12-17 | 成都索贝数码科技股份有限公司 | Method for extracting title and subtitle aiming at news scene |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News video categorization and system |
CN106162223A (en) * | 2016-05-27 | 2016-11-23 | 北京奇虎科技有限公司 | A kind of news video cutting method and device |
CN110267061A (en) * | 2019-04-30 | 2019-09-20 | 新华智云科技有限公司 | A kind of news demolition method and system |
CN110881115A (en) * | 2019-12-24 | 2020-03-13 | 新华智云科技有限公司 | Strip splitting method and system for conference video |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN112565820B (en) * | 2020-12-24 | 2023-03-28 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN113807085A (en) * | 2021-11-19 | 2021-12-17 | 成都索贝数码科技股份有限公司 | Method for extracting title and subtitle aiming at news scene |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11776267B2 (en) | Intelligent cataloging method for all-media news based on multi-modal information fusion understanding | |
CN111931775B (en) | Method, system, computer device and storage medium for automatically acquiring news headlines | |
US10304458B1 (en) | Systems and methods for transcribing videos using speaker identification | |
US8503523B2 (en) | Forming a representation of a video item and use thereof | |
US8316301B2 (en) | Apparatus, medium, and method segmenting video sequences based on topic | |
KR100707189B1 (en) | Apparatus and method for detecting advertisment of moving-picture, and compter-readable storage storing compter program controlling the apparatus | |
CN110717470B (en) | Scene recognition method and device, computer equipment and storage medium | |
CN106649713B (en) | Movie visualization processing method and system based on content | |
CN111061915B (en) | Video character relation identification method | |
CN113613065A (en) | Video editing method and device, electronic equipment and storage medium | |
CN111083141A (en) | Method, device, server and storage medium for identifying counterfeit account | |
CN113361462B (en) | Method and device for video processing and caption detection model | |
CN112633241B (en) | News story segmentation method based on multi-feature fusion and random forest model | |
Dumont et al. | Automatic story segmentation for tv news video using multiple modalities | |
CN113435438B (en) | Image and subtitle fused video screen plate extraction and video segmentation method | |
CN111709324A (en) | News video strip splitting method based on space-time consistency | |
CN113923479A (en) | Audio and video editing method and device | |
CN114051154A (en) | News video strip splitting method and system | |
CN111414908B (en) | Method and device for recognizing caption characters in video | |
CN116017088A (en) | Video subtitle processing method, device, electronic equipment and storage medium | |
Eickeler et al. | A new approach to content-based video indexing using hidden markov models | |
Haloi et al. | Unsupervised story segmentation and indexing of broadcast news video | |
CN114387589A (en) | Voice supervision data acquisition method and device, electronic equipment and storage medium | |
CN113807085B (en) | Method for extracting title and subtitle aiming at news scene | |
JP4305921B2 (en) | Video topic splitting method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200925 |