CN111709324A - News video strip splitting method based on space-time consistency - Google Patents
News video strip splitting method based on space-time consistency
- Publication number
- CN111709324A CN202010473634.4A
- Authority
- CN
- China
- Prior art keywords
- news video
- news
- video
- cutting
- cut
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Abstract
The invention discloses a news video strip splitting method based on space-time consistency. First, a news video is annotated as the reference news video. Then, space-time consistency correspondence is performed between the news video to be split and the reference news video, and the resulting pre-cut points and pre-cut frames are saved. Next, face detection is used to delete the pre-cut points and frames that belong to the opening and ending parts of the news video, yielding accurate cut points and the corresponding cut frames. Finally, the news video is cut at the accurate cut points and split into individual stories. Because the method splits news videos with a space-time consistency algorithm, it simplifies the current splitting workflow and alleviates the shortage of annotated news video data. Since only a single video needs to be annotated manually, repeated labor is reduced, and both the accuracy and the efficiency of news video splitting are improved.
Description
Technical Field
The invention relates to the technical fields of video processing, video content structuring, and video retrieval, and in particular to a news video strip splitting method based on space-time consistency.
Background
With the vigorous development of multimedia technology, video has become the main form of news media. In today's fast-paced life, quickly acquiring news has become a habit, so for traditional news video, how to let people quickly obtain the key information in the news is an urgent problem for the news industry.
News splitting technology therefore emerged and has taken various forms. Summarizing the means of the prior art, current splitting methods fall into two main categories. Most conventional methods are rule-based: the rule-based news splitting method is a bottom-up approach in which features of the news video are selected, a classifier is trained, and the corresponding splitting result is generated. The semantics-based news splitting method, by contrast, is a top-down approach that analyzes the high-level semantic features of the news video and splits it according to their underlying low-level features. The common disadvantage of both is that they require annotated data, which is also a key problem that current video processing technology urgently needs to solve.
One existing technology is a news splitting algorithm based on audio and video features. It extracts and analyzes the basic visual and audio features of a news video, namely the anchor feature and the silent audio segment feature: the anchor feature is extracted through face recognition, the silence feature is extracted using short-time energy and zero-crossing rate and then filtered by conditions, and splitting is completed by combining the two features. The disadvantage of this method is that it extracts only visual and audio feature data and does not consider text features, which affects splitting accuracy; its procedure is also complex.
The second prior art is an automatic news splitting method for monitoring massive broadcast television content. It automatically obtains the audio waveforms and video images of news programs by initializing broadcast television data; extracts audio and video features of the news data, including anchor detection, caption detection and tracking, and speech detection; obtains visual and speech candidate points for news item boundaries through heuristic rules; locates news item boundaries through audio-video fusion; and, after the processing result is manually checked, enters it into a knowledge base as a knowledge resource supporting supervision requirements. The disadvantage of this method is that its splitting procedure is overly complicated, which greatly reduces the efficiency of video splitting.
The third prior art discloses a news video program segmentation method, a news video cataloging method, and a system. Feature information of the news video, such as the opening sequence, news titles, presenter features, shot changes, audio silence points, switching points, and pitch-period mutation points, is detected, and the detection results are arranged in chronological order to obtain an event sequence. A preset symbol set and production rules are used to reduce the event sequence and thereby estimate the rough position of the start and end points of each news segment; then, the joint posterior probability of news segment start positions near each rough position is calculated from the event sequence, the moment with the maximum posterior probability is selected as the accurate start position, and the news video is segmented into individual news clips. Although this method jointly detects many kinds of feature information, the complex process of extracting so many features greatly reduces the efficiency of video splitting.
Disclosure of Invention
The invention aims to overcome the shortcomings of the existing methods and provides a news video strip splitting method based on space-time consistency. The method addresses two main problems: (1) in video splitting technology, feature extraction is difficult and feature combination is cumbersome, so the splitting result is inaccurate and the splitting process is time-consuming; (2) video splitting technology lacks large amounts of annotated data, and alleviating this shortage is one of the problems that the video processing field urgently needs to solve.
In order to solve the above problems, the present invention provides a news video strip splitting method based on space-time consistency, the method comprising:
annotating a randomly selected news video to obtain a reference news video;
performing space-time consistency correspondence between the news video to be split and the reference news video, i.e., frame-by-frame similarity matching with double-threshold detection, where two frames are considered similar when their similarity exceeds a set threshold A, and the resulting pre-cut points and pre-cut frames are saved when the similarity exceeds a set threshold B (B > A);
deleting, by means of face detection, the pre-cut points and frames that belong to the opening and ending parts of the news video, so as to obtain accurate cut points and the corresponding cut frames;
and cutting the news video at the accurate cut points, performing speech recognition, processing the keywords with a word segmentation tool, and saving the cut points and the corresponding speech transcripts, thereby completing the splitting of the news video.
Preferably, the pre-cut points and frames that belong to the opening and ending parts of the news video are deleted by face detection to obtain accurate cut points and the corresponding cut frames, specifically:
exploiting the property that the opening and ending shots of a news video are highly similar, face detection is used for filtering: when a frame containing exactly two faces appears for the first time, the data saved before that point are cleared; when a frame containing exactly two faces appears for the second time, the data saved after that point are discarded, thereby obtaining the accurate cut points and the corresponding cut frames.
The news video strip splitting method based on space-time consistency described above splits news videos with a space-time consistency algorithm, which simplifies the current news video splitting workflow and alleviates the shortage of annotated news video data. Since only a single video needs to be annotated manually, repeated labor is reduced, and both the accuracy and the efficiency of news video splitting are improved.
Drawings
FIG. 1 is a general flow chart of a news video stripping method based on spatiotemporal consistency according to an embodiment of the present invention;
FIG. 2 is a flow chart of a spatiotemporal consistency algorithm of an embodiment of the present invention;
fig. 3 is a flowchart of obtaining accurate cutting points and cutting maps by using face detection according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a general flowchart of a news video striping method based on spatiotemporal consistency according to an embodiment of the present invention, as shown in fig. 1, the method includes:
S1, annotating a randomly selected news video to obtain a reference news video;
S2, performing space-time consistency correspondence between the news video to be split and the reference news video, i.e., frame-by-frame similarity matching with double-threshold detection, where two frames are considered similar when their similarity exceeds a set threshold A, and the resulting pre-cut points and pre-cut frames are saved when the similarity exceeds a set threshold B (B > A);
S3, deleting, by means of face detection, the pre-cut points and frames that belong to the opening and ending parts of the news video, so as to obtain accurate cut points and the corresponding cut frames;
and S4, cutting the news video at the accurate cut points, performing speech recognition, processing the keywords with a word segmentation tool, and saving the cut points and the corresponding speech transcripts, thereby completing the splitting of the news video.
Step S2, as shown in fig. 2, is as follows:
the reference system news video output by the S1 is recorded as v0, then the news video to be split is selected and recorded as v1, then space-time consistency correspondence is carried out, namely v0 and v1 are synchronously read frame by frame, similarity matching and double-threshold detection are carried out by adopting a difference value hash algorithm, similarity is similar when the similarity is greater than a set threshold value of 0.6, and a pre-cut graph and a cut point are stored when the similarity is greater than a set threshold value of 0.8.
Step S3, as shown in fig. 3, is as follows:
Exploiting the property that the opening and ending shots of a news video are highly similar, the dlib face detection algorithm is used for filtering: when a frame containing exactly two faces appears for the first time, the data saved before that point are cleared; when a frame containing exactly two faces appears for the second time, the data saved after that point are discarded, thereby obtaining the accurate cut points and the corresponding cut frames.
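One reading of the two-face filtering rule in step S3 can be sketched as follows. The patent names dlib's face detector but gives no code, so the detector is injected here as a `count_faces` callback; the function name `filter_pre_cuts` and the interpretation that the first two-face frame closes the opening and the second one begins the ending are assumptions.

```python
def filter_pre_cuts(pre_cuts, count_faces):
    """Keep only pre-cut points between the first and second frames that
    contain exactly two faces, i.e. drop the opening and ending parts.

    pre_cuts    -- list of (cut_point, frame) pairs from step S2
    count_faces -- callback returning the number of faces in a frame
                   (e.g. len(dlib.get_frontal_face_detector()(frame)))
    """
    kept = []
    two_face_seen = 0
    for point, frame in pre_cuts:
        if count_faces(frame) == 2:
            two_face_seen += 1
            if two_face_seen == 1:
                kept.clear()   # clear data saved before this point (opening)
                continue
            if two_face_seen == 2:
                break          # discard data saved after this point (ending)
        kept.append((point, frame))
    return kept
```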
Step S4 is specifically as follows:
FFmpeg is used to cut the news video at the cut points; speech recognition is performed through the iFLYTEK speech recognition API, the keywords are processed with the jieba word segmentation tool, and the cut points and the corresponding speech transcripts are saved, completing the splitting of the news video.
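The cutting in step S4 can be sketched as building one FFmpeg stream-copy command per story between consecutive cut points. The command layout and the output naming are assumptions consistent with common FFmpeg usage rather than text from the patent; the iFLYTEK speech recognition call and jieba keyword processing are omitted.

```python
import subprocess

def ffmpeg_cut_commands(video, cut_points, out_pattern="story_{:02d}.mp4"):
    """Build one ffmpeg command per news story, cutting between
    consecutive cut points given as timestamps in seconds.
    -c copy avoids re-encoding, so each story is extracted quickly."""
    cmds = []
    for i, (start, end) in enumerate(zip(cut_points, cut_points[1:])):
        cmds.append([
            "ffmpeg", "-i", video,
            "-ss", str(start), "-to", str(end),
            "-c", "copy", out_pattern.format(i),
        ])
    return cmds

cmds = ffmpeg_cut_commands("news.mp4", [0.0, 95.5, 203.0])
# for cmd in cmds:
#     subprocess.run(cmd, check=True)   # requires FFmpeg on PATH
```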
The news video strip splitting method based on space-time consistency described above splits news videos with a space-time consistency algorithm, which simplifies the current news video splitting workflow and alleviates the shortage of annotated news video data. Since only a single video needs to be annotated manually, repeated labor is reduced, and both the accuracy and the efficiency of news video splitting are improved.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The above has described in detail a news video strip splitting method based on space-time consistency. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the invention.
Claims (2)
1. A news video strip splitting method based on space-time consistency, characterized by comprising:
annotating a randomly selected news video to obtain a reference news video;
performing space-time consistency correspondence between the news video to be split and the reference news video, i.e., frame-by-frame similarity matching with double-threshold detection, where two frames are considered similar when their similarity exceeds a set threshold A, and the resulting pre-cut points and pre-cut frames are saved when the similarity exceeds a set threshold B (B > A);
deleting, by means of face detection, the pre-cut points and frames that belong to the opening and ending parts of the news video, so as to obtain accurate cut points and the corresponding cut frames;
and cutting the news video at the accurate cut points, performing speech recognition, processing the keywords with a word segmentation tool, and saving the cut points and the corresponding speech transcripts, thereby completing the splitting of the news video.
2. The news video strip splitting method based on space-time consistency according to claim 1, characterized in that the pre-cut points and frames that belong to the opening and ending parts of the news video are deleted by face detection to obtain accurate cut points and the corresponding cut frames, specifically:
exploiting the property that the opening and ending shots of a news video are highly similar, face detection is used for filtering: when a frame containing exactly two faces appears for the first time, the data saved before that point are cleared; when a frame containing exactly two faces appears for the second time, the data saved after that point are discarded, thereby obtaining the accurate cut points and the corresponding cut frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473634.4A CN111709324A (en) | 2020-05-29 | 2020-05-29 | News video strip splitting method based on space-time consistency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473634.4A CN111709324A (en) | 2020-05-29 | 2020-05-29 | News video strip splitting method based on space-time consistency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111709324A true CN111709324A (en) | 2020-09-25 |
Family
ID=72538247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010473634.4A Pending CN111709324A (en) | 2020-05-29 | 2020-05-29 | News video strip splitting method based on space-time consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111709324A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN113807085A (en) * | 2021-11-19 | 2021-12-17 | 成都索贝数码科技股份有限公司 | Method for extracting title and subtitle aiming at news scene |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News video categorization and system |
CN106162223A (en) * | 2016-05-27 | 2016-11-23 | 北京奇虎科技有限公司 | A kind of news video cutting method and device |
CN110267061A (en) * | 2019-04-30 | 2019-09-20 | 新华智云科技有限公司 | A kind of news demolition method and system |
CN110881115A (en) * | 2019-12-24 | 2020-03-13 | 新华智云科技有限公司 | Strip splitting method and system for conference video |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN112565820B (en) * | 2020-12-24 | 2023-03-28 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN113807085A (en) * | 2021-11-19 | 2021-12-17 | 成都索贝数码科技股份有限公司 | Method for extracting title and subtitle aiming at news scene |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11776267B2 (en) | Intelligent cataloging method for all-media news based on multi-modal information fusion understanding | |
CN111931775B (en) | Method, system, computer device and storage medium for automatically acquiring news headlines | |
US10304458B1 (en) | Systems and methods for transcribing videos using speaker identification | |
US8503523B2 (en) | Forming a representation of a video item and use thereof | |
US8316301B2 (en) | Apparatus, medium, and method segmenting video sequences based on topic | |
KR100707189B1 (en) | Apparatus and method for detecting advertisment of moving-picture, and compter-readable storage storing compter program controlling the apparatus | |
CN110717470B (en) | Scene recognition method and device, computer equipment and storage medium | |
CN106649713B (en) | Movie visualization processing method and system based on content | |
CN111061915B (en) | Video character relation identification method | |
CN113613065A (en) | Video editing method and device, electronic equipment and storage medium | |
CN111083141A (en) | Method, device, server and storage medium for identifying counterfeit account | |
CN113361462B (en) | Method and device for video processing and caption detection model | |
CN112633241B (en) | News story segmentation method based on multi-feature fusion and random forest model | |
Dumont et al. | Automatic story segmentation for tv news video using multiple modalities | |
CN113435438B (en) | Image and subtitle fused video screen plate extraction and video segmentation method | |
CN111709324A (en) | News video strip splitting method based on space-time consistency | |
CN113923479A (en) | Audio and video editing method and device | |
CN114051154A (en) | News video strip splitting method and system | |
CN111414908B (en) | Method and device for recognizing caption characters in video | |
CN116017088A (en) | Video subtitle processing method, device, electronic equipment and storage medium | |
Eickeler et al. | A new approach to content-based video indexing using hidden markov models | |
Haloi et al. | Unsupervised story segmentation and indexing of broadcast news video | |
CN114387589A (en) | Voice supervision data acquisition method and device, electronic equipment and storage medium | |
CN113807085B (en) | Method for extracting title and subtitle aiming at news scene | |
JP4305921B2 (en) | Video topic splitting method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200925 |