CN109005451B - Video strip splitting method based on deep learning - Google Patents
Info
- Publication number
- CN109005451B (application CN201810701351.3A)
- Authority
- CN
- China
- Prior art keywords
- deep learning
- video
- segments
- face
- strip splitting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a video strip splitting method based on deep learning, which comprises the following steps. Step 1: initializing video data. Step 2: performing face detection with a face recognition technology to obtain time segments containing continuous similar faces as candidate strip segments. Step 3: extracting sound features from the candidate strip segments. Step 4: refining the split time points of the candidate strip segments by using a voice recognition technology together with the sound features, to obtain the final split time points. The invention uses a deep learning algorithm to recognize two kinds of features, faces and sounds, which improves strip-splitting accuracy; it can recognize faces and sounds in multiple video segments simultaneously, so it is extremely fast. In addition, because the deep learning algorithm splits the video into strips automatically, the required labor input is reduced.
Description
Technical Field
The invention relates to the technical field of media asset management, and in particular to a video strip splitting method based on deep learning.
Background
With the digitalization, networking and informatization of the whole television production process, a large amount of multimedia data has accumulated. Strip-splitting technology arose because these massive multimedia resources could not otherwise be deeply exploited, and because China's supervision requirements for television programs keep rising. The continued growth of the internet has caused the amount of video material to grow explosively; live streams, short videos, online television programs, mobile multimedia and the like are no longer broadcast as complete programs but must be split or condensed into short videos. Users' demand for fragmented internet content keeps increasing, and split segments are also widely used in new media.
The traditional strip-splitting method is manual: an editor previews the video frame by frame and splits it by hand, which requires a large amount of labor and is far too inefficient. A more recent approach is strip splitting based on a cloud framework; it is more efficient than the traditional method and has clear advantages in timeliness of content output and software cost, but it still requires substantial labor, and people are not freed from a large amount of low-value repetitive work.
Disclosure of Invention
In view of this, the invention provides a video strip splitting method based on deep learning that reduces the labor required for strip-splitting work, solving the prior-art problem of heavy labor input.
The invention provides a video strip splitting method based on deep learning, which comprises the following steps:
Step 1: initializing video data;
Step 2: performing face detection with a face recognition technology to obtain time segments containing continuous similar faces as candidate strip segments;
Step 3: extracting sound features from the candidate strip segments;
Step 4: refining the split time points of the candidate strip segments by using a voice recognition technology together with the sound features, to obtain the final split time points.
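The four steps above can be sketched as a pipeline skeleton. This is an illustrative outline only, not the patented implementation: every function body here is a placeholder stub, and the function names (`initialize`, `detect_candidate_segments`, and so on) are our own.

```python
def initialize(video_path):
    """Step 1 (stub): obtain the audio waveform data and image frames.
    A real system would decode the file, e.g. with ffmpeg; here we
    return empty placeholders."""
    return {"audio": [], "frames": []}

def detect_candidate_segments(frames):
    """Step 2 (stub): face detection plus similarity grouping,
    yielding (start, end) candidate strip segments in seconds."""
    return [(0.0, 10.0)]

def extract_sound_features(audio, segment):
    """Step 3 (stub): sound features for one candidate segment."""
    return []

def refine_split_points(audio, segment, features):
    """Step 4 (stub): adjust the segment boundaries with voice
    recognition; here it returns the segment unchanged."""
    return segment

def split_video(video_path):
    """End-to-end strip-splitting pipeline following steps 1-4."""
    data = initialize(video_path)
    final_segments = []
    for seg in detect_candidate_segments(data["frames"]):
        feats = extract_sound_features(data["audio"], seg)
        final_segments.append(refine_split_points(data["audio"], seg, feats))
    return final_segments
```

The stubs make the data flow explicit: step 2 produces candidates from image data, while steps 3 and 4 only consume audio within each candidate.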
Optionally, the video data initialization in step 1 includes obtaining the audio waveform data and the image data contained in the video data.
Optionally, the face recognition technology in step 2 includes: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data.
Optionally, the voice recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of a candidate strip segment, for sounds whose features are similar to the extracted sound features.
Optionally, the process of encoding faces with the deep learning algorithm includes:
training a deep neural network model so that it can extract features from an input face;
inputting the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;
performing encoding, i.e. mapping the high-dimensional face features to low-dimensional vectors; and distinguishing whether the faces in the video data are similar or different according to the low-dimensional vectors.
Compared with the prior art, the invention has the following advantages: it uses a deep learning algorithm to recognize two kinds of features, faces and sounds, which improves strip-splitting accuracy; it can recognize faces and sounds in multiple video segments simultaneously, so it is extremely fast; and because the deep learning algorithm splits the video into strips automatically, the required labor input is reduced.
Drawings
FIG. 1 is a flowchart of the video strip splitting method based on deep learning according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the invention is not limited to these embodiments. The invention is intended to cover any alternatives, modifications and equivalents that fall within its spirit and scope.
In the following description of the preferred embodiments, specific details are set forth to provide a thorough understanding of the invention; it will be apparent to those skilled in the art that the invention may also be practiced without these details.
The invention is described in more detail in the following paragraphs by way of example with reference to the drawings. Note that the drawings are simplified and not to scale; they serve only to illustrate the embodiments conveniently and clearly.
The invention provides a video strip splitting method based on deep learning, which comprises the following steps as shown in figure 1:
Step 1: initializing video data;
Step 2: performing face detection with a face recognition technology to obtain time segments containing continuous similar faces as candidate strip segments;
Step 3: extracting sound features from the candidate strip segments;
Step 4: refining the split time points of the candidate strip segments by using a voice recognition technology together with the sound features, to obtain the final split time points.
The video data initialization in step 1 includes obtaining audio waveform data and image data in the video data.
The face recognition technology in step 2 comprises: encoding faces with a deep learning algorithm, comparing the face similarity of each image frame in the video data, and treating each continuous time segment containing similar faces as a candidate strip segment; in this way multiple candidate strip segments can be obtained.
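The grouping of frames into candidate strip segments can be illustrated with a small sketch. The cosine-similarity measure, the 0.8 threshold, and the frame rate below are our assumptions, not values from the patent; the embeddings are plain Python lists standing in for the deep network's face codes.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face-embedding vectors
    (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def candidate_segments(embeddings, fps=25.0, threshold=0.8):
    """Group consecutive frames whose face embeddings stay similar into
    (start_time, end_time) candidate strip segments, in seconds.

    embeddings: one face-embedding vector per frame.
    threshold:  assumed similarity cut-off for "same face".
    """
    segments = []
    start = 0
    for i in range(1, len(embeddings)):
        # A drop in similarity marks a boundary between segments.
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            segments.append((start / fps, i / fps))
            start = i
    if embeddings:
        segments.append((start / fps, len(embeddings) / fps))
    return segments
```

With per-frame embeddings for a news programme, each run of frames showing the same anchor would come out as one candidate segment.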
The voice recognition technology in step 4 comprises: using a deep learning algorithm to search, within a certain range before and after the split time point of a candidate strip segment, for sounds whose features are similar to the extracted sound features.
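One hedged way to implement this boundary search is shown below. The window size, the similarity threshold, and the one-feature-vector-per-second granularity are illustrative assumptions; the patent only specifies searching "within a certain range" around the candidate split point.

```python
import math

def refine_split_point(audio_features, candidate_t, window=5, threshold=0.9):
    """Search +/- `window` seconds around a candidate split time for the
    last second whose audio features still match the segment's sound,
    and return that second as the refined split point.

    audio_features: one (non-zero) feature vector per second of audio.
    """
    def sim(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    ref = audio_features[candidate_t]  # sound at the candidate point
    lo = max(0, candidate_t - window)
    hi = min(len(audio_features) - 1, candidate_t + window)
    refined = candidate_t
    for t in range(lo, hi + 1):
        if sim(audio_features[t], ref) >= threshold:
            refined = t  # last matching second inside the window
    return refined
```

The idea is that the face-based boundary may fall slightly before or after the true end of the speech, and the audio search nudges it to where the sound actually changes.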
The process of encoding faces with the deep learning algorithm comprises the following steps:
training a deep neural network model so that it can extract features from an input face;
inputting the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;
performing encoding, i.e. mapping the high-dimensional face features to low-dimensional vectors;
and distinguishing whether the faces in the video data are similar or different according to the low-dimensional vectors.
By mapping the image information of multiple faces to low-dimensional vectors, the model can determine whether two faces are similar or the same.
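The mapping from high-dimensional features to low-dimensional codes can be illustrated with a fixed linear projection. A real system learns this mapping inside the deep neural network; the random projection matrix, the dimensions (128 down to 8), and the distance threshold below are demonstration assumptions only.

```python
import random

def make_projection(high_dim, low_dim, seed=0):
    """A fixed random linear map standing in for the trained encoding
    layer that compresses high-dimensional face features into
    low-dimensional vectors (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1.0 / high_dim ** 0.5) for _ in range(high_dim)]
            for _ in range(low_dim)]

def encode(features, projection):
    """Map a high-dimensional feature vector to a low-dimensional code."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]

def same_person(code_a, code_b, threshold=0.6):
    """Decide similarity from the Euclidean distance between codes;
    the threshold is an assumed tuning parameter."""
    dist = sum((a - b) ** 2 for a, b in zip(code_a, code_b)) ** 0.5
    return dist < threshold
```

Two frames of the same face yield nearly identical high-dimensional features, so their low-dimensional codes land close together and fall under the threshold.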
In practice, a video may be analysed with a distributed algorithm: the video is divided into multiple segments at a fixed granularity of a specified number of seconds (e.g. 10 seconds), and the segments are dispatched to available servers so that face and sound detection run simultaneously. This makes the process extremely fast and enables short-video production within seconds.
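The chunking described here can be sketched as follows. The 10-second granularity comes from the text, while `ThreadPoolExecutor` merely stands in for dispatching chunks to distributed servers, and `analyse_chunk` is a placeholder for the per-chunk face and sound detection.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_video(duration_s, granularity_s=10):
    """Split a video timeline into (start, end) chunks of a fixed
    number of seconds; the last chunk may be shorter."""
    chunks = []
    start = 0
    while start < duration_s:
        end = min(start + granularity_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks

def analyse_chunk(chunk):
    """Placeholder for per-chunk face and sound detection."""
    return {"chunk": chunk, "faces": [], "sounds": []}

def analyse_video(duration_s, workers=4):
    """Fan the chunks out so they are analysed concurrently; results
    come back in chunk order."""
    chunks = chunk_video(duration_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyse_chunk, chunks))
```

In a genuinely distributed deployment the pool would be replaced by a job queue across servers, but the fan-out/fan-in shape stays the same.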
The embodiments described above do not limit the scope of the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of these embodiments falls within the protection scope of the claimed technical solution.
Claims (4)
1. A video strip splitting method based on deep learning is characterized by comprising the following steps:
Step 1: initializing video data;
Step 2: performing face detection with a face recognition technology to obtain time segments containing continuous similar faces as candidate strip segments;
Step 3: extracting sound features from the candidate strip segments;
Step 4: refining the split time points of the candidate strip segments by using a voice recognition technology together with the sound features, to obtain the final split time points;
wherein the voice recognition technology in step 4 comprises: using a deep learning algorithm to search, within a certain range before and after the split time point of a candidate strip segment, for sounds whose features are similar to the extracted sound features.
2. The video strip splitting method based on deep learning of claim 1, characterized in that the video data initialization in step 1 includes obtaining the audio waveform data and the image data contained in the video data.
3. The video strip splitting method based on deep learning of claim 1, wherein the face recognition technology in step 2 comprises: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data.
4. The video strip splitting method based on deep learning of claim 3, wherein the process of encoding faces with the deep learning algorithm comprises:
training a deep neural network model so that it can extract features from an input face;
inputting the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;
performing encoding, i.e. mapping the high-dimensional face features to low-dimensional vectors;
and distinguishing whether the faces in the video data are similar or different according to the low-dimensional vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810701351.3A CN109005451B (en) | 2018-06-29 | 2018-06-29 | Video strip splitting method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109005451A CN109005451A (en) | 2018-12-14 |
CN109005451B true CN109005451B (en) | 2021-07-30 |
Family
ID=64601854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810701351.3A Active CN109005451B (en) | 2018-06-29 | 2018-06-29 | Video strip splitting method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109005451B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110267061B * | 2019-04-30 | 2021-07-27 | Xinhua Zhiyun Technology Co., Ltd. | News splitting method and system |
CN111222499B * | 2020-04-22 | 2020-08-14 | Chengdu Sobey Digital Technology Co., Ltd. | Method for training an automatic news strip-splitting conditional random field algorithm by feeding prediction results back into training |
CN111586494B * | 2020-04-30 | 2022-03-11 | Tencent Technology (Shenzhen) Co., Ltd. | Intelligent strip splitting method based on audio and video separation |
CN113810782B * | 2020-06-12 | 2022-09-27 | Alibaba Group Holding Ltd. | Video processing method and device, server and electronic device |
CN112565885B * | 2020-11-30 | 2023-01-06 | Tsinghua Pearl River Delta Research Institute | Video segmentation method, system, device and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | Institute of Automation, Chinese Academy of Sciences | News video classification method and system |
WO2013097101A1 (en) * | 2011-12-28 | 2013-07-04 | Huawei Technologies Co., Ltd. | Method and device for analysing video file |
CN103546667A (en) * | 2013-10-24 | 2014-01-29 | Institute of Automation, Chinese Academy of Sciences | Automatic news splitting method for mass broadcast television supervision |
CN105931633A (en) * | 2016-05-30 | 2016-09-07 | Shenzhen Dingsheng Intelligent Technology Co., Ltd. | Speech recognition method and system |
CN106228142A (en) * | 2016-07-29 | 2016-12-14 | Xidian University | Face verification method based on convolutional neural networks and Bayesian decision |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7555149B2 (en) * | 2005-10-25 | 2009-06-30 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for segmenting videos using face detection |
- 2018-06-29: application CN201810701351.3A filed in China; granted as CN109005451B (active)
Also Published As
Publication number | Publication date |
---|---|
CN109005451A (en) | 2018-12-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109005451B (en) | Video strip splitting method based on deep learning | |
CN106921891B (en) | Method and device for displaying video characteristic information | |
WO2019228267A1 (en) | Short video synthesis method and apparatus, and device and storage medium | |
CN106878632B (en) | Video data processing method and device | |
CN108920648B (en) | Cross-modal matching method based on music-image semantic relation | |
CN113590850A (en) | Multimedia data searching method, device, equipment and storage medium | |
CN111488489A (en) | Video file classification method, device, medium and electronic equipment | |
WO2023197979A1 (en) | Data processing method and apparatus, and computer device and storage medium | |
CN112511854A (en) | Live video highlight generation method, device, medium and equipment | |
CN113573161B (en) | Multimedia data processing method, device, equipment and storage medium | |
CN113327603B (en) | Speech recognition method, apparatus, electronic device, and computer-readable storage medium | |
CN111432140B (en) | Method for splitting television news into strips by using artificial neural network | |
CN102073631A (en) | Video news unit dividing method by using association rule technology | |
CN112002328A (en) | Subtitle generating method and device, computer storage medium and electronic equipment | |
CN112804558B (en) | Video splitting method, device and equipment | |
CN111488487A (en) | Advertisement detection method and detection system for all-media data | |
CN113704506A (en) | Media content duplication eliminating method and related device | |
CN110781346A (en) | News production method, system, device and storage medium based on virtual image | |
CN113194332B (en) | Multi-policy-based new advertisement discovery method, electronic device and readable storage medium | |
CN115734024A (en) | Audio data processing method, device, equipment and storage medium | |
CN116737936B (en) | AI virtual personage language library classification management system based on artificial intelligence | |
CN114051154A (en) | News video strip splitting method and system | |
CN111339865A (en) | Method for synthesizing video MV (music video) by music based on self-supervision learning | |
CN116614672A (en) | Method for automatically mixing and cutting video based on text-video retrieval | |
CN111681680B (en) | Method, system, device and readable storage medium for acquiring audio frequency by video recognition object |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right |
Denomination of invention: A Video Stripping Method Based on Deep Learning
Granted publication date: 20210730
Pledgee: Guotou Taikang Trust Co.,Ltd.
Pledgor: HANGZHOU XINGXI TECHNOLOGY Co.,Ltd.
Registration number: Y2024980020954 |