CN109005451B - Video strip splitting method based on deep learning - Google Patents


Info

Publication number
CN109005451B
Authority
CN
China
Prior art keywords
deep learning
video
segments
face
strip splitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810701351.3A
Other languages
Chinese (zh)
Other versions
CN109005451A (en)
Inventor
倪攀
姜子琛
彭梅
刘睿
刘宜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Xingxi Technology Co ltd
Original Assignee
Hangzhou Xingxi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Xingxi Technology Co ltd filed Critical Hangzhou Xingxi Technology Co ltd
Priority to CN201810701351.3A priority Critical patent/CN109005451B/en
Publication of CN109005451A publication Critical patent/CN109005451A/en
Application granted granted Critical
Publication of CN109005451B publication Critical patent/CN109005451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video strip splitting method based on deep learning, comprising the following steps. Step 1: initializing the video data. Step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments. Step 3: extracting sound features from the candidate strip splitting segments. Step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points. By using a deep learning algorithm to recognize both face and voice features, the invention improves strip-splitting accuracy, can recognize the faces and voices of multiple video segments simultaneously, and is extremely fast. In addition, the deep learning algorithm splits the video automatically, reducing manual effort.

Description

Video strip splitting method based on deep learning
Technical Field
The invention relates to the technical field of media asset management, in particular to a video striping method based on deep learning.
Background
With the digitalization, networking, and informatization of the entire television production process, a large amount of multimedia data has accumulated. Strip splitting technology arose because massive multimedia resources could not otherwise be deeply exploited, and because China's regulatory requirements for television programs continue to rise. The rapid development of the Internet has caused the volume of video material to grow explosively: live streams, short videos, online television programs, mobile multimedia, and the like are not broadcast as complete programs but must be split or condensed into short clips, users' demand for fragmented Internet content keeps growing, and split clips are widely used in new media.
The traditional strip splitting method is manual: an editor previews the video frame by frame and cuts it by hand, which requires heavy manpower and is inefficient. A more recent approach is cloud-based strip splitting, which improves efficiency over the traditional method and has clear advantages in timeliness of content output and software cost, but it still requires substantial manual effort; labor has not been freed from a large amount of low-value repetitive work.
Disclosure of Invention
In view of this, the invention provides a video strip splitting method based on deep learning that reduces the manual effort required for strip splitting, addressing the heavy labor demands of the prior art.
The invention provides a video strip splitting method based on deep learning, which comprises the following steps:
step 1: initializing the video data;
step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments;
step 3: extracting sound features from the candidate strip splitting segments;
step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points.
Optionally, initializing the video data in step 1 includes obtaining the audio waveform data and the image data from the video data.
Optionally, the face recognition technique in step 2 includes: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data.
Optionally, the voice recognition technique in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the strip splitting time point of a candidate segment, for sounds whose features are similar to the extracted sound features.
Optionally, the process of encoding faces with the deep learning algorithm includes:
training a deep neural network model so that it can extract features from an input face;
feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features;
encoding, i.e., mapping the high-dimensional face features into low-dimensional vectors; and judging from the low-dimensional vectors whether the faces in the video data are similar or different.
Compared with the prior art, the invention has the following advantages: by using a deep learning algorithm to recognize both face and voice features, it improves strip-splitting accuracy, can recognize the faces and voices of multiple video segments simultaneously, and is extremely fast. In addition, the deep learning algorithm splits the video automatically, reducing manual effort.
Drawings
FIG. 1 is a flowchart of a video striping method based on deep learning according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the invention is not limited to these embodiments. The invention is intended to cover any alternatives, modifications, and equivalents that fall within its spirit and scope.
In the following description of the preferred embodiments, specific details are set forth to provide a thorough understanding of the invention; it will be apparent to those skilled in the art that the invention may be practiced without these specific details.
The invention is described in more detail in the following paragraphs by way of example with reference to the accompanying drawings. Note that the drawings are simplified and not to precise scale; they serve only to illustrate the embodiments conveniently and clearly.
The invention provides a video strip splitting method based on deep learning which, as shown in FIG. 1, comprises the following steps:
step 1: initializing the video data;
step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments;
step 3: extracting sound features from the candidate strip splitting segments;
step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points.
Initializing the video data in step 1 includes obtaining the audio waveform data and the image data from the video data.
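As a rough illustration of this initialization step, the snippet below builds ffmpeg commands (ffmpeg itself is assumed to be installed; the file names are hypothetical, and the mono 16 kHz audio format and one-frame-per-second sampling rate are illustrative choices, not taken from the patent):

```python
def extraction_commands(video_path, wav_path, frames_pattern, fps=1):
    """Build two ffmpeg invocations: one extracting the audio waveform,
    one dumping image frames sampled at `fps` frames per second."""
    audio_cmd = [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",           # drop the video stream, keep audio only
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        wav_path,
    ]
    frames_cmd = [
        "ffmpeg", "-y", "-i", video_path,
        "-vf", f"fps={fps}",  # sample frames at the given rate
        frames_pattern,
    ]
    return audio_cmd, frames_cmd

# The commands could then be run with subprocess.run(...).
audio_cmd, frames_cmd = extraction_commands("news.mp4", "news.wav", "frame_%05d.jpg")
```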
The face recognition technique in step 2 comprises: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data; a continuous time segment with similar faces is treated as one strip splitting segment, so several segments can be obtained.
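A minimal sketch of this grouping step, assuming per-frame face embeddings are already available from the encoder (the cosine metric, the 0.8 threshold, and the minimum run length are illustrative choices, not specified in the patent):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def candidate_segments(embeddings, fps, threshold=0.8):
    """Group runs of consecutive frames whose face embeddings stay
    similar into candidate segments, returned as (start_s, end_s)."""
    segments, start = [], 0
    for i in range(1, len(embeddings)):
        if cosine_sim(embeddings[i - 1], embeddings[i]) < threshold:
            if i - start > 1:  # keep runs longer than one frame
                segments.append((start / fps, i / fps))
            start = i
    if len(embeddings) - start > 1:
        segments.append((start / fps, len(embeddings) / fps))
    return segments
```

For example, five frames of one face followed by five frames of another yield two candidate segments.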
The voice recognition technique in step 4 comprises: using a deep learning algorithm to search, within a certain range before and after the strip splitting time point of a candidate segment, for sounds whose features are similar to the extracted sound features.
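The refinement in step 4 can be sketched as a local search: around the rough face-based split point, find where adjacent audio feature vectors differ most, on the assumption that a speaker or program change produces an abrupt feature change. The window size and squared-distance measure below are illustrative, not taken from the patent:

```python
def refine_split_point(audio_feats, rough_idx, window=5):
    """Within ±window frames of the rough split index, return the index
    where consecutive audio feature vectors differ the most."""
    lo = max(1, rough_idx - window)
    hi = min(len(audio_feats) - 1, rough_idx + window)
    def change(i):
        # squared Euclidean distance between adjacent feature vectors
        return sum((a - b) ** 2 for a, b in zip(audio_feats[i - 1], audio_feats[i]))
    return max(range(lo, hi + 1), key=change)
```

Here a split roughly placed at frame 8 of a stream whose audio changes at frame 10 would be moved to frame 10.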
The process of encoding faces with the deep learning algorithm comprises: training a deep neural network model so that it can extract features from an input face;
feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features;
encoding, i.e., mapping the high-dimensional face features into low-dimensional vectors;
and judging from the low-dimensional vectors whether the faces in the video data are similar or different.
By mapping the image information of multiple faces into low-dimensional vectors, the model can determine whether two faces are similar or identical.
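As a toy stand-in for the learned encoder, the high-to-low-dimensional mapping can be sketched with a fixed random linear projection (a trained network would learn this mapping instead; the dimensions and distance threshold are arbitrary illustrative values):

```python
import random

def random_projection(dim_in, dim_out, seed=0):
    """A fixed random linear map standing in for the trained encoder."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]

def encode_face(features, projection):
    """Map a high-dimensional face feature vector to a low-dimensional code."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]

def similar_faces(code_a, code_b, threshold=1.0):
    """Codes within `threshold` Euclidean distance count as the same face."""
    dist = sum((a - b) ** 2 for a, b in zip(code_a, code_b)) ** 0.5
    return dist < threshold

proj = random_projection(dim_in=128, dim_out=8)
code = encode_face([0.01] * 128, proj)
```

Identical face features always map to identical codes, so the comparison reduces to a cheap distance check in the low-dimensional space.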
In practice, the video may be analyzed and processed with a distributed algorithm: the video is divided into multiple segments at a fixed granularity of a specified number of seconds (e.g., 10 seconds), and the segments are dispatched to available servers that detect faces and voices simultaneously. This makes the method extremely fast and enables second-level short-video production.
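The dispatch described above starts from a simple fixed-granularity chunking; each (start, end) chunk would then be sent to a worker for face and voice detection. The 10-second default comes from the example in the text; the dispatch mechanism itself is not specified in the patent:

```python
def chunk_video(duration_s, granularity_s=10):
    """Split a video of `duration_s` seconds into fixed-size chunks for
    parallel processing; the last chunk may be shorter."""
    chunks, start = [], 0
    while start < duration_s:
        chunks.append((start, min(start + granularity_s, duration_s)))
        start += granularity_s
    return chunks
```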
The above-described embodiments do not limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the above embodiments shall be included in the protection scope of the technical solution.

Claims (4)

1. A video strip splitting method based on deep learning is characterized by comprising the following steps:
step 1: initializing the video data;
step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments;
step 3: extracting sound features from the candidate strip splitting segments;
step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points;
the voice recognition technique in step 4 comprising: using a deep learning algorithm to search, within a certain range before and after the strip splitting time point of a candidate segment, for sounds whose features are similar to the extracted sound features.
2. The video strip splitting method based on deep learning of claim 1, wherein initializing the video data in step 1 includes obtaining the audio waveform data and the image data from the video data.
3. The video strip splitting method based on deep learning of claim 1, wherein the face recognition technique in step 2 comprises: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data.
4. The video strip splitting method based on deep learning of claim 3, wherein the process of encoding faces with the deep learning algorithm comprises:
training a deep neural network model so that it can extract features from an input face;
feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features;
encoding, i.e., mapping the high-dimensional face features into low-dimensional vectors;
and judging from the low-dimensional vectors whether the faces in the video data are similar or different.
CN201810701351.3A 2018-06-29 2018-06-29 Video strip splitting method based on deep learning Active CN109005451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810701351.3A CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810701351.3A CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Publications (2)

Publication Number Publication Date
CN109005451A CN109005451A (en) 2018-12-14
CN109005451B true CN109005451B (en) 2021-07-30

Family

ID=64601854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810701351.3A Active CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Country Status (1)

Country Link
CN (1) CN109005451B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267061B (en) * 2019-04-30 2021-07-27 新华智云科技有限公司 News splitting method and system
CN111222499B (en) * 2020-04-22 2020-08-14 成都索贝数码科技股份有限公司 News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111586494B (en) * 2020-04-30 2022-03-11 腾讯科技(深圳)有限公司 Intelligent strip splitting method based on audio and video separation
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN112565885B (en) * 2020-11-30 2023-01-06 清华珠三角研究院 Video segmentation method, system, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization and system
WO2013097101A1 (en) * 2011-12-28 2013-07-04 华为技术有限公司 Method and device for analysing video file
CN103546667A (en) * 2013-10-24 2014-01-29 中国科学院自动化研究所 Automatic news splitting method for volume broadcast television supervision
CN105931633A (en) * 2016-05-30 2016-09-07 深圳市鼎盛智能科技有限公司 Speech recognition method and system
CN106228142A (en) * 2016-07-29 2016-12-14 西安电子科技大学 Face verification method based on convolutional neural networks and Bayesian decision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7555149B2 (en) * 2005-10-25 2009-06-30 Mitsubishi Electric Research Laboratories, Inc. Method and system for segmenting videos using face detection


Also Published As

Publication number Publication date
CN109005451A (en) 2018-12-14

Similar Documents

Publication Publication Date Title
CN109005451B (en) Video strip splitting method based on deep learning
CN106921891B (en) Method and device for displaying video characteristic information
WO2019228267A1 (en) Short video synthesis method and apparatus, and device and storage medium
CN106878632B (en) Video data processing method and device
CN108920648B (en) Cross-modal matching method based on music-image semantic relation
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
CN111488489A (en) Video file classification method, device, medium and electronic equipment
WO2023197979A1 (en) Data processing method and apparatus, and computer device and storage medium
CN112511854A (en) Live video highlight generation method, device, medium and equipment
CN113573161B (en) Multimedia data processing method, device, equipment and storage medium
CN113327603B (en) Speech recognition method, apparatus, electronic device, and computer-readable storage medium
CN111432140B (en) Method for splitting television news into strips by using artificial neural network
CN102073631A (en) Video news unit dividing method by using association rule technology
CN112002328A (en) Subtitle generating method and device, computer storage medium and electronic equipment
CN112804558B (en) Video splitting method, device and equipment
CN111488487A (en) Advertisement detection method and detection system for all-media data
CN113704506A (en) Media content duplication eliminating method and related device
CN110781346A (en) News production method, system, device and storage medium based on virtual image
CN113194332B (en) Multi-policy-based new advertisement discovery method, electronic device and readable storage medium
CN115734024A (en) Audio data processing method, device, equipment and storage medium
CN116737936B (en) AI virtual personage language library classification management system based on artificial intelligence
CN114051154A (en) News video strip splitting method and system
CN111339865A (en) Method for synthesizing video MV (music video) by music based on self-supervision learning
CN116614672A (en) Method for automatically mixing and cutting video based on text-video retrieval
CN111681680B (en) Method, system, device and readable storage medium for acquiring audio frequency by video recognition object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Video Stripping Method Based on Deep Learning

Granted publication date: 20210730

Pledgee: Guotou Taikang Trust Co.,Ltd.

Pledgor: HANGZHOU XINGXI TECHNOLOGY Co.,Ltd.

Registration number: Y2024980020954