CN109005451A - Video strip splitting method based on deep learning - Google Patents
- Publication number: CN109005451A
- Application number: CN201810701351.3A
- Authority: CN (China)
- Prior art keywords: strip splitting, face, deep learning, video, segment
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
The invention discloses a video strip splitting method based on deep learning, comprising the following steps. Step 1: initialize the video data. Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate strip splitting segment. Step 3: extract sound features within the candidate strip splitting segment. Step 4: refine the split time points of the candidate strip splitting segment using sound recognition technology and the sound features to obtain the final split time points. The invention uses deep learning algorithms to recognize two features, face and sound, which improves the accuracy of strip splitting; face and sound recognition can be performed on multiple video segments simultaneously, so processing is very fast. In addition, the deep learning algorithm splits videos intelligently, which reduces the manual labor required.
Description
Technical field
The present invention relates to the technical field of media asset management, and more specifically to a video strip splitting method based on deep learning.
Background art
With the full-process digitization, networking, and informatization of television program production, and with the continuous development of television programs, a large amount of multimedia data has accumulated. Because these massive multimedia resources could not be deeply developed and utilized, and because China's regulatory requirements for television programs kept rising, strip splitting technology emerged. Meanwhile, the continuous development of the Internet has led to explosive growth in the volume of video material. Live streams, short videos, online TV programs, mobile multimedia, and the like no longer broadcast complete programs; instead, programs need to be split or condensed into short videos. As user demand for fragmented Internet content keeps growing, strip splitting is also used more and more widely in new media.
The traditional strip splitting method is manual: an operator previews the video frame by frame and splits it by hand, which requires a large amount of labor and is very inefficient. The prior art is a strip splitting method based on a cloud architecture. It is more efficient than the traditional approach and has considerable advantages in the timeliness of content output and in software cost, but it still requires a large amount of labor and does not free people from large volumes of low-quality repetitive work.
Summary of the invention
In view of this, the present invention provides a video strip splitting method based on deep learning that reduces the manual labor required in strip splitting work, in order to solve the prior-art problem of requiring a large amount of manual labor.
The present invention provides a video strip splitting method based on deep learning, comprising the following steps:
Step 1: initialize the video data;
Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate strip splitting segment;
Step 3: extract sound features within the candidate strip splitting segment;
Step 4: refine the split time points of the candidate strip splitting segment using sound recognition technology and the sound features to obtain the final split time points.
Optionally, the video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.
Optionally, the face recognition technology in step 2 includes: encoding faces using a deep learning algorithm and comparing the similarity of the faces in each image frame of the video data.
Optionally, the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of the candidate strip splitting segment, for sound whose features are similar to the extracted sound features.
Optionally, the process of encoding faces using the deep learning algorithm includes:
training a deep neural network model that can extract features from an input face;
inputting the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;
encoding, that is, mapping the high-dimensional face features to a low-dimensional vector;
determining, according to the low-dimensional vectors, whether faces in the video data are similar or different.
Compared with the prior art, the present invention has the following advantages: it uses deep learning algorithms to recognize two features, face and sound, which improves the accuracy of strip splitting, and it can perform face and sound recognition on multiple video segments simultaneously, so processing is very fast. In addition, the deep learning algorithm splits videos intelligently, which reduces the manual labor required.
Brief description of the drawings
Fig. 1 is a flow chart of the video strip splitting method based on deep learning according to the present invention.
Detailed description of the embodiments
The preferred embodiments of the present invention are described in detail below with reference to the drawing, but the present invention is not limited to these embodiments. The present invention covers any substitution, modification, equivalent method, or scheme made within its spirit and scope.
To give the public a thorough understanding of the present invention, specific details are described in the following preferred embodiments; a person skilled in the art can nevertheless fully understand the present invention without these details.
The present invention is described more specifically below by way of example with reference to the drawing. Note that the drawing uses a simplified form and is not drawn to exact scale; it serves only to illustrate the embodiments of the present invention conveniently and clearly.
The present invention provides a video strip splitting method based on deep learning which, as shown in Fig. 1, comprises the following steps:
Step 1: initialize the video data;
Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate strip splitting segment;
Step 3: extract sound features within the candidate strip splitting segment;
Step 4: refine the split time points of the candidate strip splitting segment using sound recognition technology and the sound features to obtain the final split time points.
The video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.
The face recognition technology in step 2 includes: encoding faces using a deep learning algorithm and comparing the similarity of the faces in each image frame of the video data. Each continuous time segment in which similar faces appear is treated as one strip splitting segment, so multiple strip splitting segments can be obtained.
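The grouping of frames into candidate segments described for step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that some face encoder has already produced one embedding per frame (or `None` when no face is detected), and the names `candidate_segments`, `cosine_similarity`, and the 0.8 threshold are hypothetical choices that do not appear in the patent. Each frame is compared only with the previous frame, which is one of several possible ways to decide that similar faces "appear continuously".

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_segments(
    frame_embeddings: Sequence[Optional[Sequence[float]]],
    fps: float,
    threshold: float = 0.8,  # hypothetical similarity threshold
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of consecutive frames
    whose face embeddings are similar. None means no face was detected."""
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame of the open run
    prev: Optional[Sequence[float]] = None
    for i, emb in enumerate(frame_embeddings):
        same = (
            emb is not None
            and prev is not None
            and cosine_similarity(emb, prev) >= threshold
        )
        if not same:
            # The face changed or disappeared: close the open run, if any.
            if start is not None:
                segments.append((start / fps, i / fps))
            start = i if emb is not None else None
        prev = emb
    if start is not None:
        segments.append((start / fps, len(frame_embeddings) / fps))
    return segments
```

With two toy embeddings `A = [1.0, 0.0]` and `B = [0.0, 1.0]`, calling `candidate_segments([A, A, None, B, B, B], fps=1.0)` yields two candidate segments, one per face.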
The sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of the candidate strip splitting segment, for sound whose features are similar to the extracted sound features.
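One possible reading of this refinement step is sketched below. The patent does not specify how the refined time point is chosen, so the rule used here (move the end of the segment to the end of the last audio window, within the search range, that still resembles the segment's sound) is an assumption, as are the names `refine_end_time`, `hop`, `search_radius`, and the default values. The audio feature vectors are assumed to come from some separate feature extractor that is not shown.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_end_time(
    audio_features: Sequence[Sequence[float]],
    hop: float,                  # seconds covered by each audio window
    reference: Sequence[float],  # sound feature extracted from the segment
    candidate_end: float,        # end time proposed by the face step
    search_radius: float = 2.0,  # hypothetical search range in seconds
    threshold: float = 0.8,      # hypothetical similarity threshold
) -> float:
    """Search windows within +/- search_radius of candidate_end and return
    the end time of the last window that is similar to the reference.
    If no window in range is similar, candidate_end is returned unchanged."""
    first = max(0, int(math.floor((candidate_end - search_radius) / hop)))
    last = min(
        len(audio_features),
        int(math.ceil((candidate_end + search_radius) / hop)),
    )
    refined = candidate_end
    for i in range(first, last):
        # Window i covers the interval [i * hop, (i + 1) * hop).
        if cosine_similarity(audio_features[i], reference) >= threshold:
            refined = (i + 1) * hop
    return refined
```

Under this rule, a speaker who keeps talking for one second after the face leaves the frame pushes the split point one second later, while a speaker who falls silent early pulls it earlier.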
The process of encoding faces using the deep learning algorithm includes:
training a deep neural network model that can extract features from an input face;
inputting the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;
encoding, that is, mapping the high-dimensional face features to a low-dimensional vector;
determining, according to the low-dimensional vectors, whether faces in the video data are similar or different.
By mapping multiple face images to low-dimensional vectors, the model can identify whether two faces are similar or identical.
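The last two stages of this process (mapping to a low-dimensional vector, then comparing vectors) can be illustrated with the short sketch below. The patent does not say how the mapping is computed, so a plain linear projection with a given weight matrix stands in for the trained network here; `project`, `same_face`, the weight matrix, and the distance threshold of 0.6 are all hypothetical.

```python
import math
from typing import List, Sequence


def project(features: Sequence[float],
            weights: Sequence[Sequence[float]]) -> List[float]:
    """Map a high-dimensional feature vector x to a low-dimensional vector
    y = W x, where each row of `weights` produces one output component."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def same_face(
    features_a: Sequence[float],
    features_b: Sequence[float],
    weights: Sequence[Sequence[float]],
    max_distance: float = 0.6,  # hypothetical decision threshold
) -> bool:
    """Decide whether two faces are similar by comparing the Euclidean
    distance between their normalized low-dimensional encodings."""
    a = l2_normalize(project(features_a, weights))
    b = l2_normalize(project(features_b, weights))
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return distance <= max_distance
```

In a real system the weights would be learned during training so that images of the same person land close together; the fixed matrix here only shows the shape of the computation.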
In practice, the video can first be analyzed and processed with a distributed algorithm: the video is divided into several segments at a granularity of a specified number of seconds (for example, 10 seconds). These segments are then distributed to the available servers, which perform face and sound detection simultaneously. This is very fast and makes it possible to produce short videos within seconds.
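The fixed-granularity split and parallel dispatch described above might look like the following sketch. It is only an approximation of the idea: a local thread pool stands in for the "available servers", and the function names `split_into_chunks` and `analyze_chunks`, the `analyze` callback, and the worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Chunk = Tuple[float, float]  # (start, end) in seconds


def split_into_chunks(duration: float, granularity: float = 10.0) -> List[Chunk]:
    """Divide a video of `duration` seconds into consecutive chunks of at
    most `granularity` seconds; the last chunk may be shorter."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    chunks: List[Chunk] = []
    start = 0.0
    while start < duration:
        end = min(start + granularity, duration)
        chunks.append((start, end))
        start = end
    return chunks


def analyze_chunks(
    chunks: Sequence[Chunk],
    analyze: Callable[[Chunk], T],  # e.g. face and sound detection on one chunk
    max_workers: int = 4,           # stand-in for the number of servers
) -> List[T]:
    """Run `analyze` on every chunk concurrently and return the results
    in the same order as the input chunks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, chunks))
```

For example, `split_into_chunks(25.0)` produces three chunks of 10, 10, and 5 seconds, and `analyze_chunks` preserves chunk order so the per-chunk results can be stitched back together along the timeline.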
The embodiments described above do not limit the scope of protection of the technical solution. Any modification, equivalent substitution, or improvement made within the spirit and principles of the above embodiments shall fall within the scope of protection of the technical solution.
Claims (5)
1. A video strip splitting method based on deep learning, characterized by comprising the following steps:
Step 1: initialize the video data;
Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate strip splitting segment;
Step 3: extract sound features within the candidate strip splitting segment;
Step 4: refine the split time points of the candidate strip splitting segment using sound recognition technology and the sound features to obtain the final split time points.
2. The video strip splitting method based on deep learning according to claim 1, characterized in that the video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.
3. The video strip splitting method based on deep learning according to claim 1, characterized in that the face recognition technology in step 2 includes: encoding faces using a deep learning algorithm and comparing the similarity of the faces in each image frame of the video data.
4. The video strip splitting method based on deep learning according to claim 1, characterized in that the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of the candidate strip splitting segment, for sound whose features are similar to the extracted sound features.
5. The video strip splitting method based on deep learning according to claim 3, characterized in that the process of encoding faces using the deep learning algorithm includes:
training a deep neural network model that can extract features from an input face;
inputting the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;
encoding, that is, mapping the high-dimensional face features to a low-dimensional vector;
determining, according to the low-dimensional vectors, whether faces in the video data are similar or different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810701351.3A CN109005451B (en) | 2018-06-29 | 2018-06-29 | Video strip splitting method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810701351.3A CN109005451B (en) | 2018-06-29 | 2018-06-29 | Video strip splitting method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109005451A (en) | 2018-12-14 |
CN109005451B (en) | 2021-07-30 |
Family
ID=64601854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810701351.3A (Active; CN109005451B) | Video strip splitting method based on deep learning | 2018-06-29 | 2018-06-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109005451B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070091203A1 (en) * | 2005-10-25 | 2007-04-26 | Peker Kadir A | Method and system for segmenting videos using face detection |
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News video categorization and system |
WO2013097101A1 (en) * | 2011-12-28 | 2013-07-04 | 华为技术有限公司 | Method and device for analysing video file |
CN103546667A (en) * | 2013-10-24 | 2014-01-29 | 中国科学院自动化研究所 | Automatic news splitting method for volume broadcast television supervision |
CN105931633A (en) * | 2016-05-30 | 2016-09-07 | 深圳市鼎盛智能科技有限公司 | Speech recognition method and system |
CN106228142A (en) * | 2016-07-29 | 2016-12-14 | 西安电子科技大学 | Face verification method based on convolutional neural networks and Bayesian decision |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110267061A (en) * | 2019-04-30 | 2019-09-20 | 新华智云科技有限公司 | A news strip splitting method and system |
CN111222499A (en) * | 2020-04-22 | 2020-06-02 | 成都索贝数码科技股份有限公司 | News automatic bar-splitting conditional random field algorithm prediction result back-flow training method |
CN111222499B (en) * | 2020-04-22 | 2020-08-14 | 成都索贝数码科技股份有限公司 | News automatic bar-splitting conditional random field algorithm prediction result back-flow training method |
CN111586494A (en) * | 2020-04-30 | 2020-08-25 | 杭州慧川智能科技有限公司 | Intelligent strip splitting method based on audio and video separation |
CN111586494B (en) * | 2020-04-30 | 2022-03-11 | 腾讯科技(深圳)有限公司 | Intelligent strip splitting method based on audio and video separation |
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN113810782B (en) * | 2020-06-12 | 2022-09-27 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN112565885A (en) * | 2020-11-30 | 2021-03-26 | 清华珠三角研究院 | Video segmentation method, system, device and storage medium |
CN112565885B (en) * | 2020-11-30 | 2023-01-06 | 清华珠三角研究院 | Video segmentation method, system, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109005451B (en) | 2021-07-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||