CN109005451A - Video strip splitting method based on deep learning

Video strip splitting method based on deep learning

Info

Publication number
CN109005451A
Authority
CN
China
Prior art keywords
strip splitting
face
deep learning
video
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810701351.3A
Other languages
Chinese (zh)
Other versions
CN109005451B (en)
Inventor
倪攀
姜子琛
彭梅
刘睿
刘宜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Star Technology Co Ltd
Original Assignee
Hangzhou Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Star Technology Co Ltd
Priority to CN201810701351.3A
Publication of CN109005451A
Application granted
Publication of CN109005451B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video strip splitting method based on deep learning, comprising the following steps. Step 1: initialize the video data. Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate split segment. Step 3: extract sound features within each candidate split segment. Step 4: refine the split time points of the candidate split segments using sound recognition technology and the extracted sound features, to obtain the final split time points. The invention uses deep learning algorithms to recognize two kinds of features, faces and sound, which improves splitting accuracy. Face and sound recognition can be run on multiple video segments simultaneously, so processing is very fast. In addition, the deep learning algorithms split video intelligently, reducing the manual labor required.

Description

Video strip splitting method based on deep learning
Technical field
The present invention relates to the technical field of media asset management, and more specifically to a video strip splitting method based on deep learning.
Background
With the digitization, networking and informatization of the entire television production workflow, and with the continuous development of television programming, a large amount of multimedia data has accumulated. Strip splitting technology emerged in response to two pressures: this mass of multimedia resources could not be deeply developed and utilized, and China's regulatory requirements for television programs kept rising. Meanwhile, the continuous development of the Internet has brought explosive growth in the volume of video material. Live streams, short videos, online TV programs, mobile multimedia and similar services no longer play complete programs but need them split or condensed into short videos. User demand for fragmented Internet content keeps increasing, and strip splitting is used more and more widely in new media.
The traditional method is manual strip splitting, in which an operator previews the video frame by frame and splits it by hand; this requires a large amount of labor and is very inefficient. The prior art is a strip splitting method based on a cloud architecture. Compared with the traditional approach it is more efficient and has considerable advantages in the timeliness of content output and in software cost, but it still requires a large amount of labor and does not free people from large volumes of low-quality repetitive work.
Summary of the invention
In view of this, the present invention provides a video strip splitting method based on deep learning that reduces the labor required for strip splitting, to solve the prior-art problem of requiring a large amount of manual labor.
The present invention provides a video strip splitting method based on deep learning, comprising the following steps:
Step 1: initialize the video data;
Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate split segment;
Step 3: extract sound features within the candidate split segment;
Step 4: refine the split time points of the candidate split segment using sound recognition technology and the sound features, to obtain the final split time points.
Optionally, the video data initialization in step 1 includes obtaining the audio waveform data and the image data in the video data.
Optionally, the face recognition technology in step 2 includes: encoding faces using a deep learning algorithm, and comparing the similarity of the faces in each image frame of the video data.
Optionally, the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of the candidate split segment, for sound whose features are similar to the extracted sound features.
Optionally, the process of encoding faces using a deep learning algorithm includes:
training a deep neural network model capable of extracting features from an input face;
inputting the image data of the video data into the deep neural network model, and extracting high-dimensional face features from the image data;
encoding, that is, mapping the high-dimensional face features to a low-dimensional vector;
determining, from the low-dimensional vectors, whether faces in the video data are similar or different.
Compared with the prior art, the present invention has the following advantages: it uses deep learning algorithms to recognize two kinds of features, faces and sound, which improves splitting accuracy, and it can run face and sound recognition on multiple video segments simultaneously, so processing is very fast. In addition, the deep learning algorithms split video intelligently, reducing the manual labor required.
Brief description of the drawings
Fig. 1 is a flowchart of the video strip splitting method based on deep learning according to the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described in detail below with reference to the drawing, but the present invention is not limited to these embodiments. The present invention covers any substitution, modification, equivalent method or scheme made within its spirit and scope.
To give the public a thorough understanding of the present invention, specific details are described in the following preferred embodiments; those skilled in the art can nevertheless fully understand the invention without these details.
The present invention is described more specifically below, by way of example, with reference to the drawing. Note that the drawing is in a simplified form and not to precise scale, and serves only to illustrate the embodiments conveniently and clearly.
The present invention provides a video strip splitting method based on deep learning which, as shown in Fig. 1, comprises the following steps:
Step 1: initialize the video data;
Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate split segment;
Step 3: extract sound features within the candidate split segment;
Step 4: refine the split time points of the candidate split segment using sound recognition technology and the sound features, to obtain the final split time points.
The video data initialization in step 1 includes obtaining the audio waveform data and the image data in the video data.
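As an illustration only, the output of this initialization step might be held in a small container that keeps the image frames and the audio waveform on a common time axis. The sketch below assumes the video has already been decoded (the decoder is not shown); the names `VideoData`, `fps` and `sample_rate` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VideoData:
    """Decoded video: image frames plus a mono audio waveform (hypothetical container)."""
    frames: List[Sequence[float]]   # one entry per image frame (pixels or features)
    audio: List[float]              # audio waveform samples
    fps: float                      # image frames per second
    sample_rate: int                # audio samples per second

    def frame_time(self, frame_index: int) -> float:
        """Timestamp in seconds at which a frame starts."""
        return frame_index / self.fps

    def audio_slice(self, start_s: float, end_s: float) -> List[float]:
        """Audio samples covering the interval [start_s, end_s), clipped to the waveform."""
        lo = max(0, int(round(start_s * self.sample_rate)))
        hi = min(len(self.audio), int(round(end_s * self.sample_rate)))
        return self.audio[lo:hi]


# Synthetic example: 2 seconds of video at 25 fps with 8 kHz audio.
video = VideoData(frames=[[0.0]] * 50, audio=[0.0] * 16000, fps=25.0, sample_rate=8000)
```

Keeping both streams in seconds lets the later face and sound steps exchange time points without caring about frame or sample indices.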
The face recognition technology in step 2 includes: encoding faces using a deep learning algorithm and comparing the similarity of the faces in each image frame of the video data. Each continuous time segment in which similar faces appear is treated as one split segment, so multiple split segments can be obtained.
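The grouping of consecutive frames into candidate segments can be sketched as follows. This is one possible reading, not the patent's implementation: it assumes each frame already has a face vector (or `None` when no face is found), compares each frame with the previous one using cosine similarity, and uses an arbitrary threshold of 0.8. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_segments(
    embeddings: List[Optional[Vector]],
    fps: float,
    threshold: float = 0.8,
) -> List[Tuple[float, float]]:
    """Group consecutive frames whose face vectors are similar.

    embeddings[i] is the face vector of frame i, or None if no face was found.
    Returns (start_seconds, end_seconds) for each run of similar faces.
    """
    segments = []
    start = None   # first frame index of the current run
    prev = None    # face vector of the previous frame
    for i, emb in enumerate(embeddings):
        same = (
            emb is not None
            and prev is not None
            and cosine_similarity(prev, emb) >= threshold
        )
        if start is not None and not same:
            segments.append((start / fps, i / fps))   # close the current run
            start = None
        if emb is not None and start is None:
            start = i                                  # open a new run
        prev = emb
    if start is not None:
        segments.append((start / fps, len(embeddings) / fps))
    return segments
```

For example, six frames at 1 fps showing face A, A, B, B, no face, A yield the segments (0, 2), (2, 4) and (5, 6) seconds.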
The sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of the candidate split segment, for sound whose features are similar to the extracted sound features.
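The patent does not specify how the search around a split time point is carried out. A minimal sketch under stated assumptions: the audio is already described by one feature vector per fixed-length window (in the patent these would come from a deep learning model), `reference` is the sound feature extracted from the candidate segment, and the end point is moved to the end of the last window inside the search range that still resembles the reference. All names, the 2-second range and the 0.8 threshold are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def refine_end_time(
    window_features: List[Vector],
    window_s: float,
    reference: Vector,
    candidate_end_s: float,
    search_range_s: float = 2.0,
    threshold: float = 0.8,
) -> float:
    """Refine a candidate end point using audio similarity.

    window_features[k] describes the audio in [k * window_s, (k + 1) * window_s).
    Only windows lying fully inside candidate_end_s +/- search_range_s are checked.
    Returns the end of the latest matching window, or the candidate unchanged.
    """
    lo = max(0.0, candidate_end_s - search_range_s)
    hi = candidate_end_s + search_range_s
    refined = candidate_end_s
    for k, feat in enumerate(window_features):
        w_start, w_end = k * window_s, (k + 1) * window_s
        if w_start < lo or w_end > hi:
            continue   # outside the search range
        if cosine_similarity(reference, feat) >= threshold:
            refined = w_end   # windows are visited in time order, so the last match wins
    return refined
```

The start point of a segment could be refined the same way by scanning for the earliest matching window instead of the latest.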
The process of encoding faces using a deep learning algorithm includes:
training a deep neural network model capable of extracting features from an input face;
inputting the image data of the video data into the deep neural network model, and extracting high-dimensional face features from the image data;
encoding, that is, mapping the high-dimensional face features to a low-dimensional vector;
determining, from the low-dimensional vectors, whether faces in the video data are similar or different.
By mapping multiple face images to low-dimensional vectors, the model can determine whether two faces are similar or identical.
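The last two steps (mapping to a low-dimensional vector, then comparing vectors) can be illustrated with a toy example. In the patent the mapping is learned by a deep neural network; the fixed linear projection below is only a stand-in for that trained model, and the names `encode`, `same_face` and the distance threshold 0.6 are assumptions for illustration.

```python
import math
from typing import List, Sequence


def encode(features: Sequence[float], projection: List[Sequence[float]]) -> List[float]:
    """Map a high-dimensional feature vector to a low-dimensional unit-length vector."""
    low = [sum(w * x for w, x in zip(row, features)) for row in projection]
    norm = math.sqrt(sum(v * v for v in low))
    return [v / norm for v in low] if norm else low


def same_face(a: Sequence[float], b: Sequence[float], max_distance: float = 0.6) -> bool:
    """Treat two encoded faces as similar when their Euclidean distance is small."""
    return math.dist(a, b) <= max_distance


# Toy 4-dimensional "face features" reduced to 2 dimensions.
projection = [[1, 1, 0, 0], [0, 0, 1, 1]]
face_a1 = encode([0.9, 1.0, 0.1, 0.0], projection)
face_a2 = encode([1.0, 0.8, 0.0, 0.2], projection)
face_b = encode([0.1, 0.0, 1.0, 0.9], projection)
```

Here `same_face(face_a1, face_a2)` holds while `same_face(face_a1, face_b)` does not, which is the similar-or-different decision the text describes.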
In practice, the video can first be analyzed and processed with a distributed algorithm: the video is divided into several segments at a granularity of a specified number of seconds (for example, 10 seconds). These segments are then distributed to the available servers, which perform face and sound detection simultaneously. This is very fast and makes it possible to produce short videos within seconds.
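The chunk-and-distribute idea can be sketched on a single machine. The patent distributes segments across servers; in this sketch a thread pool stands in for those servers, and `analyse` is a hypothetical placeholder for the per-segment face and sound detection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

Chunk = Tuple[float, float]   # (start_seconds, end_seconds)
T = TypeVar("T")


def split_into_chunks(duration_s: float, chunk_s: float = 10.0) -> List[Chunk]:
    """Divide [0, duration_s) into consecutive chunks of at most chunk_s seconds."""
    chunks = []
    start = 0.0
    while start < duration_s:
        chunks.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    return chunks


def process_chunks(
    chunks: List[Chunk],
    worker: Callable[[Chunk], T],
    max_workers: int = 4,
) -> List[T]:
    """Run worker on every chunk concurrently; results keep the chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, chunks))


def analyse(chunk: Chunk) -> float:
    """Hypothetical stand-in for face and sound detection: returns the chunk length."""
    start, end = chunk
    return end - start


results = process_chunks(split_into_chunks(25.0), analyse)
```

A 25-second video is split into chunks of 10, 10 and 5 seconds. Because fixed-length chunks can cut through a run of similar faces, a real system would also need to merge segments that continue across chunk boundaries; the patent does not describe that step.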
The embodiments described above do not limit the scope of protection of the technical solution. Any modification, equivalent substitution or improvement made within the spirit and principles of the above embodiments shall fall within the scope of protection of the technical solution.

Claims (5)

1. A video strip splitting method based on deep learning, characterized by comprising the following steps:
Step 1: initialize the video data;
Step 2: perform face detection using face recognition technology, and take each time segment in which similar faces appear continuously as a candidate split segment;
Step 3: extract sound features within the candidate split segment;
Step 4: refine the split time points of the candidate split segment using sound recognition technology and the sound features, to obtain the final split time points.
2. The video strip splitting method based on deep learning according to claim 1, characterized in that the video data initialization in step 1 includes obtaining the audio waveform data and the image data in the video data.
3. The video strip splitting method based on deep learning according to claim 1, characterized in that the face recognition technology in step 2 includes: encoding faces using a deep learning algorithm, and comparing the similarity of the faces in each image frame of the video data.
4. The video strip splitting method based on deep learning according to claim 1, characterized in that the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of the candidate split segment, for sound whose features are similar to the extracted sound features.
5. The video strip splitting method based on deep learning according to claim 3, characterized in that the process of encoding faces using a deep learning algorithm includes:
training a deep neural network model capable of extracting features from an input face;
inputting the image data of the video data into the deep neural network model, and extracting high-dimensional face features from the image data;
encoding, that is, mapping the high-dimensional face features to a low-dimensional vector;
determining, from the low-dimensional vectors, whether faces in the video data are similar or different.
CN201810701351.3A 2018-06-29 2018-06-29 Video strip splitting method based on deep learning Active CN109005451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810701351.3A CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810701351.3A CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Publications (2)

Publication Number Publication Date
CN109005451A true CN109005451A (en) 2018-12-14
CN109005451B CN109005451B (en) 2021-07-30

Family

ID=64601854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810701351.3A Active CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Country Status (1)

Country Link
CN (1) CN109005451B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267061A (en) * 2019-04-30 2019-09-20 新华智云科技有限公司 News strip splitting method and system
CN111222499A (en) * 2020-04-22 2020-06-02 成都索贝数码科技股份有限公司 News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111586494A (en) * 2020-04-30 2020-08-25 杭州慧川智能科技有限公司 Intelligent strip splitting method based on audio and video separation
CN112565885A (en) * 2020-11-30 2021-03-26 清华珠三角研究院 Video segmentation method, system, device and storage medium
CN113810782A (en) * 2020-06-12 2021-12-17 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070091203A1 (en) * 2005-10-25 2007-04-26 Peker Kadir A Method and system for segmenting videos using face detection
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization and system
WO2013097101A1 (en) * 2011-12-28 2013-07-04 华为技术有限公司 Method and device for analysing video file
CN103546667A (en) * 2013-10-24 2014-01-29 中国科学院自动化研究所 Automatic news splitting method for volume broadcast television supervision
CN105931633A (en) * 2016-05-30 2016-09-07 深圳市鼎盛智能科技有限公司 Speech recognition method and system
CN106228142A (en) * 2016-07-29 2016-12-14 西安电子科技大学 Face verification method based on convolutional neural networks and Bayesian decision

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267061A (en) * 2019-04-30 2019-09-20 新华智云科技有限公司 News strip splitting method and system
CN111222499A (en) * 2020-04-22 2020-06-02 成都索贝数码科技股份有限公司 News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111222499B (en) * 2020-04-22 2020-08-14 成都索贝数码科技股份有限公司 News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111586494A (en) * 2020-04-30 2020-08-25 杭州慧川智能科技有限公司 Intelligent strip splitting method based on audio and video separation
CN111586494B (en) * 2020-04-30 2022-03-11 腾讯科技(深圳)有限公司 Intelligent strip splitting method based on audio and video separation
CN113810782A (en) * 2020-06-12 2021-12-17 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN112565885A (en) * 2020-11-30 2021-03-26 清华珠三角研究院 Video segmentation method, system, device and storage medium
CN112565885B (en) * 2020-11-30 2023-01-06 清华珠三角研究院 Video segmentation method, system, device and storage medium

Also Published As

Publication number Publication date
CN109005451B (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN109005451A (en) Video strip splitting method based on deep learning
US20210006864A1 (en) Method for displaying live broadcast room, apparatus, device, and storage medium
WO2019228267A1 (en) Short video synthesis method and apparatus, and device and storage medium
US20210174592A1 (en) Augmented reality method and device
CN109410911A (en) Artificial intelligence learning method based on speech recognition
CN103365936A (en) Video recommendation system and method thereof
CN105744292A (en) Video data processing method and device
CN114465737B (en) Data processing method and device, computer equipment and storage medium
Yan et al. Semantic segmentation guided pixel fusion for image retargeting
CN105898525A (en) Method of searching videos in specific video database, and video terminal thereof
CN105447147A (en) Data processing method and apparatus
CN107801061A (en) Ad data matching process, apparatus and system
CN103607635A (en) Method, device and terminal for caption identification
CN104036243A (en) Behavior recognition method based on light stream information
CN115515016B (en) Virtual live broadcast method, system and storage medium capable of realizing self-cross reply
CN105718543A (en) Sentence display method and device
CN110881115A (en) Strip splitting method and system for conference video
CN110099303A (en) A kind of media play system based on artificial intelligence
US20240062581A1 (en) Obtaining artist imagery from video content using facial recognition
CN113948105A (en) Voice-based image generation method, device, equipment and medium
CN106205610B (en) A kind of voice information identification method and equipment
CN114390368A (en) Live video data processing method and device, equipment and readable medium
CN105007524A (en) Video processing method and device
US20220375223A1 (en) Information generation method and apparatus
CN110874609B (en) User clustering method, storage medium, device and system based on user behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant