CN109005451A

CN109005451A - Video demolition method based on deep learning

Info

Publication number: CN109005451A
Application number: CN201810701351.3A
Authority: CN
Inventors: 倪攀; 姜子琛; 彭梅; 刘睿; 刘宜飞
Original assignee: Hangzhou Star Technology Co Ltd
Current assignee: Hangzhou Star Technology Co Ltd
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2018-12-14
Anticipated expiration: 2038-06-29
Also published as: CN109005451B

Abstract

The invention discloses a video splitting method based on deep learning, comprising the following steps: Step 1: initializing the video data; Step 2: performing face detection using face recognition technology, and taking each time segment in which a similar face appears continuously as a candidate splitting segment; Step 3: extracting sound features within the candidate splitting segments; Step 4: refining the split time points of the candidate splitting segments using sound recognition technology and the sound features, to obtain the final split time points. The invention uses deep learning algorithms to recognize two features, face and sound, which improves the accuracy of splitting. Face and sound recognition can also be performed on multiple video clips simultaneously, so processing is very fast. In addition, the deep learning algorithms allow videos to be split intelligently, reducing manual labor.

Description

Video splitting method based on deep learning

Technical field

The present invention relates to the technical field of media asset management, and more specifically to a video splitting method based on deep learning.

Background technique

As television program production becomes fully digital, networked and information-based, and as television programming continues to develop, a large amount of multimedia data has accumulated. Because these massive multimedia resources cannot be deeply exploited and used, and because China's regulatory requirements for television programs keep rising, video splitting technology has emerged. Moreover, with the continued development of the Internet, the volume of video material is growing explosively. Live streaming, short videos, online TV programs, mobile multimedia and the like are replacing the broadcast of complete programs, which therefore need to be split or condensed into short videos. As users' demand for fragmented Internet content keeps increasing, video splitting is used more and more widely in new media.

The traditional splitting method is manual splitting, i.e., previewing the video frame by frame and cutting it by hand, which requires a large amount of manual labor and is highly inefficient. In the prior art, splitting methods based on a cloud architecture are more efficient than the traditional approach and have considerable advantages in the timeliness of content output and in software cost, but they still require substantial manual labor and do not free people from large amounts of low-value, repetitive work.

Summary of the invention

In view of this, the present invention provides a video splitting method based on deep learning that reduces manual labor in splitting work, to solve the problem that the prior art requires a large amount of manual labor.

The present invention provides a video splitting method based on deep learning, comprising the following steps:

Step 1: initializing the video data;

Step 2: performing face detection using face recognition technology, and taking each time segment in which a similar face appears continuously as a candidate splitting segment;

Step 3: extracting sound features within the candidate splitting segments;

Step 4: refining the split time points of the candidate splitting segments using sound recognition technology and the sound features, to obtain the final split time points.
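Step 3 does not specify which sound features are extracted. As a minimal sketch, assuming simple hand-crafted frame features (short-time energy and zero-crossing rate) stand in for whatever features the method actually uses, extraction over a candidate segment's mono audio samples could look like this. `frame_features` and `segment_signature` are hypothetical names, and a production system might instead use learned audio or speaker embeddings.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float]  # (short-time energy, zero-crossing rate)


def frame_features(samples: Sequence[float], frame_len: int, hop: int) -> List[Feature]:
    """Compute hypothetical per-frame sound features from mono PCM samples.

    Energy is the mean squared amplitude of the frame; the zero-crossing rate is
    the fraction of adjacent sample pairs whose signs differ.
    """
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    feats: List[Feature] = []
    # Only complete frames are analyzed; a trailing partial frame is skipped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def segment_signature(feats: Sequence[Feature]) -> Feature:
    """Average the frame features of one candidate segment into a reference feature."""
    if not feats:
        raise ValueError("segment has no complete frames")
    n = len(feats)
    return (sum(f[0] for f in feats) / n, sum(f[1] for f in feats) / n)
```

The averaged signature gives step 4 a single reference to compare against when it searches near each segment boundary.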

Optionally, the video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.
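Separating a video into the two data streams named in step 1 is commonly done with FFmpeg. The sketch below only builds the command lines; the 16 kHz mono audio and the one-frame-per-second sampling rate are assumed settings, not values from the source, and the helper names are hypothetical.

```python
from typing import List


def audio_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that writes the audio waveform to a WAV file."""
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar sets the sample rate.
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1", "-ar", str(sample_rate), wav_path]


def frame_extract_cmd(video_path: str, pattern: str, fps: float = 1.0) -> List[str]:
    """Build an FFmpeg command that writes sampled image frames, e.g. 'frames/%06d.png'."""
    # The fps filter resamples the video to a fixed frame rate before images are written.
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps:g}", pattern]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping the frame timestamps (frame index divided by `fps`) aligned with the audio timeline is what later lets face-based and sound-based boundaries be compared.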

Optionally, the face recognition technology in step 2 includes: encoding faces using a deep learning algorithm, and comparing the similarity of the faces in each image frame of the video data.

Optionally, the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of a candidate splitting segment, for sounds whose features are similar to the extracted sound features.

Optionally, encoding faces using a deep learning algorithm includes:

training a deep neural network model capable of extracting features from input faces;

inputting the image data of the video data into the deep neural network model to extract high-dimensional face features of the image data;

encoding, i.e., mapping the high-dimensional face features to low-dimensional vectors;

determining, from the low-dimensional vectors, whether faces in the video data are similar or different.
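The encoding and comparison steps above can be sketched as follows, assuming the trained network's high-dimensional face feature is already available as a list of floats. The `projection` matrix is a hypothetical stand-in for the network's final embedding layer, and the 0.8 similarity threshold is an assumed value; a real system would learn the projection and tune the threshold.

```python
import math
from typing import List, Sequence

Vector = List[float]


def encode(features: Sequence[float], projection: Sequence[Sequence[float]]) -> Vector:
    """Map a high-dimensional face feature to a low-dimensional, unit-length code.

    Each row of the hypothetical `projection` matrix produces one output dimension.
    """
    code = [sum(w * x for w, x in zip(row, features)) for row in projection]
    norm = math.sqrt(sum(c * c for c in code)) or 1.0  # avoid dividing by zero
    return [c / norm for c in code]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two codes; equals cosine similarity because codes are unit length."""
    return sum(x * y for x, y in zip(a, b))


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    """Treat two faces as similar when their codes are close enough (assumed threshold)."""
    return cosine(a, b) >= threshold
```

Normalizing every code to unit length means similarity depends only on direction, so a face photographed under brighter or darker conditions (a larger or smaller raw feature magnitude) can still match.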

Compared with the prior art, the present invention has the following advantages: deep learning algorithms are used to recognize two features, face and sound, which improves the accuracy of splitting, and face and sound recognition can be performed on multiple video clips simultaneously, so processing is very fast. In addition, deep learning algorithms can split videos intelligently, reducing manual labor.

Description of the drawings

Fig. 1 is a flowchart of the video splitting method based on deep learning according to the present invention.

Specific embodiment

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention is not limited to these embodiments. The present invention covers any substitutions, modifications, equivalent methods and schemes made within the spirit and scope of the present invention.

To give the public a thorough understanding of the present invention, specific details are described in the following preferred embodiments; however, a person skilled in the art can fully understand the present invention without these details.

The present invention is described more specifically below by way of example with reference to the accompanying drawings. It should be noted that the drawings are in a simplified form and not to scale, and serve only to conveniently and clearly aid in explaining the embodiments of the present invention.

As shown in Fig. 1, the present invention provides a video splitting method based on deep learning, comprising the following steps:

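Step 1: initializing the video data;

Step 2: performing face detection using face recognition technology, and taking each time segment in which a similar face appears continuously as a candidate splitting segment;

Step 3: extracting sound features within the candidate splitting segments;

Step 4: refining the split time points of the candidate splitting segments using sound recognition technology and the sound features, to obtain the final split time points.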

The video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.

The face recognition technology in step 2 includes: encoding faces using a deep learning algorithm and comparing the similarity of the faces in each image frame of the video data. Each continuous time segment in which a similar face appears is treated as a splitting segment, so multiple splitting segments can be obtained.
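A minimal sketch of how per-frame face codes might be grouped into candidate segments, assuming frames are sampled in time order and each carries either a face code or `None` (no face detected). The cosine threshold and the `min_duration` filter are assumptions, not values from the source, and `candidate_segments` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, Optional[Sequence[float]]]  # (timestamp in seconds, face code or None)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that tolerates codes which are not unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def candidate_segments(frames: Sequence[Frame], threshold: float = 0.8,
                       min_duration: float = 0.0) -> List[Tuple[float, float]]:
    """Group consecutive frames that show a similar face into (start, end) time spans."""
    segments: List[Tuple[float, float]] = []
    start = prev_t = None
    anchor: Optional[Sequence[float]] = None  # first face code of the open segment
    for t, code in frames:
        if code is not None and anchor is not None and cosine(anchor, code) >= threshold:
            prev_t = t  # the same face continues: extend the open segment
            continue
        # Face changed or disappeared: close the open segment if it is long enough.
        if anchor is not None and prev_t - start >= min_duration:
            segments.append((start, prev_t))
        if code is not None:
            start, prev_t, anchor = t, t, code  # a new face opens a new segment
        else:
            start = prev_t = anchor = None
    if anchor is not None and prev_t - start >= min_duration:
        segments.append((start, prev_t))
    return segments
```

Comparing each frame with the first code of the open segment, rather than with the previous frame, keeps a slowly drifting sequence of faces from chaining into one long segment.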

The sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of a candidate splitting segment, for sounds whose features are similar to the extracted sound features.
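One way to read this refinement, as a sketch: search a window around the coarse boundary for audio frames whose features still match the segment's reference sound feature, and move the boundary to the outermost match. The window size, the threshold and plain cosine matching are assumptions; the source applies a deep learning model at this step without specifying it, and `refine_boundary` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two sound feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def refine_boundary(times: Sequence[float], audio_feats: Sequence[Sequence[float]],
                    reference: Sequence[float], coarse_point: float, side: str = "end",
                    window: float = 2.0, threshold: float = 0.9) -> float:
    """Snap a face-based boundary to the outermost nearby audio frame that matches.

    times[i] is the timestamp of audio_feats[i]; reference is the sound feature
    extracted from the candidate segment. side="end" moves an end point to the
    latest match, side="start" moves a start point to the earliest match.
    """
    if side not in ("start", "end"):
        raise ValueError("side must be 'start' or 'end'")
    matches = [t for t, f in zip(times, audio_feats)
               if abs(t - coarse_point) <= window and cosine(f, reference) >= threshold]
    if not matches:
        return coarse_point  # no supporting audio evidence: keep the coarse boundary
    return max(matches) if side == "end" else min(matches)
```

This captures the typical case the method targets: the speaker's face leaves the screen slightly before or after their voice stops, so the audio decides the exact cut.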

Encoding faces using a deep learning algorithm includes: mapping multiple pieces of face image information to low-dimensional vectors, so that the model can identify whether two faces are similar or identical.

In practice, the video can first be analyzed and processed with a distributed algorithm: the video is divided into several segments at a granularity of a specified number of seconds (for example, 10 seconds). These segments are then distributed to available servers, which perform face and sound detection on them simultaneously. Processing is very fast, enabling short videos to be produced within seconds.
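The chunking and parallel dispatch described above can be sketched with the standard library. Here threads stand in for separate servers, `analyze` is a hypothetical callback that would run face and sound detection on one chunk, and round-robin assignment is one assumed scheduling policy rather than the one the source uses.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Chunk = Tuple[float, float]  # (start second, end second)
R = TypeVar("R")


def split_into_chunks(duration: float, granularity: float = 10.0) -> List[Chunk]:
    """Cut [0, duration) into consecutive chunks of `granularity` seconds (last may be shorter)."""
    n = math.ceil(duration / granularity)
    return [(i * granularity, min((i + 1) * granularity, duration)) for i in range(n)]


def assign_round_robin(chunks: Sequence[Chunk], servers: Sequence[str]) -> Dict[str, List[Chunk]]:
    """Spread chunks evenly over the available servers (hypothetical names)."""
    plan: Dict[str, List[Chunk]] = {s: [] for s in servers}
    for i, chunk in enumerate(chunks):
        plan[servers[i % len(servers)]].append(chunk)
    return plan


def process_all(chunks: Sequence[Chunk], analyze: Callable[[Chunk], R], workers: int = 4) -> List[R]:
    """Run `analyze` on all chunks concurrently; results come back in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, chunks))
```

A face segment that crosses a chunk boundary comes back as two pieces, so results from adjacent chunks would need to be merged afterwards.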

The embodiments described above do not limit the protection scope of this technical solution. Any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the above embodiments shall fall within the protection scope of this technical solution.

Claims

1. A video splitting method based on deep learning, characterized by comprising the following steps:

Step 1: initializing the video data;

Step 2: performing face detection using face recognition technology, and taking each time segment in which a similar face appears continuously as a candidate splitting segment;

Step 3: extracting sound features within the candidate splitting segments;

Step 4: refining the split time points of the candidate splitting segments using sound recognition technology and the sound features, to obtain the final split time points.

2. The video splitting method based on deep learning according to claim 1, characterized in that:

the video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.

3. The video splitting method based on deep learning according to claim 1, characterized in that:

the face recognition technology in step 2 includes: encoding faces using a deep learning algorithm, and comparing the similarity of the faces in each image frame of the video data.

4. The video splitting method based on deep learning according to claim 1, characterized in that:

the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the split time point of a candidate splitting segment, for sounds whose features are similar to the extracted sound features.

5. The video splitting method based on deep learning according to claim 3, characterized in that:

the image data of the video data is input into the deep neural network model to extract high-dimensional face features of the image data;