CN109005451B - Video strip splitting method based on deep learning - Google Patents


Info

Publication number
CN109005451B
Authority
CN
China
Prior art keywords
deep learning
video
segments
face
strip splitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810701351.3A
Other languages
Chinese (zh)
Other versions
CN109005451A (en)
Inventor
倪攀
姜子琛
彭梅
刘睿
刘宜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Xingxi Technology Co ltd
Original Assignee
Hangzhou Xingxi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Xingxi Technology Co ltd filed Critical Hangzhou Xingxi Technology Co ltd
Priority to CN201810701351.3A priority Critical patent/CN109005451B/en
Publication of CN109005451A publication Critical patent/CN109005451A/en
Application granted granted Critical
Publication of CN109005451B publication Critical patent/CN109005451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video strip splitting method based on deep learning, comprising the following steps. Step 1: initializing the video data. Step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments. Step 3: extracting sound features from the candidate strip splitting segments. Step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points. By using a deep learning algorithm to recognize both face and voice features, the invention improves strip-splitting accuracy, can recognize the faces and voices of multiple video segments simultaneously, and is extremely fast. In addition, the deep learning algorithm splits the video automatically, reducing manual effort.

Description

Video strip splitting method based on deep learning
Technical Field
The invention relates to the technical field of media asset management, in particular to a video striping method based on deep learning.
Background
With the digitalization, networking, and informatization of the entire television production process, a large amount of multimedia data has accumulated. Strip splitting technology arose because massive multimedia resources could not otherwise be deeply exploited, and because China's regulatory requirements for television programs continue to rise. The rapid development of the Internet has caused the volume of video material to grow explosively: live streams, short videos, online television programs, mobile multimedia, and the like are not broadcast as complete programs but must be split or condensed into short clips, users' demand for fragmented Internet content keeps growing, and split clips are widely used in new media.
The traditional strip splitting method is manual: an editor previews the video frame by frame and cuts it by hand, which requires heavy manpower and is inefficient. A more recent approach is cloud-based strip splitting, which improves efficiency over the traditional method and has clear advantages in timeliness of content output and software cost, but it still requires substantial manual effort; labor has not been freed from a large amount of low-value repetitive work.
Disclosure of Invention
In view of this, the invention provides a video strip splitting method based on deep learning that reduces the manual effort required for strip splitting, addressing the heavy labor demands of the prior art.
The invention provides a video strip splitting method based on deep learning, which comprises the following steps:
step 1: initializing the video data;
step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments;
step 3: extracting sound features from the candidate strip splitting segments;
step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points.
Optionally, initializing the video data in step 1 includes obtaining the audio waveform data and the image data from the video data.
Optionally, the face recognition technique in step 2 includes: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data.
Optionally, the voice recognition technique in step 4 includes: using a deep learning algorithm to search, within a certain range before and after the strip splitting time point of a candidate segment, for sounds whose features are similar to the extracted sound features.
Optionally, the process of encoding faces with the deep learning algorithm includes:
training a deep neural network model so that it can extract features from an input face;
feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features;
encoding, i.e., mapping the high-dimensional face features into low-dimensional vectors; and judging from the low-dimensional vectors whether the faces in the video data are similar or different.
Compared with the prior art, the invention has the following advantages: by using a deep learning algorithm to recognize both face and voice features, it improves strip-splitting accuracy, can recognize the faces and voices of multiple video segments simultaneously, and is extremely fast. In addition, the deep learning algorithm splits the video automatically, reducing manual effort.
Drawings
FIG. 1 is a flowchart of a video striping method based on deep learning according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the invention is not limited to these embodiments. The invention is intended to cover any alternatives, modifications, and equivalents that fall within its spirit and scope.
In the following description of the preferred embodiments, specific details are set forth to provide a thorough understanding of the invention; it will be apparent to those skilled in the art that the invention may be practiced without these specific details.
The invention is described in more detail in the following paragraphs by way of example with reference to the accompanying drawings. Note that the drawings are simplified and not to precise scale; they serve only to illustrate the embodiments conveniently and clearly.
The invention provides a video strip splitting method based on deep learning which, as shown in FIG. 1, comprises the following steps:
step 1: initializing the video data;
step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments;
step 3: extracting sound features from the candidate strip splitting segments;
step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points.
Initializing the video data in step 1 includes obtaining the audio waveform data and the image data from the video data.
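As a rough illustration of this initialization step, the snippet below builds ffmpeg commands (ffmpeg itself is assumed to be installed; the file names are hypothetical, and the mono 16 kHz audio format and one-frame-per-second sampling rate are illustrative choices, not taken from the patent):

```python
def extraction_commands(video_path, wav_path, frames_pattern, fps=1):
    """Build two ffmpeg invocations: one extracting the audio waveform,
    one dumping image frames sampled at `fps` frames per second."""
    audio_cmd = [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",           # drop the video stream, keep audio only
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        wav_path,
    ]
    frames_cmd = [
        "ffmpeg", "-y", "-i", video_path,
        "-vf", f"fps={fps}",  # sample frames at the given rate
        frames_pattern,
    ]
    return audio_cmd, frames_cmd

# The commands could then be run with subprocess.run(...).
audio_cmd, frames_cmd = extraction_commands("news.mp4", "news.wav", "frame_%05d.jpg")
```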
The face recognition technique in step 2 comprises: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data; a continuous time segment with similar faces is treated as one strip splitting segment, so several segments can be obtained.
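A minimal sketch of this grouping step, assuming per-frame face embeddings are already available from the encoder (the cosine metric, the 0.8 threshold, and the minimum run length are illustrative choices, not specified in the patent):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def candidate_segments(embeddings, fps, threshold=0.8):
    """Group runs of consecutive frames whose face embeddings stay
    similar into candidate segments, returned as (start_s, end_s)."""
    segments, start = [], 0
    for i in range(1, len(embeddings)):
        if cosine_sim(embeddings[i - 1], embeddings[i]) < threshold:
            if i - start > 1:  # keep runs longer than one frame
                segments.append((start / fps, i / fps))
            start = i
    if len(embeddings) - start > 1:
        segments.append((start / fps, len(embeddings) / fps))
    return segments
```

For example, five frames of one face followed by five frames of another yield two candidate segments.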
The voice recognition technique in step 4 comprises: using a deep learning algorithm to search, within a certain range before and after the strip splitting time point of a candidate segment, for sounds whose features are similar to the extracted sound features.
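The refinement in step 4 can be sketched as a local search: around the rough face-based split point, find where adjacent audio feature vectors differ most, on the assumption that a speaker or program change produces an abrupt feature change. The window size and squared-distance measure below are illustrative, not taken from the patent:

```python
def refine_split_point(audio_feats, rough_idx, window=5):
    """Within ±window frames of the rough split index, return the index
    where consecutive audio feature vectors differ the most."""
    lo = max(1, rough_idx - window)
    hi = min(len(audio_feats) - 1, rough_idx + window)
    def change(i):
        # squared Euclidean distance between adjacent feature vectors
        return sum((a - b) ** 2 for a, b in zip(audio_feats[i - 1], audio_feats[i]))
    return max(range(lo, hi + 1), key=change)
```

Here a split roughly placed at frame 8 of a stream whose audio changes at frame 10 would be moved to frame 10.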
The process of encoding faces with the deep learning algorithm comprises: training a deep neural network model so that it can extract features from an input face;
feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features;
encoding, i.e., mapping the high-dimensional face features into low-dimensional vectors;
and judging from the low-dimensional vectors whether the faces in the video data are similar or different.
By mapping the image information of multiple faces into low-dimensional vectors, the model can determine whether two faces are similar or identical.
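As a toy stand-in for the learned encoder, the high-to-low-dimensional mapping can be sketched with a fixed random linear projection (a trained network would learn this mapping instead; the dimensions and distance threshold are arbitrary illustrative values):

```python
import random

def random_projection(dim_in, dim_out, seed=0):
    """A fixed random linear map standing in for the trained encoder."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]

def encode_face(features, projection):
    """Map a high-dimensional face feature vector to a low-dimensional code."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]

def similar_faces(code_a, code_b, threshold=1.0):
    """Codes within `threshold` Euclidean distance count as the same face."""
    dist = sum((a - b) ** 2 for a, b in zip(code_a, code_b)) ** 0.5
    return dist < threshold

proj = random_projection(dim_in=128, dim_out=8)
code = encode_face([0.01] * 128, proj)
```

Identical face features always map to identical codes, so the comparison reduces to a cheap distance check in the low-dimensional space.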
In practice, the video may be analyzed and processed with a distributed algorithm: the video is divided into multiple segments at a fixed granularity of a specified number of seconds (e.g., 10 seconds), and the segments are dispatched to available servers that detect faces and voices simultaneously. This makes the method extremely fast and enables second-level short-video production.
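The dispatch described above starts from a simple fixed-granularity chunking; each (start, end) chunk would then be sent to a worker for face and voice detection. The 10-second default comes from the example in the text; the dispatch mechanism itself is not specified in the patent:

```python
def chunk_video(duration_s, granularity_s=10):
    """Split a video of `duration_s` seconds into fixed-size chunks for
    parallel processing; the last chunk may be shorter."""
    chunks, start = [], 0
    while start < duration_s:
        chunks.append((start, min(start + granularity_s, duration_s)))
        start += granularity_s
    return chunks
```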
The above-described embodiments do not limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the above embodiments shall be included in the protection scope of the technical solution.

Claims (4)

1. A video strip splitting method based on deep learning is characterized by comprising the following steps:
step 1: initializing the video data;
step 2: performing face detection with a face recognition technique to obtain time segments with a continuous similar face as candidate strip splitting segments;
step 3: extracting sound features from the candidate strip splitting segments;
step 4: refining the strip splitting time points of the candidate segments using a voice recognition technique and the extracted sound features, to obtain the final strip splitting time points;
the voice recognition technique in step 4 comprising: using a deep learning algorithm to search, within a certain range before and after the strip splitting time point of a candidate segment, for sounds whose features are similar to the extracted sound features.
2. The video strip splitting method based on deep learning of claim 1, wherein initializing the video data in step 1 includes obtaining the audio waveform data and the image data from the video data.
3. The video strip splitting method based on deep learning of claim 1, wherein the face recognition technique in step 2 comprises: encoding faces with a deep learning algorithm and comparing the face similarity of each image frame in the video data.
4. The video strip splitting method based on deep learning of claim 3, wherein the process of encoding faces with the deep learning algorithm comprises:
training a deep neural network model so that it can extract features from an input face;
feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features;
encoding, i.e., mapping the high-dimensional face features into low-dimensional vectors;
and judging from the low-dimensional vectors whether the faces in the video data are similar or different.
CN201810701351.3A 2018-06-29 2018-06-29 Video strip splitting method based on deep learning Active CN109005451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810701351.3A CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810701351.3A CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Publications (2)

Publication Number Publication Date
CN109005451A CN109005451A (en) 2018-12-14
CN109005451B true CN109005451B (en) 2021-07-30

Family

ID=64601854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810701351.3A Active CN109005451B (en) 2018-06-29 2018-06-29 Video strip splitting method based on deep learning

Country Status (1)

Country Link
CN (1) CN109005451B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267061B (en) * 2019-04-30 2021-07-27 新华智云科技有限公司 News splitting method and system
CN111222499B (en) * 2020-04-22 2020-08-14 成都索贝数码科技股份有限公司 News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111586494B (en) * 2020-04-30 2022-03-11 腾讯科技(深圳)有限公司 Intelligent strip splitting method based on audio and video separation
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN112565885B (en) * 2020-11-30 2023-01-06 清华珠三角研究院 Video segmentation method, system, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization and system
WO2013097101A1 (en) * 2011-12-28 2013-07-04 华为技术有限公司 Method and device for analysing video file
CN103546667A (en) * 2013-10-24 2014-01-29 中国科学院自动化研究所 Automatic news splitting method for volume broadcast television supervision
CN105931633A (en) * 2016-05-30 2016-09-07 深圳市鼎盛智能科技有限公司 Speech recognition method and system
CN106228142A (en) * 2016-07-29 2016-12-14 西安电子科技大学 Face verification method based on convolutional neural networks and Bayesian decision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7555149B2 (en) * 2005-10-25 2009-06-30 Mitsubishi Electric Research Laboratories, Inc. Method and system for segmenting videos using face detection


Also Published As

Publication number Publication date
CN109005451A (en) 2018-12-14

Similar Documents

Publication Publication Date Title
CN109005451B (en) Video strip splitting method based on deep learning
CN106921891B (en) Method and device for displaying video characteristic information
WO2019228267A1 (en) Short video synthesis method and apparatus, and device and storage medium
CN106878632B (en) Video data processing method and device
CN108920648B (en) Cross-modal matching method based on music-image semantic relation
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
CN111488489A (en) Video file classification method, device, medium and electronic equipment
WO2023197979A1 (en) Data processing method and apparatus, and computer device and storage medium
CN112511854A (en) Live video highlight generation method, device, medium and equipment
CN113573161B (en) Multimedia data processing method, device, equipment and storage medium
CN113327603B (en) Speech recognition method, apparatus, electronic device, and computer-readable storage medium
CN111432140B (en) Method for splitting television news into strips by using artificial neural network
CN102073631A (en) Video news unit dividing method by using association rule technology
CN112002328A (en) Subtitle generating method and device, computer storage medium and electronic equipment
CN112804558B (en) Video splitting method, device and equipment
CN111488487A (en) Advertisement detection method and detection system for all-media data
CN113704506A (en) Media content duplication eliminating method and related device
CN110781346A (en) News production method, system, device and storage medium based on virtual image
CN113194332B (en) Multi-policy-based new advertisement discovery method, electronic device and readable storage medium
CN115734024A (en) Audio data processing method, device, equipment and storage medium
CN116737936B (en) AI virtual personage language library classification management system based on artificial intelligence
CN114051154A (en) News video strip splitting method and system
CN111339865A (en) Method for synthesizing video MV (music video) by music based on self-supervision learning
CN116614672A (en) Method for automatically mixing and cutting video based on text-video retrieval
CN111681680B (en) Method, system, device and readable storage medium for acquiring audio frequency by video recognition object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Video Stripping Method Based on Deep Learning

Granted publication date: 20210730

Pledgee: Guotou Taikang Trust Co.,Ltd.

Pledgor: HANGZHOU XINGXI TECHNOLOGY Co.,Ltd.

Registration number: Y2024980020954