CN111225274B - Photo music video arrangement system based on deep learning - Google Patents
- Publication number: CN111225274B (application CN201911204406.0A)
- Authority: CN (China)
- Prior art keywords: photo, music, video, deep, feature
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/47205—End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
- G06F18/214—Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045—Neural network architectures: combinations of networks
- G06V20/46—Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44016—Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- H04N21/4402—Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440245—Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment
- H04N21/8456—Structuring of content by decomposing it in the time domain, e.g. in time segments
Abstract
The invention discloses a photo music video arrangement system based on deep learning, comprising the following steps: S1, input a photo group and a video group, each of arbitrary size, together with a piece of music; S2, segment the music into paragraphs of differing lengths based on information such as rhythm; S3, extract key frames from each video in the video group, either manually or automatically; S4, extract deep features of the photo group P using a convolutional neural network or another deep/non-deep machine-learning algorithm, and likewise compute features for the key frames of the videos in the video group; S5, select any photo as the starting photo, and compute the arrangement of photos and videos across the music paragraphs using a recurrent neural network or another deep/non-deep machine-learning algorithm. The invention automatically segments the music at beat points, analyses the key content of the photos and videos, and fuses them according to the music segmentation, so that music photo videos can be produced quickly and intelligently.
Description
Technical Field
The invention relates to the technical field of information processing, in particular to a photo and music video arrangement system based on deep learning.
Background
Even as machine-learning applications mature, video production remains a relatively complex process that demands a degree of editing skill and the ability to collect and organise related resources. In particular, merging photos into a video, with recomposition of photos, animation, special effects and the like, greatly increases production complexity.
Disclosure of Invention
The invention aims to provide a photo music video arrangement system based on deep learning, which can rapidly and intelligently make music photo videos.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention discloses a photo music video arrangement system based on deep learning, S1, data preparation, comprising:
picture group P ═ P0,p1…pnVideo set S ═ S0,s1…smMusic;
s2, segmenting the music into music paragraphs Q ═ Q0,q1…qk};
S3, extracting P' ═ cnn _ deep _ feature (P) from the group of pictures using the deep neural network;
s4, extracting S' ═ cnn _ deep _ feature (S) from the video group using the deep neural network;
S6, let the material allocation set for the music paragraphs be A, and let the paragraph position of the starting material in A be q_s;
S7, starting from paragraph position q_s, compute all remaining positions in A, where a remaining position is denoted q, with q ∈ {0 … k} and q not yet assigned, and compute the best photo or the best video to place at each remaining position;
S8, once all positions in A have been assigned material, A is the final result.
Preferably, in step S2, the music is segmented on the basis of its tempo, using manual or automated tools.
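As a rough sketch of what such an automated segmentation tool might do, the snippet below groups beat times into paragraphs of a fixed beat count. The beat grid, the `segment_music` helper and the four-beats-per-paragraph choice are all hypothetical; a real system would take beat times from a beat tracker or manual annotation and could vary paragraph lengths.

```python
# Sketch: split music into paragraphs Q = {q0 ... qk} from beat times.
# The beat times here are hypothetical; a real pipeline would obtain them
# from an automated beat tracker or from manual annotation.

def segment_music(beat_times, beats_per_paragraph=4):
    """Group consecutive beats into paragraphs.

    Returns a list of (start, end) time pairs, one per paragraph.
    """
    paragraphs = []
    for i in range(0, len(beat_times) - 1, beats_per_paragraph):
        start = beat_times[i]
        end = beat_times[min(i + beats_per_paragraph, len(beat_times) - 1)]
        paragraphs.append((start, end))
    return paragraphs

# Hypothetical beat grid: one beat every 0.5 s over an 8-second clip.
beats = [i * 0.5 for i in range(17)]
Q = segment_music(beats, beats_per_paragraph=4)
```

Each resulting pair marks one music paragraph's time span, which the later steps fill with a photo or a video.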
Preferably, the best photo is confirmed using the following function,
where q is a specific paragraph and p is any photo in the photo set that does not yet appear in the allocation set.
Preferably, the best video is confirmed using the following function,
where q is a specific paragraph, p is any photo in the photo set that does not yet appear in the allocation set, and s is any video in the video set that does not yet appear in the allocation set.
Preferably, the best photo or the best video is determined as the best material, with f(p, s) = max(p, s).
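A minimal sketch of the f(p, s) = max(p, s) rule, assuming each candidate's score is some similarity between paragraph features and material features. The cosine-similarity scorer and the feature vectors below are hypothetical stand-ins; the patent's own photo and video scoring functions are not reproduced here.

```python
# Score the best candidate photo and the best candidate video for one
# paragraph, then keep whichever scores higher: f(p, s) = max(p, s).
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_material(q_feat, photo_feats, video_feats):
    """Return ('photo'|'video', index) of the best unassigned material."""
    p_idx, p_score = max(
        ((i, cosine(q_feat, f)) for i, f in photo_feats.items()),
        key=lambda t: t[1])
    s_idx, s_score = max(
        ((i, cosine(q_feat, f)) for i, f in video_feats.items()),
        key=lambda t: t[1])
    # f(p, s) = max(p, s): keep the higher-scoring candidate (ties -> photo).
    return ('photo', p_idx) if p_score >= s_score else ('video', s_idx)

# Toy features: paragraph q against two photos and one video.
q = [1.0, 0.0]
photos = {0: [0.9, 0.1], 1: [0.0, 1.0]}
videos = {0: [0.5, 0.5]}
choice = best_material(q, photos, videos)
```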
The invention has the beneficial effects that:
and performing point cutting on the music by adopting a manual or automatic tool, analyzing key contents of the photos and the videos by using a deep neural network, and fusing the photos and the videos based on the music point cutting by using a circulating neural network so as to achieve the aim of rapidly and intelligently manufacturing the music photo videos.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention comprises the steps of:
S1, preparing data, comprising:
a photo group P = {p0, p1 … pn}, a video group S = {s0, s1 … sm}, and music;
S2, segment the music into music paragraphs Q = {q0, q1 … qk}, dividing the music on the basis of its tempo using manual or automated tools;
S3, extract P′ = cnn_deep_feature(P) from the photo group using the deep neural network;
where P′ denotes the deep features of the photos after extraction by cnn_deep_feature(); the specific content of the deep features depends on the network structure used in cnn_deep_feature;
cnn_deep_feature denotes a photo deep-feature function computed by a deep convolutional neural network (the CNN algorithm) using the back-propagation principle, in which operations such as fully connected layers are used together or separately;
S4, extract S′ = cnn_deep_feature(S) from the video group using the deep neural network;
where S′ denotes picture deep features containing frame information after extraction by cnn_deep_feature(); S′ is analogous to P′, the difference being that the input changes from a single photo in P to multi-frame video information in S; the extracted key frames are processed frame by frame, through a flow similar to that for P, into an S′ analogous to P′, except that the S′ data has one more dimension than P′, namely the set of extracted frames;
S6, let the material allocation set for the music paragraphs be A, and let the paragraph position of the starting material in A be q_s;
S7, starting from paragraph position q_s, compute all remaining positions in A, where a remaining position is denoted q, with q ∈ {0 … k} and q not yet assigned, and compute the best photo or the best video to place at each remaining position:
S7.1, confirm the best photo using the following function, where q is a specific paragraph and p is any photo in the photo set that does not yet appear in A;
S7.2, confirm the best video using the following function, where q is a specific paragraph, p is any photo in the photo set that does not yet appear in A, and s is any video in the video set that does not yet appear in A;
S7.3, determine the best photo or the best video as the best material, where f(p, s) = max(p, s);
S8, once all positions in A have been assigned material, A is the final result.
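Steps S6 to S8 can be sketched as a greedy fill loop. Everything below is a simplified illustration: the `arrange` helper, the toy score function and the material names are hypothetical, and the patent itself computes the best pick with a recurrent model rather than a hand-written score.

```python
# Sketch of steps S6-S8: starting from one assigned paragraph, fill every
# remaining paragraph with the best unused photo or video.

def arrange(k, start_pos, start_item, materials, score):
    """Fill paragraph positions 0..k-1 greedily.

    materials: list of material ids; score(pos, item) -> float.
    Returns the allocation list A with one material per paragraph.
    """
    A = [None] * k
    A[start_pos] = start_item          # S6: place the starting material
    unused = [m for m in materials if m != start_item]
    for pos in range(k):               # S7: visit the remaining positions
        if A[pos] is not None:
            continue
        best = max(unused, key=lambda m: score(pos, m))
        A[pos] = best
        unused.remove(best)
    return A                           # S8: all positions assigned

# Toy run: 3 paragraphs; a score that favours matching indices.
mats = ['p0', 'p1', 's0']
A = arrange(3, 1, 'p1', mats, lambda pos, m: -abs(pos - mats.index(m)))
```

The loop visits positions in index order for brevity; the patent starts from the paragraph position of the initial material.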
In actual use:
suppose we have 3 photos, P1, P2, P3;
suppose we have 7 videos, S1, S2, S3, S4, S5, S6, S7;
and suppose we have music segmented into paragraphs Q1, Q2, Q3, Q4, Q5, so that 5 paragraphs each need to be filled with a photo or a video.
After steps S1-S4, the data we have prepared are all features and can be fed directly into a machine-learning model; here cnn_deep_feature can be any existing convolutional or non-convolutional image neural network, including but not limited to the various open-source and closed-source models, and the features can be the output of any layer after the convolutional layers (the specific layer must be chosen manually).
The RNN model is a sequence model and can be any existing sequence model, including but not limited to an RNN, LSTM, GRU and the like; by comparing the context within the sequence, the best match for an unknown position can be calculated.
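A minimal illustration of that idea, with scalar features and fixed weights standing in for a trained RNN/LSTM/GRU: the cell folds the already-assigned context into a hidden state, and each candidate is scored by how smoothly it continues that state. All names and numbers here are hypothetical.

```python
# Toy sequence-model scoring: context in, candidate score out.
import math

def rnn_step(h, x, w_x=0.8, w_h=0.5):
    """One recurrent step over scalar features (fixed toy weights)."""
    return math.tanh(w_x * x + w_h * h)

def score_candidate(context_feats, cand_feat):
    """Run the context through the cell, then measure candidate fit."""
    h = 0.0
    for x in context_feats:
        h = rnn_step(h, x)
    # Higher score when the candidate continues the context smoothly.
    return -abs(rnn_step(h, cand_feat) - h)

# Context rises steadily; the middle candidate continues it best.
ctx = [0.2, 0.4, 0.6]
best = max([0.1, 0.5, 0.9], key=lambda c: score_candidate(ctx, c))
```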
after randomly selecting a picture, the paragraph status may be as follows, via step S5:
Q1 | Q2 | Q3 | Q4 | Q5 |
is not distributed | Is not distributed | Q2 | Is not distributed | Is not distributed |
Wherein Q2 is a randomly assigned initial photograph
In the calculation flow of S7, we then compute the best paragraph material one by one for Q1, Q2, Q4 and Q5 until all the Q's are filled.
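The example above can be sketched end to end as follows. The pool order stands in for the learned best-pick computation, and the starting slot Q3 and all material names are taken from the example itself.

```python
# Five paragraphs, one randomly chosen starting photo at Q3, then fill
# the remaining slots one by one. A simple pop() replaces the model's
# best-material choice, purely for illustration.
import random

paragraphs = ['Q1', 'Q2', 'Q3', 'Q4', 'Q5']
photos = ['P1', 'P2', 'P3']
videos = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7']

random.seed(0)                          # reproducible toy run
slots = {q: None for q in paragraphs}
start_photo = random.choice(photos)
slots['Q3'] = start_photo               # randomly assigned starting photo

pool = [m for m in photos + videos if m != start_photo]
for q in paragraphs:                    # fill Q1, Q2, Q4, Q5 in turn
    if slots[q] is None:
        slots[q] = pool.pop(0)          # stand-in for the learned best pick

filled = all(v is not None for v in slots.values())
```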
The present invention is capable of other embodiments, and various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention.
Claims (5)
1. A photo music video arrangement system based on deep learning is characterized by comprising the following steps:
S1, preparing data, comprising:
a photo group P = {p0, p1 … pn}, a video group S = {s0, s1 … sm}, and music;
S2, segment the music into music paragraphs Q = {q0, q1 … qk};
S3, extract P′ = cnn_deep_feature(P) from the photo group using the deep neural network;
where P′ denotes the deep features of the photos after extraction by cnn_deep_feature(); the specific content of the deep features depends on the network structure used in cnn_deep_feature;
cnn_deep_feature denotes a photo deep-feature function computed by a deep convolutional neural network (the CNN algorithm) using the back-propagation principle, in which operations such as fully connected layers are used together or separately;
S4, extract S′ = cnn_deep_feature(S) from the video group using the deep neural network;
where S′ denotes picture deep features containing frame information after extraction by cnn_deep_feature();
S6, let the material allocation set for the music paragraphs be A, and let the paragraph position of the starting material in A be q_s;
S7, starting from paragraph position q_s, compute all remaining positions in A, where a remaining position is denoted q, with q ∈ {0 … k} and q not yet assigned, and compute the best photo or the best video to place at each remaining position;
2. The deep-learning-based photo music video arrangement system of claim 1, wherein in step S2 the segmentation of the music is based on the music tempo, using manual or automated tools.
4. The deep learning based photo music video layout system of claim 3, wherein: in step S7, the best video is identified using the following function,
5. The deep learning based photo music video layout system of claim 4, wherein: in step S7, the best photo or the best video is determined as the best material,
f(p, s) = max(p, s).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911204406.0A CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111225274A CN111225274A (en) | 2020-06-02 |
CN111225274B true CN111225274B (en) | 2021-12-07 |
Family
ID=70829052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911204406.0A Active CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111225274B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115361594A (en) * | 2022-07-15 | 2022-11-18 | 北京达佳互联信息技术有限公司 | Method and device for generating click video, electronic equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080193101A1 (en) * | 2005-03-31 | 2008-08-14 | Koninklijke Philips Electronics, N.V. | Synthesis of Composite News Stories |
CN108419035A (en) * | 2018-02-28 | 2018-08-17 | 北京小米移动软件有限公司 | The synthetic method and device of picture video |
CN109257545B (en) * | 2018-08-27 | 2021-04-13 | 咪咕文化科技有限公司 | Multi-source video editing method and device and storage medium |
Legal Events
Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant