CN111225274A - Photo music video arrangement system based on deep learning - Google Patents

Photo music video arrangement system based on deep learning

Info

Publication number
CN111225274A
CN111225274A (application CN201911204406.0A)
Authority
CN
China
Prior art keywords
photo
music
video
group
paragraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911204406.0A
Other languages
Chinese (zh)
Other versions
CN111225274B (en)
Inventor
龚俊衡
徐莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Pinguo Technology Co Ltd
Original Assignee
Chengdu Pinguo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Pinguo Technology Co Ltd filed Critical Chengdu Pinguo Technology Co Ltd
Priority to CN201911204406.0A priority Critical patent/CN111225274B/en
Publication of CN111225274A publication Critical patent/CN111225274A/en
Application granted granted Critical
Publication of CN111225274B publication Critical patent/CN111225274B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4398Processing of audio elementary streams involving reformatting operations of audio signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The invention discloses a photo music video arrangement system based on deep learning, comprising the following steps: S1, inputting a photo group and a video group, each of arbitrary size, together with a piece of music; S2, segmenting the music into paragraphs of different lengths based on information such as rhythm; S3, extracting key frames from each video in the video group, either manually or automatically; S4, extracting deep features of the photo group P with a convolutional neural network or another deep or non-deep machine learning algorithm, and likewise computing features for the key frames of the videos in the video group; S5, selecting any photo as the starting photo and computing the arrangement of photos and videos over the music paragraphs with a recurrent neural network or another deep or non-deep machine learning algorithm. The invention automatically segments the music at cut points, analyses the key content of the photos and videos, and fuses the photos and videos according to the music cut points, so that a music photo video can be produced quickly and intelligently.

Description

Photo music video arrangement system based on deep learning
Technical Field
The invention relates to the technical field of information processing, in particular to a photo and music video arrangement system based on deep learning.
Background
Even as machine learning applications mature, video production remains a relatively complex process that demands a certain editing skill and the ability to collect and organise related material. Merging photos into a video is especially difficult: recomposing photos, animation, special effects and the like greatly increases production complexity.
Disclosure of Invention
The invention aims to provide a photo music video arrangement system based on deep learning, which can rapidly and intelligently make music photo videos.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention discloses a photo music video arrangement system based on deep learning, S1, data preparation, comprising:
picture group P ═ P0,p1...pnVideo set S ═ S0,s1...smMusic;
s2, segmenting the music into music paragraphs Q ═ Q0,q1...qk};
S3, extracting P' ═ cnn _ deep _ feature (P) from the group of pictures using the deep neural network;
s4, extracting S' ═ cnn _ deep _ feature (S) from the video group using the deep neural network;
s5, randomly selecting a photo to be placed in any music paragraph as a starting photo
Figure BDA0002296630590000021
S6, setting the material distribution set of the music paragraph as
Figure BDA0002296630590000022
Is provided with
Figure BDA0002296630590000023
In
Figure BDA0002296630590000024
Is located at a paragraph position of
Figure BDA0002296630590000025
S7, position from paragraph
Figure BDA0002296630590000026
Start of calculation
Figure BDA0002296630590000027
All the remaining positions in (1), where the remaining positions are denoted as q, q ∈ {0.. k } and
Figure BDA0002296630590000028
calculating the best photo or the best video which should be put into the rest position;
s8, when
Figure BDA0002296630590000029
After all the positions in the map are determined to be the distributed material,
Figure BDA00022966305900000210
the final result is obtained.
Preferably, in step S2, the segmentation of music is based on music tempo, using manual or automated tools.
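The patent leaves the choice of segmentation tool open (manual or automated, based on tempo). A minimal automated sketch, assuming beat timestamps have already been obtained from a beat tracker, groups beats into paragraphs; the `beats_per_paragraph` knob is a hypothetical parameter, not from the patent:

```python
def segment_music(beat_times, beats_per_paragraph=8):
    """Group a list of beat timestamps (seconds) into music paragraphs.

    Returns (start, end) pairs; a trailing sliver shorter than one second
    is merged into the previous paragraph. beats_per_paragraph is an
    illustrative knob, not specified in the patent.
    """
    paragraphs = []
    for i in range(0, len(beat_times) - 1, beats_per_paragraph):
        end_idx = min(i + beats_per_paragraph, len(beat_times) - 1)
        paragraphs.append((beat_times[i], beat_times[end_idx]))
    if len(paragraphs) >= 2 and paragraphs[-1][1] - paragraphs[-1][0] < 1.0:
        last = paragraphs.pop()
        prev = paragraphs.pop()
        paragraphs.append((prev[0], last[1]))
    return paragraphs

beats = [0.5 * i for i in range(33)]  # a steady 120 BPM click track, 16 s
print(segment_music(beats))  # four 4-second paragraphs
```

In a real pipeline the beat list would come from an onset/beat-tracking tool; here a synthetic click track stands in for it.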
Preferably, the best photo is confirmed using a scoring function f(q, p) (the exact formula appears only as an image in the original), wherein q is a specific paragraph and p is any photo in the photo group that has not yet appeared in the allocation set Q̂.
Preferably, the best video is identified using a scoring function (again given only as a formula image in the original), wherein q is a specific paragraph, p is any photo in the photo group that has not yet appeared in the allocation set Q̂, and s is any video in the video group that has not yet appeared in Q̂.
Preferably, the best photo or the best video is determined as the best material, with f(p, s) = max(p, s), i.e. whichever of the two scores higher is chosen.
The invention has the beneficial effects that:
and performing point cutting on the music by adopting a manual or automatic tool, analyzing key contents of the photos and the videos by using a deep neural network, and fusing the photos and the videos based on the music point cutting by using a circulating neural network so as to achieve the aim of rapidly and intelligently manufacturing the music photo videos.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention comprises the steps of:
S1, preparing data, including:
a photo group P = {p0, p1, ..., pn}, a video group S = {s0, s1, ..., sm}, and a piece of music;
S2, segmenting the music into music paragraphs Q = {q0, q1, ..., qk}, the segmentation being based on the music rhythm and performed with manual or automated tools;
S3, extracting P' = cnn_deep_feature(P) from the photo group using a deep neural network;
S4, extracting S' = cnn_deep_feature(S) from the video group using a deep neural network;
S5, randomly selecting a photo and placing it in any music paragraph as the starting photo p̂;
S6, letting the material allocation set of the music paragraphs be Q̂ = {q̂0, q̂1, ..., q̂k}, and denoting the paragraph position of p̂ in Q̂ as q̂start (the original notation appears only as formula images);
S7, starting from the paragraph position q̂start, computing all remaining positions in Q̂, where a remaining position is denoted q, with q ∈ {0...k} and position q of Q̂ still unassigned, and calculating the best photo or best video to place at each remaining position, wherein:
S7.1, the best photo is confirmed using a scoring function f(q, p) (given as a formula image in the original), where q is a specific paragraph and p is any photo in the photo group that has not yet appeared in Q̂;
S7.2, the best video is identified using a scoring function (given as a formula image in the original), where q is a specific paragraph, p is any photo in the photo group that has not yet appeared in Q̂, and s is any video in the video group that has not yet appeared in Q̂;
S7.3, the best photo or the best video is determined as the best material, where f(p, s) = max(p, s);
S8, when every position in Q̂ has been assigned material, Q̂ is the final result.
In the course of actual use,
suppose we have 3 photographs, P1, P2, P3,
suppose we have 7 videos, S1, S2, S3, S4, S5, S6, S7,
assuming that we have music with segmented paragraphs Q1, Q2, Q3, Q4, Q5, we need to fill in 5 paragraphs with photos or videos respectively,
through steps S1-S4, the data set by us are all features, and can be directly input by a machine learning model, wherein cnn _ deep _ feature can be any existing convolution/non-convolution image neural network including but not limited to various existing models of open source and closed source, and the features can be any layer of output after the convolution layer (specific layer needs to be manually selected)
The RNN model is a sequence model, and can be any existing sequence model, including but not limited to RNN, LSTM, GRU, etc.; by comparing the context in the sequence, the best-matching material for an unknown position can be calculated.
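As a purely illustrative stand-in for the trained sequence model's score (the patent's actual scoring function is shown only as formula images), a candidate can be ranked by its average cosine similarity to the features already placed in the sequence:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score_candidate(context_feats, candidate_feat):
    """Hypothetical stand-in for the sequence model: average similarity of
    the candidate to the features already placed in the sequence."""
    if not context_feats:
        return 0.0
    return sum(cosine(c, candidate_feat) for c in context_feats) / len(context_feats)

ctx = [[1.0, 0.0], [0.8, 0.2]]          # features already assigned
print(round(score_candidate(ctx, [1.0, 0.0]), 3))  # ≈ 0.985
```

A trained RNN/LSTM/GRU would replace `score_candidate` wholesale; only the interface (context plus candidate in, scalar score out) is what the greedy fill loop relies on.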
after randomly selecting a picture, the paragraph status may be as follows, via step S5:
Q1 Q2 Q3 Q4 Q5
is not distributed Is not distributed Q2 Is not distributed Is not distributed
Wherein Q2 is a randomly assigned initial photograph
In the calculation flow of S7, we need to calculate the best paragraph material one by one for Q1, Q2, Q4 and Q5 until all paragraphs are filled.
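The S5-S8 flow above can be sketched as a greedy allocation loop. This is a sketch under stated assumptions: `arrange`, its `score` callback, and the toy feature vectors are hypothetical stand-ins for the patent's trained RNN scorer:

```python
import random

def arrange(photo_feats, video_feats, num_paragraphs, score, seed=0):
    """Greedy sketch of steps S5-S8: seed one random photo, then fill each
    remaining paragraph with the highest-scoring unused photo or video.

    photo_feats / video_feats: dicts mapping material id -> feature vector.
    score(slots, feat): any scoring function (stand-in for the RNN).
    Returns a list of material ids, one per paragraph.
    """
    rng = random.Random(seed)
    slots = [None] * num_paragraphs
    start_photo = rng.choice(sorted(photo_feats))        # step S5
    slots[rng.randrange(num_paragraphs)] = start_photo
    pool = {**photo_feats, **video_feats}
    used = {start_photo}
    for q in range(num_paragraphs):                      # step S7
        if slots[q] is not None:
            continue
        best = max((m for m in pool if m not in used),
                   key=lambda m: score(slots, pool[m]))  # best photo OR video
        slots[q] = best
        used.add(best)
    return slots                                         # step S8

photos = {f"P{i}": [float(i)] for i in range(1, 4)}      # 3 photos, toy features
videos = {f"S{i}": [float(10 + i)] for i in range(1, 8)} # 7 videos, toy features
out = arrange(photos, videos, 5, score=lambda slots, f: -f[0])
print(out)  # 5 paragraphs, all filled, no material reused
```

With the 3 photos, 7 videos and 5 paragraphs of the worked example, the loop fills the four unassigned paragraphs one by one, exactly as the text describes.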
The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.

Claims (5)

1. A photo music video arrangement system based on deep learning, characterized by comprising the following steps:
S1, preparing data, including:
a photo group P = {p0, p1, ..., pn}, a video group S = {s0, s1, ..., sm}, and a piece of music;
S2, segmenting the music into music paragraphs Q = {q0, q1, ..., qk};
S3, extracting P' = cnn_deep_feature(P) from the photo group using a deep neural network;
S4, extracting S' = cnn_deep_feature(S) from the video group using a deep neural network;
S5, randomly selecting a photo and placing it in any music paragraph as the starting photo p̂;
S6, letting the material allocation set of the music paragraphs be Q̂ = {q̂0, q̂1, ..., q̂k}, and denoting the paragraph position of p̂ in Q̂ as q̂start (the original notation appears only as formula images);
S7, starting from the paragraph position q̂start, computing all remaining positions in Q̂, where a remaining position is denoted q, with q ∈ {0...k} and position q of Q̂ still unassigned, and calculating the best photo or best video to place at each remaining position;
S8, when every position in Q̂ has been assigned material, Q̂ is the final result.
2. The deep learning based photo music video layout system of claim 1, wherein: in step S2, the segmentation of music is based on music tempo, using manual or automated tools.
3. The deep learning based photo music video layout system of claim 1, wherein: in step S7, the best photo is confirmed using a scoring function f(q, p) (given as a formula image in the original), wherein q is a specific paragraph and p is any photo in the photo group that has not yet appeared in the allocation set Q̂.
4. The deep learning based photo music video layout system of claim 3, wherein: in step S7, the best video is identified using a scoring function (given as a formula image in the original), wherein q is a specific paragraph, p is any photo in the photo group that has not yet appeared in the allocation set Q̂, and s is any video in the video group that has not yet appeared in Q̂.
5. The deep learning based photo music video layout system of claim 4, wherein: in step S7, the best photo or the best video is determined as the best material, with f(p, s) = max(p, s).
CN201911204406.0A 2019-11-29 2019-11-29 Photo music video arrangement system based on deep learning Active CN111225274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911204406.0A CN111225274B (en) 2019-11-29 2019-11-29 Photo music video arrangement system based on deep learning


Publications (2)

Publication Number Publication Date
CN111225274A true CN111225274A (en) 2020-06-02
CN111225274B CN111225274B (en) 2021-12-07

Family

ID=70829052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911204406.0A Active CN111225274B (en) 2019-11-29 2019-11-29 Photo music video arrangement system based on deep learning

Country Status (1)

Country Link
CN (1) CN111225274B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115361594A (en) * 2022-07-15 2022-11-18 北京达佳互联信息技术有限公司 Method and device for generating click video, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080193101A1 (en) * 2005-03-31 2008-08-14 Koninklijke Philips Electronics, N.V. Synthesis of Composite News Stories
CN108419035A (en) * 2018-02-28 2018-08-17 北京小米移动软件有限公司 The synthetic method and device of picture video
CN109257545A (en) * 2018-08-27 2019-01-22 咪咕文化科技有限公司 A kind of multisource video clipping method, device and storage medium



Also Published As

Publication number Publication date
CN111225274B (en) 2021-12-07

Similar Documents

Publication Publication Date Title
JP6395158B2 (en) How to semantically label acquired images of a scene
CN108038905B (en) A kind of Object reconstruction method based on super-pixel
CN110458957B (en) Image three-dimensional model construction method and device based on neural network
US11270476B2 (en) Method and system for providing photorealistic changes for digital image
Sýkora et al. Adding depth to cartoons using sparse depth (in) equalities
CN110059697A (en) A kind of Lung neoplasm automatic division method based on deep learning
CN104715451B (en) A kind of image seamless fusion method unanimously optimized based on color and transparency
JP2016045943A5 (en)
CN107609193A (en) The intelligent automatic processing method and system of picture in a kind of suitable commodity details page
CN106960457B (en) Color painting creation method based on image semantic extraction and doodling
CN107506362B (en) Image classification brain-imitation storage method based on user group optimization
CN103500220B (en) Method for recognizing persons in pictures
CN102156888A (en) Image sorting method based on local colors and distribution characteristics of characteristic points
CN107392244B (en) Image aesthetic feeling enhancement method based on deep neural network and cascade regression
CN109242775B (en) Attribute information migration method, device, equipment and readable storage medium
WO2021031677A1 (en) Method and device for automatically generating banner images of target object in batches
EP3474185B1 (en) Classification of 2d images according to types of 3d arrangement
WO2020258314A1 (en) Cutting method, apparatus and system for point cloud model
US20220374785A1 (en) Machine Learning System
CN104008177B (en) Rule base structure optimization and generation method and system towards linguistic indexing of pictures
JP2023548654A (en) Computer architecture for generating footwear digital assets
Jiang et al. Consensus style centralizing auto-encoder for weak style classification
CN111225274B (en) Photo music video arrangement system based on deep learning
WO2016172889A1 (en) Image segmentation method and device
CN111178083A (en) Semantic matching method and device for BIM and GIS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant