CN111225274A - Photo music video arrangement system based on deep learning - Google Patents
- Publication number
- CN111225274A (application CN201911204406.0A)
- Authority
- CN
- China
- Prior art keywords
- photo
- music
- video
- group
- paragraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440245—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
The invention discloses a photo music video arrangement system based on deep learning, which comprises the following steps: S1, inputting a photo group and a video group, each of arbitrary size, together with a piece of music; S2, segmenting the music into paragraphs of different lengths based on information such as the music rhythm; S3, extracting key frames of each video in the video group manually or automatically; S4, extracting deep features of the photo group P using a convolutional neural network or another deep or non-deep machine learning algorithm, and likewise computing features for the key frames of the videos in the video group; S5, selecting any photo as the starting photo, and computing the arrangement of photos and videos over the music paragraphs using a recurrent neural network or another deep or non-deep machine learning algorithm. The invention automatically segments the music, analyzes the key content of the photos and videos, and fuses the photos and videos on the music segmentation points, so as to rapidly and intelligently produce music photo videos.
Description
Technical Field
The invention relates to the technical field of information processing, in particular to a photo and music video arrangement system based on deep learning.
Background
In an age when machine learning applications are becoming ever more mature, video production is still a relatively complex process that demands certain editing skills and the ability to collect and organize related resources. In particular, trying to merge photos into a video, and recomposing photos, animations, special effects and the like, greatly increases the production complexity.
Disclosure of Invention
The invention aims to provide a photo music video arrangement system based on deep learning, which can rapidly and intelligently make music photo videos.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
The invention discloses a photo music video arrangement system based on deep learning, comprising:
S1, preparing data, including:
a photo group P = {p0, p1 ... pn}, a video group S = {s0, s1 ... sm}, and music;
S2, segmenting the music into music paragraphs Q = {q0, q1 ... qk};
S3, extracting P′ = cnn_deep_feature(P) from the photo group using a deep neural network;
S4, extracting S′ = cnn_deep_feature(S) from the video group using a deep neural network;
S5, selecting any photo as the starting photo;
S6, setting the material allocation set of the music paragraphs as Q′, and setting the paragraph position of the starting photo in Q′ as q_t;
S7, starting from paragraph position q_t, computing, for every remaining position in Q′, the best photo or best video to place at that position, where a remaining position is denoted q, with q ∈ {0 ... k} and q ≠ t;
S8, when all positions in Q′ have been allocated material, Q′ is the final result.
Preferably, in step S2, the segmentation of music is based on music tempo, using manual or automated tools.
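As one illustrative sketch of such an automated tool, beat timestamps (assumed to come from any beat tracker or from manual annotation) can be grouped into paragraphs of varying length; the function name and the beats_per_paragraph parameter are hypothetical, not taken from the patent:

```python
def segment_music(beat_times, beats_per_paragraph=8):
    """Group beat timestamps (in seconds) into music paragraphs q0..qk.

    Paragraphs are returned as (start, end) intervals; when the tempo
    drifts, a fixed beat count per paragraph yields paragraphs of
    different durations, as the text describes.
    """
    paragraphs = []
    for i in range(0, len(beat_times) - 1, beats_per_paragraph):
        start = beat_times[i]
        end = beat_times[min(i + beats_per_paragraph, len(beat_times) - 1)]
        paragraphs.append((start, end))
    return paragraphs
```

With 17 evenly spaced beats at 120 BPM and 8 beats per paragraph, this yields two 4-second paragraphs.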
Preferably, the best photo is determined using a scoring function over q and p, where q is a specific paragraph and p is any photo in the photo group that does not yet appear in Q′.
Preferably, the best video is determined using a scoring function over q, p and s, where q is a specific paragraph, p is any photo in the photo group that does not yet appear in Q′, and s is any video in the video group that does not yet appear in Q′.
Preferably, the best photo or the best video is determined as the best material, with f(p, s) = max(p, s).
The invention has the beneficial effects that:
and performing point cutting on the music by adopting a manual or automatic tool, analyzing key contents of the photos and the videos by using a deep neural network, and fusing the photos and the videos based on the music point cutting by using a circulating neural network so as to achieve the aim of rapidly and intelligently manufacturing the music photo videos.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention comprises the steps of:
S1, preparing data, including:
a photo group P = {p0, p1 ... pn}, a video group S = {s0, s1 ... sm}, and music;
S2, segmenting the music into music paragraphs Q = {q0, q1 ... qk}, where the music is segmented based on the music rhythm using manual or automated tools;
S3, extracting P′ = cnn_deep_feature(P) from the photo group using a deep neural network;
S4, extracting S′ = cnn_deep_feature(S) from the video group using a deep neural network;
S5, selecting any photo as the starting photo;
S6, setting the material allocation set of the music paragraphs as Q′, and setting the paragraph position of the starting photo in Q′ as q_t;
S7, starting from paragraph position q_t, computing, for every remaining position in Q′, the best photo or best video to place at that position, where a remaining position is denoted q, with q ∈ {0 ... k} and q ≠ t:
S7.1, determining the best photo using a scoring function over q and p, where q is a specific paragraph and p is any photo in the photo group that does not yet appear in Q′;
S7.2, determining the best video using a scoring function over q, p and s, where q is a specific paragraph, p is any photo in the photo group that does not yet appear in Q′, and s is any video in the video group that does not yet appear in Q′;
S7.3, determining the best photo or the best video as the best material, with f(p, s) = max(p, s);
S8, when all positions in Q′ have been allocated material, Q′ is the final result.
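The allocation steps can be sketched as the following greedy loop. This is a minimal Python sketch, with illustrative names; the score(q, material) function stands in for the sequence-model matching that the text leaves abstract:

```python
import random

def arrange(photos, videos, paragraphs, score, seed=0):
    """Greedy sketch of the allocation steps: place a random starting
    photo (S5/S6), then fill every remaining paragraph position with
    the highest-scoring unused photo or video (S7), returning the
    fully allocated set Q' (S8)."""
    rng = random.Random(seed)
    k = len(paragraphs)
    slots = [None] * k                         # Q', the allocation set
    pool = [("photo", p) for p in photos] + [("video", s) for s in videos]
    start = ("photo", rng.choice(photos))      # S5: random starting photo
    t = rng.randrange(k)                       # S6: its paragraph position q_t
    slots[t] = start
    pool.remove(start)
    for q in range(k):                         # S7: remaining positions
        if slots[q] is not None:
            continue
        best = max(pool, key=lambda m: score(q, m))
        slots[q] = best
        pool.remove(best)
    return slots                               # S8: fully allocated Q'
```

The sketch assumes there are at least as many materials as paragraphs; each material is used at most once, matching the "not yet appearing in Q′" constraint.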
In actual use,
suppose we have 3 photos: P1, P2, P3;
suppose we have 7 videos: S1, S2, S3, S4, S5, S6, S7;
suppose we have music already segmented into paragraphs Q1, Q2, Q3, Q4, Q5, so that 5 paragraphs each need to be filled with a photo or a video.
through steps S1-S4, the data set by us are all features, and can be directly input by a machine learning model, wherein cnn _ deep _ feature can be any existing convolution/non-convolution image neural network including but not limited to various existing models of open source and closed source, and the features can be any layer of output after the convolution layer (specific layer needs to be manually selected)
The RNN model is a sequence model and can be any existing sequence model, including but not limited to RNN, LSTM, GRU, etc. By comparing the context in the sequence, the best matching item can be computed.
after randomly selecting a picture, the paragraph status may be as follows, via step S5:
Q1 | Q2 | Q3 | Q4 | Q5 |
is not distributed | Is not distributed | Q2 | Is not distributed | Is not distributed |
Wherein Q2 is a randomly assigned initial photograph
In the calculation flow of S7, we need to compute the best paragraph material one by one for Q1, Q2, Q4 and Q5 until all paragraphs are filled.
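The per-paragraph selection with f(p, s) = max(p, s) can be sketched as follows; the two scoring functions stand in for the learned matching functions and are assumptions for illustration, not the patent's actual formulas:

```python
def best_material(q, photos, videos, photo_score, video_score):
    """Pick the material for paragraph q in the spirit of
    f(p, s) = max(p, s): take the best unused photo and the best
    unused video, then keep whichever of the two scores higher."""
    p_best = max(photos, key=lambda p: photo_score(q, p)) if photos else None
    s_best = max(videos, key=lambda s: video_score(q, s)) if videos else None
    if s_best is None or (p_best is not None
                          and photo_score(q, p_best) >= video_score(q, s_best)):
        return ("photo", p_best)
    return ("video", s_best)
```

For example, with photos P1 and P2, video S1, and scores favoring P2, the function returns P2 for the paragraph.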
The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.
Claims (5)
1. A photo music video arrangement system based on deep learning, characterized by comprising the following steps:
S1, preparing data, including:
a photo group P = {p0, p1 ... pn}, a video group S = {s0, s1 ... sm}, and music;
S2, segmenting the music into music paragraphs Q = {q0, q1 ... qk};
S3, extracting P′ = cnn_deep_feature(P) from the photo group using a deep neural network;
S4, extracting S′ = cnn_deep_feature(S) from the video group using a deep neural network;
S5, selecting any photo as the starting photo;
S6, setting the material allocation set of the music paragraphs as Q′, and setting the paragraph position of the starting photo in Q′ as q_t;
S7, starting from paragraph position q_t, computing, for every remaining position in Q′, the best photo or best video to place at that position, where a remaining position is denoted q, with q ∈ {0 ... k} and q ≠ t.
2. The deep learning based photo music video arrangement system of claim 1, wherein: in step S2, the music is segmented based on the music rhythm using manual or automated tools.
4. The deep learning based photo music video arrangement system of claim 3, wherein: in step S7, the best video is determined using a scoring function over q, p and s, where q is a specific paragraph, p is any photo in the photo group that does not yet appear in Q′, and s is any video in the video group that does not yet appear in Q′.
5. The deep learning based photo music video arrangement system of claim 4, wherein: in step S7, the best photo or the best video is determined as the best material, with f(p, s) = max(p, s).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911204406.0A CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911204406.0A CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111225274A true CN111225274A (en) | 2020-06-02 |
CN111225274B CN111225274B (en) | 2021-12-07 |
Family
ID=70829052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911204406.0A Active CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111225274B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115361594A (en) * | 2022-07-15 | 2022-11-18 | 北京达佳互联信息技术有限公司 | Method and device for generating click video, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080193101A1 (en) * | 2005-03-31 | 2008-08-14 | Koninklijke Philips Electronics, N.V. | Synthesis of Composite News Stories |
CN108419035A (en) * | 2018-02-28 | 2018-08-17 | 北京小米移动软件有限公司 | The synthetic method and device of picture video |
CN109257545A (en) * | 2018-08-27 | 2019-01-22 | 咪咕文化科技有限公司 | A kind of multisource video clipping method, device and storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080193101A1 (en) * | 2005-03-31 | 2008-08-14 | Koninklijke Philips Electronics, N.V. | Synthesis of Composite News Stories |
CN108419035A (en) * | 2018-02-28 | 2018-08-17 | 北京小米移动软件有限公司 | The synthetic method and device of picture video |
CN109257545A (en) * | 2018-08-27 | 2019-01-22 | 咪咕文化科技有限公司 | A kind of multisource video clipping method, device and storage medium |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115361594A (en) * | 2022-07-15 | 2022-11-18 | 北京达佳互联信息技术有限公司 | Method and device for generating click video, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111225274B (en) | 2021-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6395158B2 (en) | How to semantically label acquired images of a scene | |
CN108038905B (en) | A kind of Object reconstruction method based on super-pixel | |
CN110458957B (en) | Image three-dimensional model construction method and device based on neural network | |
US11270476B2 (en) | Method and system for providing photorealistic changes for digital image | |
Sýkora et al. | Adding depth to cartoons using sparse depth (in) equalities | |
CN110059697A (en) | A kind of Lung neoplasm automatic division method based on deep learning | |
CN104715451B (en) | A kind of image seamless fusion method unanimously optimized based on color and transparency | |
JP2016045943A5 (en) | ||
CN107609193A (en) | The intelligent automatic processing method and system of picture in a kind of suitable commodity details page | |
CN106960457B (en) | Color painting creation method based on image semantic extraction and doodling | |
CN107506362B (en) | Image classification brain-imitation storage method based on user group optimization | |
CN103500220B (en) | Method for recognizing persons in pictures | |
CN102156888A (en) | Image sorting method based on local colors and distribution characteristics of characteristic points | |
CN107392244B (en) | Image aesthetic feeling enhancement method based on deep neural network and cascade regression | |
CN109242775B (en) | Attribute information migration method, device, equipment and readable storage medium | |
WO2021031677A1 (en) | Method and device for automatically generating banner images of target object in batches | |
EP3474185B1 (en) | Classification of 2d images according to types of 3d arrangement | |
WO2020258314A1 (en) | Cutting method, apparatus and system for point cloud model | |
US20220374785A1 (en) | Machine Learning System | |
CN104008177B (en) | Rule base structure optimization and generation method and system towards linguistic indexing of pictures | |
JP2023548654A (en) | Computer architecture for generating footwear digital assets | |
Jiang et al. | Consensus style centralizing auto-encoder for weak style classification | |
CN111225274B (en) | Photo music video arrangement system based on deep learning | |
WO2016172889A1 (en) | Image segmentation method and device | |
CN111178083A (en) | Semantic matching method and device for BIM and GIS |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||