CN111225274B - Photo music video arrangement system based on deep learning - Google Patents
- Publication number: CN111225274B (application CN201911204406.0A)
- Authority: CN (China)
- Prior art keywords: photo, music, video, deep, feature
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/47205—End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
- G06F18/214—Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045—Neural network architectures: combinations of networks
- G06V20/46—Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44016—Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- H04N21/4402—Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440245—Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment
- H04N21/8456—Structuring of content by decomposing it in the time domain, e.g. in time segments
Abstract
The invention discloses a photo music video arrangement system based on deep learning, comprising the following steps: S1, input a photo group and a video group, each of arbitrary size, together with a piece of music; S2, segment the music into paragraphs of differing lengths based on information such as rhythm; S3, extract key frames from each video in the video group, either manually or automatically; S4, extract deep features of the photo group P using a convolutional neural network or another deep/non-deep machine-learning algorithm, and likewise compute features for the key frames of the videos in the video group; S5, select any photo as the starting photo, and compute the arrangement of photos and videos across the music paragraphs using a recurrent neural network or another deep/non-deep machine-learning algorithm. The invention automatically segments the music at beat points, analyses the key content of the photos and videos, and fuses them according to the music segmentation, so that music photo videos can be produced quickly and intelligently.
Description
Technical Field
The invention relates to the technical field of information processing, in particular to a photo and music video arrangement system based on deep learning.
Background
Even as machine-learning applications mature, video production remains a relatively complex process that demands a degree of editing skill and the ability to collect and organise related resources. In particular, merging photos into a video, with recomposition of photos, animation, special effects and the like, greatly increases production complexity.
Disclosure of Invention
The invention aims to provide a photo music video arrangement system based on deep learning, which can rapidly and intelligently make music photo videos.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention discloses a photo music video arrangement system based on deep learning, S1, data preparation, comprising:
picture group P ═ P0,p1…pnVideo set S ═ S0,s1…smMusic;
s2, segmenting the music into music paragraphs Q ═ Q0,q1…qk};
S3, extracting P' ═ cnn _ deep _ feature (P) from the group of pictures using the deep neural network;
s4, extracting S' ═ cnn _ deep _ feature (S) from the video group using the deep neural network;
S6, let the material allocation set for the music paragraphs be A, and let the paragraph position of the starting material in A be q_s;
S7, starting from paragraph position q_s, compute all remaining positions in A, where a remaining position is denoted q, with q ∈ {0 … k} and q not yet assigned, and compute the best photo or the best video to place at each remaining position;
S8, once all positions in A have been assigned material, A is the final result.
Preferably, in step S2, the music is segmented on the basis of its tempo, using manual or automated tools.
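As a rough sketch of what such an automated segmentation tool might do, the snippet below groups beat times into paragraphs of a fixed beat count. The beat grid, the `segment_music` helper and the four-beats-per-paragraph choice are all hypothetical; a real system would take beat times from a beat tracker or manual annotation and could vary paragraph lengths.

```python
# Sketch: split music into paragraphs Q = {q0 ... qk} from beat times.
# The beat times here are hypothetical; a real pipeline would obtain them
# from an automated beat tracker or from manual annotation.

def segment_music(beat_times, beats_per_paragraph=4):
    """Group consecutive beats into paragraphs.

    Returns a list of (start, end) time pairs, one per paragraph.
    """
    paragraphs = []
    for i in range(0, len(beat_times) - 1, beats_per_paragraph):
        start = beat_times[i]
        end = beat_times[min(i + beats_per_paragraph, len(beat_times) - 1)]
        paragraphs.append((start, end))
    return paragraphs

# Hypothetical beat grid: one beat every 0.5 s over an 8-second clip.
beats = [i * 0.5 for i in range(17)]
Q = segment_music(beats, beats_per_paragraph=4)
```

Each resulting pair marks one music paragraph's time span, which the later steps fill with a photo or a video.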
Preferably, the best photo is confirmed using the following function,
where q is a specific paragraph and p is any photo in the photo set that does not yet appear in the allocation set.
Preferably, the best video is confirmed using the following function,
where q is a specific paragraph, p is any photo in the photo set that does not yet appear in the allocation set, and s is any video in the video set that does not yet appear in the allocation set.
Preferably, the best photo or the best video is determined as the best material, with f(p, s) = max(p, s).
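A minimal sketch of the f(p, s) = max(p, s) rule, assuming each candidate's score is some similarity between paragraph features and material features. The cosine-similarity scorer and the feature vectors below are hypothetical stand-ins; the patent's own photo and video scoring functions are not reproduced here.

```python
# Score the best candidate photo and the best candidate video for one
# paragraph, then keep whichever scores higher: f(p, s) = max(p, s).
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_material(q_feat, photo_feats, video_feats):
    """Return ('photo'|'video', index) of the best unassigned material."""
    p_idx, p_score = max(
        ((i, cosine(q_feat, f)) for i, f in photo_feats.items()),
        key=lambda t: t[1])
    s_idx, s_score = max(
        ((i, cosine(q_feat, f)) for i, f in video_feats.items()),
        key=lambda t: t[1])
    # f(p, s) = max(p, s): keep the higher-scoring candidate (ties -> photo).
    return ('photo', p_idx) if p_score >= s_score else ('video', s_idx)

# Toy features: paragraph q against two photos and one video.
q = [1.0, 0.0]
photos = {0: [0.9, 0.1], 1: [0.0, 1.0]}
videos = {0: [0.5, 0.5]}
choice = best_material(q, photos, videos)
```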
The invention has the beneficial effects that:
and performing point cutting on the music by adopting a manual or automatic tool, analyzing key contents of the photos and the videos by using a deep neural network, and fusing the photos and the videos based on the music point cutting by using a circulating neural network so as to achieve the aim of rapidly and intelligently manufacturing the music photo videos.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention comprises the steps of:
S1, preparing data, comprising:
a photo group P = {p0, p1 … pn}, a video group S = {s0, s1 … sm}, and music;
S2, segment the music into music paragraphs Q = {q0, q1 … qk}, dividing the music on the basis of its tempo using manual or automated tools;
S3, extract P′ = cnn_deep_feature(P) from the photo group using the deep neural network;
where P′ denotes the deep features of the photos after extraction by cnn_deep_feature(); the specific content of the deep features depends on the network structure used in cnn_deep_feature;
cnn_deep_feature denotes a photo deep-feature function computed by a deep convolutional neural network (the CNN algorithm) using the back-propagation principle, in which operations such as fully connected layers are used together or separately;
S4, extract S′ = cnn_deep_feature(S) from the video group using the deep neural network;
where S′ denotes picture deep features containing frame information after extraction by cnn_deep_feature(); S′ is analogous to P′, the difference being that the input changes from a single photo in P to multi-frame video information in S; the extracted key frames are processed frame by frame, through a flow similar to that for P, into an S′ analogous to P′, except that the S′ data has one more dimension than P′, namely the set of extracted frames;
S6, let the material allocation set for the music paragraphs be A, and let the paragraph position of the starting material in A be q_s;
S7, starting from paragraph position q_s, compute all remaining positions in A, where a remaining position is denoted q, with q ∈ {0 … k} and q not yet assigned, and compute the best photo or the best video to place at each remaining position:
S7.1, confirm the best photo using the following function, where q is a specific paragraph and p is any photo in the photo set that does not yet appear in A;
S7.2, confirm the best video using the following function, where q is a specific paragraph, p is any photo in the photo set that does not yet appear in A, and s is any video in the video set that does not yet appear in A;
S7.3, determine the best photo or the best video as the best material, where f(p, s) = max(p, s);
S8, once all positions in A have been assigned material, A is the final result.
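Steps S6 to S8 can be sketched as a greedy fill loop. Everything below is a simplified illustration: the `arrange` helper, the toy score function and the material names are hypothetical, and the patent itself computes the best pick with a recurrent model rather than a hand-written score.

```python
# Sketch of steps S6-S8: starting from one assigned paragraph, fill every
# remaining paragraph with the best unused photo or video.

def arrange(k, start_pos, start_item, materials, score):
    """Fill paragraph positions 0..k-1 greedily.

    materials: list of material ids; score(pos, item) -> float.
    Returns the allocation list A with one material per paragraph.
    """
    A = [None] * k
    A[start_pos] = start_item          # S6: place the starting material
    unused = [m for m in materials if m != start_item]
    for pos in range(k):               # S7: visit the remaining positions
        if A[pos] is not None:
            continue
        best = max(unused, key=lambda m: score(pos, m))
        A[pos] = best
        unused.remove(best)
    return A                           # S8: all positions assigned

# Toy run: 3 paragraphs; a score that favours matching indices.
mats = ['p0', 'p1', 's0']
A = arrange(3, 1, 'p1', mats, lambda pos, m: -abs(pos - mats.index(m)))
```

The loop visits positions in index order for brevity; the patent starts from the paragraph position of the initial material.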
In actual use:
suppose we have 3 photos, P1, P2, P3;
suppose we have 7 videos, S1, S2, S3, S4, S5, S6, S7;
and suppose we have music segmented into paragraphs Q1, Q2, Q3, Q4, Q5, so that 5 paragraphs each need to be filled with a photo or a video.
After steps S1-S4, the data we have prepared are all features and can be fed directly into a machine-learning model; here cnn_deep_feature can be any existing convolutional or non-convolutional image neural network, including but not limited to the various open-source and closed-source models, and the features can be the output of any layer after the convolutional layers (the specific layer must be chosen manually).
The RNN model is a sequence model and can be any existing sequence model, including but not limited to an RNN, LSTM, GRU and the like; by comparing the context within the sequence, the best match for an unknown position can be calculated.
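A minimal illustration of that idea, with scalar features and fixed weights standing in for a trained RNN/LSTM/GRU: the cell folds the already-assigned context into a hidden state, and each candidate is scored by how smoothly it continues that state. All names and numbers here are hypothetical.

```python
# Toy sequence-model scoring: context in, candidate score out.
import math

def rnn_step(h, x, w_x=0.8, w_h=0.5):
    """One recurrent step over scalar features (fixed toy weights)."""
    return math.tanh(w_x * x + w_h * h)

def score_candidate(context_feats, cand_feat):
    """Run the context through the cell, then measure candidate fit."""
    h = 0.0
    for x in context_feats:
        h = rnn_step(h, x)
    # Higher score when the candidate continues the context smoothly.
    return -abs(rnn_step(h, cand_feat) - h)

# Context rises steadily; the middle candidate continues it best.
ctx = [0.2, 0.4, 0.6]
best = max([0.1, 0.5, 0.9], key=lambda c: score_candidate(ctx, c))
```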
after randomly selecting a picture, the paragraph status may be as follows, via step S5:
Q1 | Q2 | Q3 | Q4 | Q5 |
is not distributed | Is not distributed | Q2 | Is not distributed | Is not distributed |
Wherein Q2 is a randomly assigned initial photograph
In the calculation flow of S7, we then compute the best paragraph material one by one for Q1, Q2, Q4 and Q5 until all the Q's are filled.
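The example above can be sketched end to end as follows. The pool order stands in for the learned best-pick computation, and the starting slot Q3 and all material names are taken from the example itself.

```python
# Five paragraphs, one randomly chosen starting photo at Q3, then fill
# the remaining slots one by one. A simple pop() replaces the model's
# best-material choice, purely for illustration.
import random

paragraphs = ['Q1', 'Q2', 'Q3', 'Q4', 'Q5']
photos = ['P1', 'P2', 'P3']
videos = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7']

random.seed(0)                          # reproducible toy run
slots = {q: None for q in paragraphs}
start_photo = random.choice(photos)
slots['Q3'] = start_photo               # randomly assigned starting photo

pool = [m for m in photos + videos if m != start_photo]
for q in paragraphs:                    # fill Q1, Q2, Q4, Q5 in turn
    if slots[q] is None:
        slots[q] = pool.pop(0)          # stand-in for the learned best pick

filled = all(v is not None for v in slots.values())
```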
The present invention is capable of other embodiments, and various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention.
Claims (5)
1. A photo music video arrangement system based on deep learning is characterized by comprising the following steps:
S1, preparing data, comprising:
a photo group P = {p0, p1 … pn}, a video group S = {s0, s1 … sm}, and music;
S2, segment the music into music paragraphs Q = {q0, q1 … qk};
S3, extract P′ = cnn_deep_feature(P) from the photo group using the deep neural network;
where P′ denotes the deep features of the photos after extraction by cnn_deep_feature(); the specific content of the deep features depends on the network structure used in cnn_deep_feature;
cnn_deep_feature denotes a photo deep-feature function computed by a deep convolutional neural network (the CNN algorithm) using the back-propagation principle, in which operations such as fully connected layers are used together or separately;
S4, extract S′ = cnn_deep_feature(S) from the video group using the deep neural network;
where S′ denotes picture deep features containing frame information after extraction by cnn_deep_feature();
S6, let the material allocation set for the music paragraphs be A, and let the paragraph position of the starting material in A be q_s;
S7, starting from paragraph position q_s, compute all remaining positions in A, where a remaining position is denoted q, with q ∈ {0 … k} and q not yet assigned, and compute the best photo or the best video to place at each remaining position;
2. The deep-learning-based photo music video arrangement system of claim 1, wherein in step S2 the segmentation of the music is based on the music tempo, using manual or automated tools.
4. The deep learning based photo music video layout system of claim 3, wherein: in step S7, the best video is identified using the following function,
5. The deep learning based photo music video layout system of claim 4, wherein: in step S7, the best photo or the best video is determined as the best material,
f(p, s) = max(p, s).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911204406.0A CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111225274A CN111225274A (en) | 2020-06-02 |
CN111225274B true CN111225274B (en) | 2021-12-07 |
Family
ID=70829052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911204406.0A Active CN111225274B (en) | 2019-11-29 | 2019-11-29 | Photo music video arrangement system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111225274B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115361594A (en) * | 2022-07-15 | 2022-11-18 | 北京达佳互联信息技术有限公司 | Method and device for generating click video, electronic equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080193101A1 (en) * | 2005-03-31 | 2008-08-14 | Koninklijke Philips Electronics, N.V. | Synthesis of Composite News Stories |
CN108419035A (en) * | 2018-02-28 | 2018-08-17 | 北京小米移动软件有限公司 | The synthetic method and device of picture video |
CN109257545B (en) * | 2018-08-27 | 2021-04-13 | 咪咕文化科技有限公司 | Multi-source video editing method and device and storage medium |
Legal Events
Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant