CN109862393A - Method, system, device and storage medium for scoring a video file with background music - Google Patents


Info

Publication number
CN109862393A
CN109862393A (application CN201910216297.8A; granted as CN109862393B)
Authority
CN
China
Prior art keywords
video file
video
file
background music
dubbing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910216297.8A
Other languages
Chinese (zh)
Other versions
CN109862393B (en)
Inventor
裴勇
郑文琛
杨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201910216297.8A priority Critical patent/CN109862393B/en
Publication of CN109862393A publication Critical patent/CN109862393A/en
Application granted granted Critical
Publication of CN109862393B publication Critical patent/CN109862393B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a method, system, device and storage medium for scoring a video file with background music. The method comprises: extracting a set of video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file from those features; generating a test video file from the initial video file and the soundtrack audio file; and, according to a user-portrait model and evaluation parameters of viewers of the test video file, revising the soundtrack audio file in the test video file to generate a final video file. The invention reduces the overall cost of scoring a video and, by combining video content features with user feedback, enables users to obtain a better viewing experience.

Description

Method, system, device and storage medium for scoring a video file with background music
Technical field
The present invention relates to the technical field of video scoring, and in particular to a method, system, device and storage medium for scoring a video file with background music.
Background technique
When producing a video aimed at an audience, the video content is usually produced first, and a soundtrack is then added in post-production according to that content, ultimately forming the video that is played to users; this is especially apparent in present-day advertisement production. In the existing advertisement production process, a designer first designs the video content according to the client's requirements and then selects existing audio files to score it in post-production. As a result, the overall cost of the advertisement is high, and the audience's preferences regarding the soundtrack are not taken into account. Automatic music generation algorithms do exist, but they cannot tie the generated music to the video's content features, so the scoring effect is mediocre.
Summary of the invention
The main purpose of the present invention is to provide a method, system, device and storage medium for scoring a video file, aiming to improve the quality of newly created advertisement soundtracks, reduce scoring cost, and optimize the soundtrack by combining the advertisement's content features with user feedback, so that users obtain a better experience when watching the advertisement.
To achieve the above object, the present invention provides a method for scoring a video file, comprising the following steps:
extracting a set of video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file from those features;
generating a test video file from the initial video file and the soundtrack audio file;
according to a user-portrait model and evaluation parameters of viewers of the test video file, revising the soundtrack audio file in the test video file to generate a final video file.
Optionally, the video features include an optical-flow intensity feature, a chroma histogram feature, and a shot boundary feature, and the step of extracting the video features from the initial video file to be scored comprises:
extracting, for each video image in the initial video file, the corresponding optical-flow map and the chroma histogram of that image;
taking the average optical-flow intensity of the optical-flow maps as the optical-flow intensity feature of the initial video file;
normalizing the chroma histograms and taking the result as the chroma histogram feature of the initial video file;
detecting the shot boundaries of the video images and taking them as the shot boundary feature of the initial video file.
Optionally, the video features further include a video emotion score feature, and the step of extracting the video features from the initial video file to be scored further comprises:
reading the video content of the initial video file, and detecting and counting emotion data that marks the emotion of the video content;
inputting the emotion data into a preset sentiment analysis model so that the model predicts an emotion score for the video content;
taking the emotion score as the video emotion score feature of the initial video file.
Optionally, the step of generating the soundtrack audio file from the video features comprises:
inputting the video features into a preset scoring model, the preset scoring model having been trained on added training samples, the training samples comprising audio-video data and pure audio data;
in the preset scoring model, generating the soundtrack audio file of the initial video file from the video features.
Optionally, before the step of inputting the video features into the preset scoring model, the method further comprises:
detecting a lookback feature of the initial video file and inputting the lookback feature into the preset scoring model.
Optionally, the preset scoring model is a scoring model that generates audio files based on sequence neural networks, and the step of generating the soundtrack audio file of the initial video file from the video features in the preset scoring model comprises:
generating a note sequence from the video features and the lookback feature of the initial video file;
inputting the note sequence into a note-duration sequence neural network so that it outputs a note-duration sequence from the note sequence and the lookback feature;
inputting the note sequence into a drum sequence neural network so that it outputs a drum-beat combination from the note sequence;
generating the soundtrack audio file of the initial video file from the note sequence, the note-duration sequence, and the drum-beat combination.
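The final assembly step above can be sketched as follows. This is a minimal, hypothetical illustration: the patent does not specify how the note sequence, duration sequence, and drum-beat combination are merged, so the event format and the function name are assumptions.

```python
def assemble_soundtrack(notes, durations, drum_hits):
    """Merge melody (note, duration) pairs and drum hits into one
    time-ordered event list of (onset_time, kind, value) tuples."""
    if len(notes) != len(durations):
        raise ValueError("each note needs a duration")
    events, t = [], 0.0
    for pitch, dur in zip(notes, durations):
        events.append((t, "note", pitch))
        t += dur  # next note starts when this one ends
    for hit_time in drum_hits:
        events.append((hit_time, "drum", "hit"))
    events.sort(key=lambda e: e[0])  # interleave melody and drums by onset
    return events

# Toy melody: three MIDI pitches with durations, plus two drum hits.
track = assemble_soundtrack(notes=[60, 62, 64],
                            durations=[0.5, 0.5, 1.0],
                            drum_hits=[0.0, 1.0])
```

In a real system the resulting event list would be rendered to audio (for example via a MIDI synthesizer); here it only demonstrates the combination of the three generated sequences.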
Optionally, the step of generating the test video file from the initial video file and the soundtrack audio file comprises:
reading the play-time sequence of the initial video file and the soundtrack audio file;
synthesizing the initial video file and the soundtrack audio file into the test video file based on the play-time sequence.
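The play-time alignment step can be illustrated with a small sketch. Real synthesis would mux audio and video streams with a media library; this stand-in only pairs each frame's time interval with the audio events falling inside it, and all names are illustrative assumptions.

```python
def align_by_play_time(video_frames, audio_events, fps=25.0):
    """Pair each video frame with the audio events whose onset time
    falls inside that frame's interval on the shared play-time axis."""
    paired = []
    for i, frame in enumerate(video_frames):
        t0, t1 = i / fps, (i + 1) / fps  # frame interval [t0, t1)
        in_frame = [e for e in audio_events if t0 <= e[0] < t1]
        paired.append((frame, in_frame))
    return paired

# Two frames at 2 fps; each audio event is (onset_seconds, payload).
paired = align_by_play_time(["frame0", "frame1"],
                            [(0.1, "note"), (0.6, "note")], fps=2.0)
```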
Optionally, the step of revising the soundtrack audio file in the test video file according to the user-portrait model and evaluation parameters of viewers of the test video file, and generating the final video file, comprises:
detecting the release platform of the test video file, and obtaining from it the user-portrait model and evaluation parameters of viewers of the test video file;
reading the evaluation parameters, within a predetermined period, of each viewer sharing the same user-portrait model, and constructing a user behavior feature sequence from them;
computing, from the user behavior feature sequence, the users' preference probability distribution over the soundtrack audio file when watching the test video file;
using the preference probability distribution to guide the preset scoring model that generated the soundtrack audio file, revising the soundtrack audio file in the test video file, and generating the final video file.
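One plausible way to turn a behavior sequence into the preference probability distribution named above is a softmax over weighted behavior counts. The patent does not define the computation, so the behavior labels, weights, and softmax choice are all assumptions for illustration.

```python
import math

def preference_distribution(behavior_sequence, weights=None):
    """Map a viewing-behavior sequence to a probability distribution
    via a softmax over weighted per-behavior counts."""
    weights = weights or {"complete": 1.0, "replay": 2.0, "skip": -1.0}
    score = {}
    for b in behavior_sequence:
        score[b] = score.get(b, 0.0) + weights.get(b, 0.0)
    z = sum(math.exp(v) for v in score.values())  # softmax normalizer
    return {k: math.exp(v) / z for k, v in score.items()}

# Two completed views and one skip within the observation period.
dist = preference_distribution(["complete", "complete", "skip"])
```

The resulting distribution could then act as a guidance signal (for example, a weighting on the scoring model's sampling) when regenerating the soundtrack.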
In addition, the present invention also provides a scoring system for a video file, which generates the soundtrack audio of the video file based on sequence neural networks and comprises:
a soundtrack audio generation module for extracting the video features from an initial video file to be scored and generating the soundtrack audio file of the initial video file from those features;
a test video generation module for generating a test video file from the initial video file and the soundtrack audio file;
a soundtrack audio revision module for revising the soundtrack audio file in the test video file according to the user-portrait model and evaluation parameters of viewers of the test video file, and generating the final video file.
Optionally, the scoring system further comprises:
a learning-training module for adding training samples to the preset scoring model that generates the soundtrack audio file and training it on them, the training samples comprising audio-video data and pure audio data.
In addition, the present invention also provides a scoring device for a video file, comprising a memory, a processor, and a video-file scoring program stored on the memory and runnable on the processor; when executed by the processor, the program implements the steps of the video-file scoring method described above.
In addition, the present invention also provides a storage medium, applied to a computer, on which a video-file scoring program is stored; when executed by a processor, the program implements the steps of the video-file scoring method described above.
The present invention extracts a set of video features from an initial video file to be scored and generates a soundtrack audio file for the initial video file from those features; generates a test video file from the initial video file and the soundtrack audio file; and revises the soundtrack audio file according to the user-portrait model and evaluation parameters of viewers of the test video file, generating the final video file. By combining the video features extracted from the content of the initial video file, training the scoring model via transfer learning on added audio-video data and pure audio data, and collecting the characteristics of the advertisement's viewers to guide and optimize the scoring model, a final scored video file is generated for the current initial video file. The automatic scoring algorithm not only eliminates the high cost of manual scoring; by incorporating the video's content features it further improves the overall quality of the soundtrack, and by optimizing the soundtrack according to viewers' feedback and evaluations it meets users' preferences regarding the soundtrack and improves their viewing experience.
Detailed description of the invention
Fig. 1 is a schematic structural diagram of the hardware running environment involved in embodiments of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the video-file scoring method of the present invention;
Fig. 3 is a schematic diagram of the refined sub-steps of step S100 in Fig. 2;
Fig. 4 is a flow diagram of a second embodiment of the video-file scoring method of the present invention;
Fig. 5 is a flow diagram of a third embodiment of the video-file scoring method of the present invention.
The realization of the object, the functional characteristics, and the advantages of the present invention will be further described in connection with the embodiments and with reference to the accompanying drawings.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in Figure 1, Fig. 1 is a schematic structural diagram of the hardware running environment involved in embodiments of the present invention.
It should be noted that Fig. 1 may be the structural diagram of the hardware running environment of the video-file scoring device. The scoring device in embodiments of the present invention may be a PC or a terminal device such as a portable computer.
As shown in Figure 1, the scoring device may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 realizes the connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the scoring device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Figure 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a video-file scoring program. The operating system is the program that manages and controls the hardware and software resources of the scoring device and supports the running of the scoring program and other software or programs.
In the scoring device shown in Fig. 1, the user interface 1003 is mainly used for data communication with each terminal; the network interface 1004 is mainly used for connecting to a background server and communicating with it; and the processor 1001 may be used to call the video-file scoring program stored in the memory 1005 and perform the following operations:
extracting a set of video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file from those features;
generating a test video file from the initial video file and the soundtrack audio file;
according to a user-portrait model and evaluation parameters of viewers of the test video file, revising the soundtrack audio file in the test video file to generate a final video file.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and perform the following steps:
extracting, for each video image in the initial video file, the corresponding optical-flow map and the chroma histogram of that image;
taking the average optical-flow intensity of the optical-flow maps as the optical-flow intensity feature of the initial video file;
normalizing the chroma histograms and taking the result as the chroma histogram feature of the initial video file;
detecting the shot boundaries of the video images and taking them as the shot boundary feature of the initial video file.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and perform the following steps:
the video features further include a video emotion score feature, and the step of extracting the video features from the initial video file to be scored further comprises:
reading the video content of the initial video file, and detecting and counting emotion data that marks the emotion of the video content;
inputting the emotion data into a preset sentiment analysis model so that the model predicts the emotion score of the video content;
taking the emotion score as the video emotion score feature of the initial video file.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and perform the following steps:
inputting the video features into a preset scoring model, the preset scoring model having been trained on added training samples comprising audio-video data and pure audio data;
in the preset scoring model, generating the soundtrack audio file of the initial video file from the video features.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and, before the step of inputting the video features into the preset scoring model, perform the following steps:
detecting the lookback feature of the initial video file and inputting the lookback feature into the preset scoring model.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and perform the following steps:
generating a note sequence from the video features and the lookback feature of the initial video file;
inputting the note sequence into a note-duration sequence neural network so that it outputs a note-duration sequence from the note sequence and the lookback feature;
inputting the note sequence into a drum sequence neural network so that it outputs a drum-beat combination from the note sequence;
generating the soundtrack audio file of the initial video file from the note sequence, the note-duration sequence, and the drum-beat combination.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and perform the following steps:
reading the play-time sequence of the initial video file and the soundtrack audio file;
synthesizing the initial video file and the soundtrack audio file into a test video file based on the play-time sequence.
Further, the processor 1001 may also be used to call the video-file scoring program stored in the memory 1005 and perform the following steps:
detecting the release platform of the test video file, and obtaining from it the user-portrait model and evaluation parameters of viewers of the test video file;
reading the evaluation parameters, within a predetermined period, of each viewer sharing the same user-portrait model, and constructing a user behavior feature sequence from them;
computing, from the user behavior feature sequence, the users' preference probability distribution over the soundtrack audio file when watching the test video file;
using the preference probability distribution to guide the preset scoring model that generated the soundtrack audio file, revising the soundtrack audio file in the test video file, and generating the final video file.
Based on the above structure, the embodiments of the video-file scoring method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a flow diagram of a first embodiment of the video-file scoring method of the present invention.
The embodiment of the invention provides an embodiment of the video-file scoring method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps may be performed in an order different from that shown or described here.
The video-file scoring method of the embodiment of the present invention is applied to a video-file scoring device, which may be a PC or a terminal device such as a portable computer; no particular limitation is imposed here.
The video-file scoring method of this embodiment includes:
Step S100: extracting a set of video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file from those features.
When the start of playback of the initial video file is detected, preset algorithms and preset sequence neural network models are called to extract the video features from the currently playing initial video file, and the extracted features are passed to a preset automatic scoring model based on sequence neural networks. The preset scoring model combines the current video features and, following the playback timing of the current initial video file, sequentially generates the soundtrack audio file of the initial video file.
In this embodiment, the video file may specifically be an advertisement whose video content has been produced by the advertiser's designer according to the client's demands. The preset algorithms and sequence neural network models may specifically be the Gunnar Farneback optical-flow algorithm, a chroma histogram algorithm, a shot-boundary detection model, and a video-classification prediction model; the preset scoring model may specifically be an automatic scoring model based on sequence neural networks.
Specifically, in this embodiment, after the above preset algorithms and models have extracted the video features from the advertisement content being played, the features are passed into the preset sequence-neural-network-based scoring model, which combines them to score the advertisement automatically.
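The overall flow of this embodiment can be sketched as a pipeline of pluggable stages. Every helper here is an injected placeholder, not an implementation of the patent's models; the sketch only shows how the stages compose.

```python
def score_video(initial_video, extract, generate, synthesize, collect, revise):
    """Compose the embodiment's stages: feature extraction -> soundtrack
    generation -> test-video synthesis -> feedback-driven revision."""
    features = extract(initial_video)        # optical flow, chroma, shots, emotion
    soundtrack = generate(features)          # preset scoring model (stand-in)
    test_video = synthesize(initial_video, soundtrack)
    feedback = collect(test_video)           # viewer portrait + evaluations
    return revise(test_video, feedback)      # final (stand-by) video file

# Trivial stand-in stages to exercise the composition.
final = score_video(
    "ad_video",
    extract=lambda v: ["feat"],
    generate=lambda f: "audio",
    synthesize=lambda v, a: (v, a),
    collect=lambda tv: "feedback",
    revise=lambda tv, fb: (tv, fb),
)
```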
Further, referring to Fig. 3, Fig. 3 is a schematic diagram of the refined sub-steps of step S100 in Fig. 2. The video features of the initial video file to be scored include an optical-flow intensity feature, a chroma histogram feature, and a shot boundary feature, and in step S100 the step of extracting the video features from the initial video file comprises:
Step S101: extracting, for each video image in the initial video file, the corresponding optical-flow map and the chroma histogram of that image.
A preset algorithm is called to analyze and extract the optical-flow map and the chroma histogram corresponding to each frame of the initial video file being played.
Specifically, in this embodiment, during the entire playback of the current advertisement from start to end, the Gunnar Farneback optical-flow algorithm and the chroma histogram algorithm are called, separately or simultaneously, to analyze and extract, frame by frame, the optical-flow map and the chroma histogram of each video image of the current advertisement.
In this embodiment, the Gunnar Farneback optical-flow algorithm is called to extract the dense optical flow of each frame of the current advertisement and to form the optical-flow map corresponding to that video image.
Step S102: taking the average optical-flow intensity of the optical-flow maps as the optical-flow intensity feature of the initial video file.
Specifically, in this embodiment, the Gunnar Farneback optical-flow algorithm is called to compute the average optical-flow intensity of the optical-flow map of each frame of the current advertisement, and that average is taken as the optical-flow intensity feature of the current advertisement file.
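Given a dense flow map (as produced, for example, by Gunnar Farneback's algorithm), the average optical-flow intensity of step S102 is simply the mean magnitude of the displacement vectors. A minimal sketch, assuming the flow map is a 2-D grid of (dx, dy) vectors:

```python
import math

def mean_flow_intensity(flow_field):
    """Average optical-flow magnitude over one flow map, where
    flow_field is a 2-D grid of (dx, dy) displacement vectors."""
    mags = [math.hypot(dx, dy) for row in flow_field for (dx, dy) in row]
    return sum(mags) / len(mags)

# One-row toy flow map: one pixel moves by (3, 4), one is static.
intensity = mean_flow_intensity([[(3.0, 4.0), (0.0, 0.0)]])
```

In practice the flow field would come from a library call such as OpenCV's Farneback implementation; the averaging itself is exactly this reduction.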
Step S103: normalizing the chroma histograms and taking the result as the chroma histogram feature of the initial video file.
Specifically, in this embodiment, the chroma histogram of each frame of the current advertisement extracted by the chroma histogram algorithm is further normalized, and the normalized chroma histogram vector is taken as the chroma histogram feature of the current advertisement file.
Step S104: detecting the shot boundaries of the video images and taking them as the shot boundary feature of the initial video file.
A preset shot-boundary detection model is called to detect how each paragraph of video content changes in the current initial video file, and the shot-boundary detection result is taken as the shot boundary feature of the current initial video file.
Specifically, in this embodiment, during the entire playback of the current advertisement from start to end, the preset shot-boundary detection model is called to detect the segment changes of the advertisement being played, and its detection result is taken as the shot boundary feature of the advertisement file being played.
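The patent leaves the shot-boundary detection model unspecified. As a simple stand-in, a boundary can be flagged wherever consecutive frame histograms differ sharply; the L1 distance and the threshold below are assumptions for illustration only:

```python
def detect_shot_boundaries(frame_histograms, threshold=0.5):
    """Flag frame index i as a shot boundary when the L1 distance
    between histograms of frames i-1 and i exceeds `threshold`."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        prev, cur = frame_histograms[i - 1], frame_histograms[i]
        d = sum(abs(a - b) for a, b in zip(prev, cur))
        if d > threshold:
            boundaries.append(i)
    return boundaries

# Frames 0-1 are identical; frame 2 switches content entirely.
cuts = detect_shot_boundaries([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```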
Further, the video features of the initial video file to be scored also include a video emotion score feature, and in step S100 the step of extracting the video features from the initial video file further comprises:
Step S105: reading the video content of the initial video file, and detecting and counting emotion data that marks the emotion of the video content.
The video content of the initial video file being played is read and detected, and the emotion data marking the video emotion in that content is counted.
Specifically, in this embodiment, the video content of the currently playing advertisement is read, and the video data in that content is labeled; according to the labels, the emotion score of the advertisement's content is analyzed and counted on a scale of 1 to 10 (the higher the score, the more passionate or joyful the content; the lower the score, the calmer the content).
Step S106: inputting the emotion data into a preset sentiment analysis model so that the model predicts the emotion score of the video content.
Specifically, in this embodiment, after the labeled data is obtained, it is input into a preset video-classification prediction model; by calling this sequence-neural-network-based video-classification prediction model, the emotion score of the current advertisement at the next moment is further predicted.
In this embodiment, the preset video-classification prediction model may specifically be a TSN (Temporal Segment Network) action-recognition video-classification model, or a stream-based trend-prediction video-classification model.
Step S107: taking the emotion score as the video emotion score feature of the initial video file.
Specifically, in this embodiment, the emotion score of the current advertisement content predicted by the video-classification prediction model is taken as the emotion score feature of the current advertisement file.
Further, in step S100, in conjunction with every video features generate the initial video file with musical sound The step of frequency file includes:
Every video features are input to default model of dubbing in background music by step S108.
Specifically, for example, in this embodiment, the video features of the current advertisement video extracted by the Gunnar Farneback optical flow algorithm, the chroma histogram algorithm, the shot boundary detection model and the video classification training prediction model, namely the optical flow intensity feature, the chroma histogram feature, the shot boundary feature and the emotion score feature, are passed to the default soundtrack model based on a sequence neural network.
In this embodiment, the default soundtrack model used may specifically be an automatic soundtrack model based on a time-recursive sequence neural network (an LSTM sequence neural network). Before the default soundtrack model generates the soundtrack audio file of the video file in combination with the video features, the default soundtrack model is trained by adding default training samples, where the default training samples include audio-video data and pure audio data. By training the automatic soundtrack model based on the sequence neural network with these added samples, a better effect can be obtained when the automatic soundtrack model automatically scores a video file.
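As a sketch of how the per-moment input to such an LSTM soundtrack model might be assembled from the four extracted features. The 12-bin chroma size and the flat-vector layout are assumptions for illustration, not specified by the patent.

```python
def build_timestep_input(flow_intensity, chroma_hist, shot_boundary, emotion_score):
    """Concatenate the four video features of one playing moment into a
    single flat input vector for the sequence model."""
    if len(chroma_hist) != 12:
        raise ValueError("expected a 12-bin chroma histogram")
    return [flow_intensity, *chroma_hist, 1.0 if shot_boundary else 0.0, emotion_score]

vec = build_timestep_input(0.4, [0.0] * 12, shot_boundary=True, emotion_score=7.0)
print(len(vec))  # 1 flow + 12 chroma + 1 boundary flag + 1 emotion = 15
```

One such vector per playing moment forms the input sequence fed to the LSTM.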
Specifically, for example, a transfer learning method is used: two different kinds of samples, audio-video data (e.g., MTV clips) and various pure audio data, are used to train the automatic soundtrack model based on the sequence neural network. Following the generalization problem setting of transfer learning, the source task is trained with the second kind of samples (pure audio data) using an encoder-decoder model structure: the encoder maps the input music samples to a feature space, and the decoder decodes the embedding features in the feature space back to music, realizing the mapping from the feature space to music. Through the training of the encoder and decoder, the source-task model obtains the model weights from the feature space to music. The target task is then trained with the first kind of samples (audio-video data): a feature extraction module first maps the audio-video data into the feature space of the source task, and the decoder model of the source task then maps the embedding features to music, realizing end-to-end learning and synchronous updating of the model.
Further, before the step of inputting the video features to the default soundtrack model in step S108, the method of dubbing in background music of the video file of the present invention further includes:
detecting the lookback feature of the initial video file, and inputting the lookback feature to the default soundtrack model.
In this embodiment, in order for the above default soundtrack model based on the sequence neural network to better learn the soundtrack audio of the advertisement video file, the lookback feature of the output generated so far (i.e., the output one to two bars earlier, whether the previous output is the same as the output one to two bars earlier, and the position of the current output within the current bar) is detected, and the lookback feature, together with the other video features of the current video file, is input to the default soundtrack model based on the sequence neural network, so that the soundtrack model can better recognize and learn the repeated and similar melodies in the soundtrack of the current video file.
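The lookback feature described above might be computed as in the following sketch; the bar length of 16 steps is an assumption, as is returning the bar position as a raw index.

```python
def lookback_feature(notes, bar_len=16):
    """For the latest output note, report: is it identical to the note one
    bar earlier? two bars earlier? and where does it sit inside its bar?"""
    t = len(notes) - 1
    same_1_bar = t >= bar_len and notes[t] == notes[t - bar_len]
    same_2_bar = t >= 2 * bar_len and notes[t] == notes[t - 2 * bar_len]
    pos_in_bar = t % bar_len
    return [int(same_1_bar), int(same_2_bar), pos_in_bar]

# A melody repeating every bar: both lookback flags fire once history allows.
notes = [60, 62, 64, 65] * 9          # 36 steps of a repeating 4-note motif
print(lookback_feature(notes, bar_len=4))   # [1, 1, 3]
```

Feeding these flags back in is how the model is told "this melody is repeating", which is exactly the repetition structure the paragraph above wants it to learn.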
Step S109, in the default soundtrack model, the soundtrack audio file of the initial video file is generated in combination with the video features.
After the default soundtrack model based on the sequence neural network receives the video features of the currently playing initial video file and the lookback feature of the current initial video file, it combines the current video features and sequentially generates the soundtrack audio file of the initial video file according to the playback timing of the current initial video file.
Specifically, for example, in this embodiment, after the default soundtrack model based on the sequence neural network receives the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature of the current advertisement video extracted by the Gunnar Farneback optical flow algorithm, the chroma histogram algorithm, the shot boundary detection model and the video classification training prediction model, together with the lookback feature of the current video file, it combines the video features and lookback feature of the advertisement video at the current playing moment with the soundtrack audio already generated at the previous moment, automatically generates the soundtrack audio for the next playing moment of the current advertisement video according to the prediction of the sequence neural network, and, following the playback timing of the current advertisement video, repeats the above scoring operation to sequentially generate the soundtrack audio file until the current advertisement video finishes playing.
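The moment-by-moment generation loop can be sketched as follows, with `predict_next_note` standing in for the trained LSTM; the placeholder rule (nudging the pitch with optical flow intensity) is purely illustrative and not part of the patent.

```python
def predict_next_note(features, prev_note):
    """Placeholder for the trained sequence model: nudge the previous note
    up or down depending on the optical flow intensity (features[0])."""
    return prev_note + (1 if features[0] > 0.5 else -1)

def generate_soundtrack(per_moment_features, start_note=60):
    """Walk the video's playback timeline, feeding each moment's video
    features plus the previously generated note back into the model."""
    notes, prev = [], start_note
    for feats in per_moment_features:      # one entry per playing moment
        prev = predict_next_note(feats, prev)
        notes.append(prev)
    return notes

timeline = [[0.9], [0.9], [0.1], [0.1]]    # fast, fast, calm, calm
print(generate_soundtrack(timeline))       # [61, 62, 61, 60]
```

The key point is the feedback edge: each step consumes the previous step's output, which is what makes the generation sequential rather than per-frame independent.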
Step S200, a test video file is generated based on the initial video file and the soundtrack audio file.
According to the current initial video file and the play time sequence of the soundtrack audio file generated from the video features and lookback feature of the initial video file, the initial video file and the soundtrack audio file are combined to generate a test video file in which the current initial video file contains the audio content.
Further, step S200 includes:
Step S201, the play time sequences of the initial video file and the soundtrack audio file are read.
Specifically, for example, in this embodiment, the play time sequence of the currently playing advertisement video file is read, together with the play time sequence of the soundtrack audio file generated by the default soundtrack model based on the sequence neural network from the video features of the current advertisement video file, namely the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature, and the lookback feature of the current video file.
Step S202, based on the play time sequences, the initial video file and the soundtrack audio file are synthesized into a test video file.
Specifically, for example, in this embodiment, according to the play time sequence of the currently playing advertisement video file that has been read, and the corresponding play time sequence of the soundtrack audio file, the current soundtrack audio file is combined into the current advertisement video file, so as to generate a test video file in which the current advertisement video file contains the audio content.
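Combining the two play time sequences amounts to aligning the soundtrack to the video's duration before muxing. A minimal sketch of that alignment step, using sample-count arithmetic only (no actual media handling, which would typically be delegated to a tool such as FFmpeg):

```python
def align_audio(audio, video_len):
    """Trim the generated soundtrack if it overruns the video, or pad it
    with silence (zeros) if it falls short, so both timelines match."""
    if len(audio) >= video_len:
        return audio[:video_len]
    return audio + [0.0] * (video_len - len(audio))

print(len(align_audio([0.1] * 10, 8)))   # 8 - trimmed to the video length
print(align_audio([0.1] * 2, 4))         # [0.1, 0.1, 0.0, 0.0] - padded
```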
Step S300, according to the user portrait model and evaluation parameters of the viewers of the test video file, the soundtrack audio file in the test video file is revised to generate a stand-by video file.
On the release platform of the current initial video file, the audience users watching the current initial video file are detected, and the portrait models of the audience users, as well as their evaluation parameters on the test video file when watching the current test video file, are obtained from the platform. A default recommendation model is called, and the obtained user portrait models and evaluation parameters are input to the recommendation model to predict the preferences of the audience users when watching the current video file. The default soundtrack model is optimized according to the prediction result, so as to guide the default soundtrack model to revise the generated soundtrack audio file and ultimately generate the stand-by video file of the current test video file.
The present invention extracts the video features of the initial video file from the initial video file to be dubbed in background music, and generates the soundtrack audio file of the initial video file in combination with the video features; generates a test video file based on the initial video file and the soundtrack audio file; and, according to the user portrait model and evaluation parameters of the viewers of the test video file, revises the soundtrack audio file in the test video file to generate a stand-by video file. Thus, by combining the video features extracted from the video content of the initial video file, training with added audio-video data and pure audio data through transfer learning, and guiding and optimizing the soundtrack model with the user feature data collected from the audience of the advertisement video file, a stand-by video file of the current initial video file after scoring is generated. The automatic soundtrack algorithm not only realizes automatic scoring and reduces the high cost of scoring video files, but also further improves the overall quality of the soundtrack by scoring in combination with the video content features. Moreover, the soundtrack audio file is optimized and adjusted based on the feedback and evaluation of the audience of the video file, which meets the users' preference requirements for the soundtrack content and improves the users' viewing experience of the video file.
Further, a second embodiment of the method of dubbing in background music of the video file of the present invention is proposed.
Referring to Fig. 4, Fig. 4 is a flow diagram of the second embodiment of the method of dubbing in background music of the video file of the present invention. Based on the above first embodiment of the method, in this embodiment, the above step S109, in which the soundtrack audio file of the initial video file is generated in the default soundtrack model in combination with the video features, includes:
Step S1091, a note sequence is generated according to the video features of the initial video file and the lookback feature.
After the video features extracted from the currently playing initial video file by calling the preset algorithms and preset sequence neural network models, together with the detected lookback feature of the current initial video file, are input to the default soundtrack model based on the sequence neural network, the default soundtrack model first generates the note sequence of the soundtrack in combination with the video features and the lookback feature.
Specifically, for example, in this embodiment, after the default soundtrack model based on the sequence neural network receives the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature of the current advertisement video, together with the lookback feature of the current advertisement video, the LSTM sequence neural network takes, at each playing moment t, the video features and lookback feature of moment t, namely the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature, and the note output at moment t-1 (the moment before the current playing moment) as input, and outputs at each moment a probability distribution over notes for note selection; the note with the highest probability is taken as the current note.
In this embodiment, to simplify the automatic soundtrack model and optimize its effect, the range of output notes is limited to the 3 octaves between C3 and C6, i.e., 36 notes. Finally, the output of the model is a 37-dimensional probability distribution, representing the 36 notes plus 1 blank position (i.e., no note at this moment).
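The 37-way note choice can be sketched directly. The MIDI numbering used for C3 is an assumption (octave-numbering conventions differ), and greedy argmax decoding is the simple selection rule the paragraph above describes.

```python
NOTES = list(range(48, 84))   # 36 MIDI pitches covering the 3 octaves C3-B5
REST = 36                     # 37th slot: the blank position (no note)

def pick_note(probs):
    """Greedy decoding: take the highest-probability slot; None means rest."""
    if len(probs) != 37:
        raise ValueError("model output must be 37-dimensional")
    best = max(range(37), key=lambda i: probs[i])
    return None if best == REST else NOTES[best]

probs = [0.0] * 37
probs[12] = 0.9               # strong vote for the 13th pitch: MIDI 60, middle C
print(pick_note(probs))       # 60
```

Restricting the vocabulary to 37 classes keeps the output layer small, which is exactly the simplification motivated in the text.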
Step S1092, the note sequence is input to a note duration sequence neural network, so that the note duration neural network outputs a note duration sequence according to the note sequence and the lookback feature.
In this embodiment, the note sequence generated by the default soundtrack model in combination with the video features and lookback feature of the current video file is input to the note duration sequence neural network in the current default soundtrack model, and the note duration neural network, combining the lookback feature corresponding to the note sequence and each playing moment of the video file, outputs the note duration sequence of the current note sequence.
Step S1093, the note sequence is input to a drum sequence neural network, so that the drum sequence neural network outputs drumbeat combinations according to the note sequence.
In this embodiment, the video features of the current video file and the note sequence are taken as input by the default soundtrack model and passed to the drum sequence neural network in the current default soundtrack model. For each bar of the note sequence, the drum sequence neural network selects, according to the note sequence of the current bar and the drumbeat combination of the previous bar, a drumbeat combination for the current bar from the existing drumbeat combination patterns of the current drum sequence neural network, and outputs it.
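Bar-by-bar drum pattern selection from a fixed pattern bank might look like the following sketch. The three-pattern bank and the density-matching heuristic are illustrative assumptions standing in for the trained drum sequence network.

```python
PATTERNS = {                 # each pattern: 8 slots per bar, 1 = drum hit
    "sparse": [1, 0, 0, 0, 1, 0, 0, 0],
    "medium": [1, 0, 1, 0, 1, 0, 1, 0],
    "dense":  [1, 1, 1, 0, 1, 1, 1, 0],
}

def pick_drum_pattern(bar_notes, prev_pattern):
    """Choose the bank pattern whose hit density best matches the bar's
    note density, breaking ties in favor of repeating the previous bar."""
    density = sum(n is not None for n in bar_notes) / len(bar_notes)
    def score(name):
        gap = abs(sum(PATTERNS[name]) / 8 - density)
        return (gap, 0 if name == prev_pattern else 1)
    return min(PATTERNS, key=score)

busy_bar = [60, 62, 64, 65, 67, 69, 71, None]
print(pick_drum_pattern(busy_bar, prev_pattern="sparse"))  # dense
```

Conditioning on both the current bar's notes and the previous bar's pattern mirrors the two inputs named in the paragraph above.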
Step S1094, the soundtrack audio file of the initial video file is generated according to the note sequence, the note duration sequence and the drumbeat combinations.
According to the play time sequence of the current initial video file, the note sequences generated by the default soundtrack model based on the sequence neural network from the video features and lookback feature of the current video file, together with the note duration sequence and drumbeat combination of each note sequence, are synthesized into the soundtrack audio file of the current initial video file.
In the present invention, the note sequence is input to the note duration sequence neural network, so that the note duration neural network outputs the note duration sequence according to the note sequence and the lookback feature; the note sequence is input to the drum sequence neural network, so that the drum sequence neural network outputs drumbeat combinations according to the note sequence; and the soundtrack audio file of the initial video file is generated according to the note sequence, the note duration sequence and the drumbeat combinations. Thus, based on the video content of the video file, mature sequence neural networks are called in combination with the video features to automatically generate the soundtrack audio file of the current video file layer by layer and in order, which reduces the overall cost of conventional advertisement video soundtrack production and improves the overall quality of advertisement video soundtracks, so that the soundtrack audio combines organically with the video features and provides the audience of the advertisement video with a better viewing experience.
Further, a third embodiment of the method of dubbing in background music of the video file of the present invention is proposed.
Referring to Fig. 5, Fig. 5 is a flow diagram of the third embodiment of the method of dubbing in background music of the video file of the present invention. Based on the above first and second embodiments of the method, in this embodiment, step S300, in which the soundtrack audio file in the test video file is revised according to the user portrait model and evaluation parameters of the viewers of the test video file to generate a stand-by video file, includes:
Step S301, the release platform of the test video file is detected, and the user portrait models and evaluation parameters of the viewers of the test video file are obtained from the release platform.
Specifically, for example, in this embodiment, the release platform of the current advertisement video, a DSP (demand-side platform), is detected; the audience users watching the current advertisement video on the DSP are detected, and the user portrait models of some of the audience users, as well as their evaluation parameters on the test video file when watching the test video of the current advertisement, are extracted.
In this embodiment, the user portrait model includes age, gender, region, client type, etc., and the evaluation parameters of the audience users on the test video file include clicks, playing duration, number of plays, drumbeat type, soundtrack style, etc.
Step S302, the evaluation parameters with which the users of the same user portrait model watch the test video file within a predetermined period are read, and a user behavior feature sequence is constructed according to the evaluation parameters.
In this embodiment, the default recommendation model is called, and the user portrait models are input to the recommendation model to predict the preferences of the audience users when watching the current test video file; the default soundtrack model is optimized according to the prediction result, so as to guide the default soundtrack model to revise the generated soundtrack audio file and ultimately generate the stand-by video file of the current test video file.
Specifically, for example, in this embodiment, the default recommendation model may specifically be a session-based recommendation model. In the session-based recommendation model, the evaluation parameters, such as clicks, playing duration, number of plays, drumbeat type and soundtrack style, of a class of audience users whose portrait models share the same age, gender, region or client type when watching the current advertisement video within a certain predetermined time period (e.g., 1 to 2 weeks) are read, and the behavior data are arranged in chronological order within the 1 to 2 weeks to construct the user behavior feature sequence of the current class of audience users with the same portrait model.
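Constructing the chronological behavior feature sequence for one cohort might be sketched as follows; the record field names are assumptions chosen for illustration.

```python
def build_behavior_sequence(records):
    """Order one cohort's viewing records by time and flatten each into a
    numeric feature step: [clicked, play_seconds, play_count]."""
    ordered = sorted(records, key=lambda r: r["time"])
    return [[float(r["clicked"]), float(r["play_seconds"]), float(r["play_count"])]
            for r in ordered]

records = [
    {"time": 3, "clicked": True,  "play_seconds": 30, "play_count": 2},
    {"time": 1, "clicked": False, "play_seconds": 5,  "play_count": 1},
]
seq = build_behavior_sequence(records)
print(seq[0])   # earliest record first: [0.0, 5.0, 1.0]
```

The chronological ordering matters because the sequence is consumed step by step by the recommendation model's sequence neural network.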
Step S303, the preference probability distribution data of the users over the soundtrack audio file when watching the test video file is calculated according to the user behavior feature sequence.
Specifically, for example, in this embodiment, the constructed user behavior feature sequence is taken as input to the sequence neural network of the current session-based recommendation model, the output of the state layer of the sequence neural network is passed to a fully connected layer, and the fully connected layer predicts, for the current class of audience users with the same attribute data, the preference probability distribution data over soundtrack audio styles at the next moment of the test video of the current advertisement, and finally outputs the preference probability distribution data.
Step S304, the default soundtrack model that generates the soundtrack audio file is guided with the preference probability distribution data, so as to revise the soundtrack audio file in the test video file and generate the stand-by video file.
According to the preference probability distribution prediction result output by the fully connected layer of the sequence neural network of the current session-based recommendation model, guided optimization is performed on the current default soundtrack model based on the sequence neural network, so that the default soundtrack model revises the soundtrack audio file of the currently playing test video file and ultimately generates the stand-by video file of the test video file.
Specifically, for example, in this embodiment, during the playing of the current advertisement video, when the drum sequence neural network in the automatic soundtrack model based on the LSTM sequence neural network selects the drumbeat combination pattern for the soundtrack of the current advertisement video according to the note sequence, the drumbeat combination prediction result for the current bar predicted by the drum sequence neural network is weighted with the preference probability distribution prediction result output by the fully connected layer of the sequence neural network of the session-based recommendation model, so as to select the drumbeat combination that better matches the preferences of the audience users watching the current advertisement video. Finally, according to this drumbeat combination, a soundtrack audio file of the current advertisement video that better matches the audience users is generated, and combined with the initial advertisement video file to form the final stand-by video file of the advertisement.
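The weighting of the drum network's prediction against the recommendation model's preference distribution can be sketched as a simple convex mixture; the 0.7 weight and the three-pattern example are assumptions, not values from the patent.

```python
def blend_distributions(drum_probs, pref_probs, alpha=0.7):
    """Mix the drum model's per-pattern probabilities with the audience
    preference distribution, then renormalize to a valid distribution."""
    mixed = [alpha * d + (1 - alpha) * p for d, p in zip(drum_probs, pref_probs)]
    total = sum(mixed)
    return [m / total for m in mixed]

drum = [0.5, 0.3, 0.2]     # drum network slightly prefers pattern 0
pref = [0.05, 0.05, 0.9]   # audience strongly prefers pattern 2
mixed = blend_distributions(drum, pref)
print(max(range(3), key=lambda i: mixed[i]))   # 2 - preference flips the pick
```

Varying `alpha` trades off musical plausibility (the drum network) against audience preference (the recommendation model).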
The present invention detects the release platform of the test video file and obtains the user portrait models and evaluation parameters of the viewers of the test video file from the release platform; reads the evaluation parameters with which the users of the same user portrait model watch the test video file within a predetermined period and constructs a user behavior feature sequence according to the evaluation parameters; calculates, according to the user behavior feature sequence, the preference probability distribution data of the users over the soundtrack audio file when watching the test video file; and guides the default soundtrack model with the preference probability distribution data to revise the soundtrack audio file in the test video file and generate the stand-by video file. Thus, the soundtrack is optimized and adjusted based on the feedback of the audience of the advertisement video, which meets the users' preference requirements for the soundtrack content and further improves the users' viewing experience of the advertisement video.
In addition, an embodiment of the present invention further proposes a scoring system of a video file, the scoring system including:
a soundtrack audio generation module, configured to extract the video features of the initial video file from the initial video file to be dubbed in background music, and generate the soundtrack audio file of the initial video file in combination with the video features;
a to-be-tested video generation module, configured to generate a test video file based on the initial video file and the soundtrack audio file;
a soundtrack audio correction module, configured to revise the soundtrack audio file in the test video file according to the user portrait model and evaluation parameters of the viewers of the test video file, and generate a stand-by video file.
Preferably, the scoring system of the video file further includes:
a learning training module, configured to add default training samples to perform learning training on the default soundtrack model that generates the soundtrack audio file, the default training samples including audio-video data and pure audio data.
When the modules of the scoring system of the video file proposed in this embodiment run, the steps of the method of dubbing in background music of the video file described above are realized, and details are not described herein again.
In addition, an embodiment of the present invention further proposes a storage medium applied to a computer, i.e., the storage medium is a computer-readable storage medium on which a soundtrack program of a video file is stored, and when the soundtrack program of the video file is executed by a processor, the steps of the method of dubbing in background music of the video file described above are realized.
For the method realized when the soundtrack program of the video file running on the processor is executed, reference may be made to the embodiments of the method of dubbing in background music of the video file of the present invention, and details are not described herein again.
It should be noted that, herein, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without more restrictions, an element limited by the sentence "including a ..." does not exclude the existence of other identical elements in the process, method, article or device including that element.
The serial numbers of the above embodiments of the present invention are only for description and do not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of the present invention.

Claims (12)

1. A method of dubbing in background music of a video file, characterized in that the method comprises the following steps:
extracting video features of an initial video file from the initial video file to be dubbed in background music, and generating a soundtrack audio file of the initial video file in combination with the video features;
generating a test video file based on the initial video file and the soundtrack audio file;
revising the soundtrack audio file in the test video file according to a user portrait model and evaluation parameters of viewers of the test video file, and generating a stand-by video file.
2. the method for dubbing in background music of video file as described in claim 1, which is characterized in that the video features include: that light stream is strong Feature, chroma histogram feature, shot boundary characteristic are spent,
Described the step of extracting every video features of the video file from initial video file to be dubbed in background music includes:
Extract the coloration histogram of each video image corresponding each light stream figure and the video image in the initial video file Figure;
Using the average light intensity of flow of each light stream figure as the light stream strength characteristic of the initial video file;
Chroma histogram feature after the chroma histogram is normalized, as the initial video file;
The boundary shot for detecting the video image, by the shot boundary characteristic of initial video file described in the boundary shot.
3. the method for dubbing in background music of video file as described in claim 1, which is characterized in that the video features further include: video Emotion score feature,
Described the step of extracting every video features of the video file from initial video file to be dubbed in background music further include:
The video content for reading the initial video file detects and counts the emotion for identifying video feeling in the video content Data;
The affection data is input to default sentiment analysis model, so that the default sentiment analysis model is to the emotion number According to being predicted to obtain the emotion score of the video content;
Using the emotion score as the video feeling score feature of the initial video file.
4. the method for dubbing in background music of video file as described in any one of claims 1 to 3, which is characterized in that in conjunction with every view Frequency feature generates the step of soundtrack audio file of the initial video file and includes:
Every video features are input to default model of dubbing in background music, the default trained sample that the preset configuration model passes through addition This progress learning training, the default training sample includes: audio, video data and pure audio data;
In the default model of dubbing in background music, the soundtrack audio text of the initial video file is generated in conjunction with every video features Part.
5. the method for dubbing in background music of video file as claimed in claim 4, which is characterized in that described by every video features It is input to before presetting the step of dubbing in background music model, the method also includes:
The lookback feature of the initial video file is detected, and the lookback feature is input to described preset and is dubbed in background music Model.
6. the method for dubbing in background music of video file as claimed in claim 4, which is characterized in that the default model of dubbing in background music is based on sequence Column neural network generates the model of dubbing in background music of audio file,
In the default model of dubbing in background music, the soundtrack audio text of the initial video file is generated in conjunction with every video features The step of part includes:
According to every video features of the initial video file and the lookback feature, sequence of notes is generated;
The sequence of notes is inputted into note duration sequence neural network, so that the note duration neural network is according to the sound It accords with sequence and the lookback feature exports note duration sequence;
The sequence of notes is inputted into drum sequence neural network, so that the drum sequence neural network is according to the note sequence Column output drumbeat combination;
It is combined according to the sequence of notes, note duration sequence and the drumbeat, generate the initial video file matches musical sound Frequency file.
7. the method for dubbing in background music of video file as described in claim 1, which is characterized in that based on the initial video file and match Musical sound frequency file, generate test video file the step of include:
Read the play time sequence of the initial video file and the soundtrack audio file;
It is test video text by the initial video file and the soundtrack audio file synthesis based on the play time sequence Part.
8. the method for dubbing in background music of video file as described in claim 1, which is characterized in that described according to the test video file The user's portrait model and evaluation parameter for watching object, are modified soundtrack audio file in the test video file, raw Include: at the step of stand-by video file
The release platform for detecting the test video file obtains the test video file from the release platform and watches pair The user's portrait model and evaluation parameter of elephant;
Each user for reading same subscriber portrait model watches the evaluation parameter of the test video file in predetermined period, and User behavior characteristics sequence is constructed according to the evaluation parameter;
When watching the test video file according to user behavior characteristics sequence calculating user, to the soundtrack audio file Preference probability distribution data;
The default model of dubbing in background music that the soundtrack audio file is generated with preference probability distribution data guidance, to the test Soundtrack audio file is modified in video file, generates stand-by video file.
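Claim 8 first orders the evaluation parameters collected per user-portrait group into a behavior feature sequence, then turns that sequence into a preference probability distribution for the soundtrack. A minimal sketch follows; the action names (like, skip, replay) are assumed for illustration, since the patent does not enumerate concrete evaluation parameters.

```python
from collections import Counter

def behavior_sequence(events):
    """Order raw evaluation events by time into a behavior feature sequence."""
    return [e["action"] for e in sorted(events, key=lambda e: e["time"])]

def preference_distribution(seq):
    """Normalize action counts into a probability distribution."""
    counts = Counter(seq)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

# Hypothetical evaluation events from viewers in one user-portrait group.
events = [
    {"time": 3, "action": "skip"},
    {"time": 1, "action": "like"},
    {"time": 2, "action": "replay"},
    {"time": 4, "action": "like"},
]
seq = behavior_sequence(events)
dist = preference_distribution(seq)
print(seq, dist)
```

The resulting distribution could then steer the dubbing model, e.g. by weighting its training objective toward soundtracks with higher observed preference.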
9. A music dubbing system for a video file, wherein the music dubbing system generates the soundtrack audio of a video file based on a sequential neural network, and the music dubbing system comprises:
a soundtrack audio generation module, configured to extract the video features of an initial video file to be dubbed from the initial video file, and to generate the soundtrack audio file of the initial video file in conjunction with the video features;
a test video generation module, configured to generate a test video file based on the initial video file and the soundtrack audio file;
a soundtrack audio correction module, configured to correct the soundtrack audio file in the test video file according to the user portrait models and evaluation parameters of the viewers of the test video file, and to generate a standby video file.
10. The music dubbing system of a video file according to claim 9, wherein the music dubbing system further comprises:
a learning and training module, configured to add preset training samples for performing learning and training on the preset dubbing model that generates the soundtrack audio file, the preset training samples comprising audio-video data and pure audio data.
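Claim 10's training module mixes two sample types, audio-video pairs and pure audio, when training the dubbing model. A sketch of assembling such a mixed training set (the field names are assumptions, not disclosed by the patent):

```python
def build_training_set(av_samples, audio_samples):
    """Tag each sample with its type so the trainer can branch on it."""
    data = [{"type": "audio_video", **s} for s in av_samples]
    data += [{"type": "pure_audio", **s} for s in audio_samples]
    return data

# Hypothetical file names for illustration only.
dataset = build_training_set(
    [{"video": "clip1.mp4", "audio": "clip1.wav"}],
    [{"audio": "song1.wav"}],
)
print(len(dataset), dataset[0]["type"])
```

Tagging by sample type lets a single training loop, for example, learn audio-only structure from the pure-audio samples and video-to-audio alignment from the paired samples.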
11. A music dubbing device for a video file, wherein the music dubbing device comprises: a memory, a processor, and a music dubbing program for a video file stored in the memory and executable on the processor, wherein the music dubbing program, when executed by the processor, implements the steps of the music dubbing method of a video file according to any one of claims 1 to 8.
12. A storage medium applied to a computer, wherein a music dubbing program for a video file is stored on the storage medium, and the music dubbing program, when executed by a processor, implements the steps of the music dubbing method of a video file according to any one of claims 1 to 8.
CN201910216297.8A 2019-03-20 2019-03-20 Method, system, equipment and storage medium for dubbing music of video file Active CN109862393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910216297.8A CN109862393B (en) 2019-03-20 2019-03-20 Method, system, equipment and storage medium for dubbing music of video file


Publications (2)

Publication Number Publication Date
CN109862393A true CN109862393A (en) 2019-06-07
CN109862393B CN109862393B (en) 2022-06-14

Family

ID=66901380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910216297.8A Active CN109862393B (en) 2019-03-20 2019-03-20 Method, system, equipment and storage medium for dubbing music of video file

Country Status (1)

Country Link
CN (1) CN109862393B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040100487A1 (en) * 2002-11-25 2004-05-27 Yasuhiro Mori Short film generation/reproduction apparatus and method thereof
US20060122842A1 (en) * 2004-12-03 2006-06-08 Magix Ag System and method of automatically creating an emotional controlled soundtrack
CN102403011A (en) * 2010-09-14 2012-04-04 北京中星微电子有限公司 Music output method and device
US8737817B1 (en) * 2011-02-08 2014-05-27 Google Inc. Music soundtrack recommendation engine for videos
CN104182413A (en) * 2013-05-24 2014-12-03 福建星网视易信息系统有限公司 Method and system for recommending multimedia content
US20140376888A1 (en) * 2008-10-10 2014-12-25 Sony Corporation Information processing apparatus, program and information processing method
CN105261374A (en) * 2015-09-23 2016-01-20 海信集团有限公司 Cross-media emotion correlation method and system
CN107170432A (en) * 2017-03-31 2017-09-15 珠海市魅族科技有限公司 A kind of music generating method and device
CN108712574A (en) * 2018-05-31 2018-10-26 维沃移动通信有限公司 A kind of method and device playing music based on image
CN109063163A (en) * 2018-08-14 2018-12-21 腾讯科技(深圳)有限公司 A kind of method, apparatus, terminal device and medium that music is recommended
CN109492128A (en) * 2018-10-30 2019-03-19 北京字节跳动网络技术有限公司 Method and apparatus for generating model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FANG-FEI KUO et al.: "Background music recommendation for video based on multimodal latent semantic analysis", 2013 IEEE International Conference on Multimedia and Expo (ICME) *
QIE Zihan et al.: "An Artificial Neural Network Model for Matching Background Music to Video", Computer Knowledge and Technology *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231499A (en) * 2019-07-15 2021-01-15 李姿慧 Intelligent video music distribution system
CN110446057A * 2019-08-30 2019-11-12 北京字节跳动网络技术有限公司 Method, device, equipment and readable medium for providing live-broadcast auxiliary data
CN110781835A (en) * 2019-10-28 2020-02-11 中国传媒大学 Data processing method and device, electronic equipment and storage medium
CN110781835B (en) * 2019-10-28 2022-08-23 中国传媒大学 Data processing method and device, electronic equipment and storage medium
CN110753238A (en) * 2019-10-29 2020-02-04 北京字节跳动网络技术有限公司 Video processing method, device, terminal and storage medium
CN110933406B (en) * 2019-12-10 2021-05-14 央视国际网络无锡有限公司 Objective evaluation method for short video music matching quality
CN110933406A (en) * 2019-12-10 2020-03-27 央视国际网络无锡有限公司 Objective evaluation method for short video music matching quality
CN111737516A (en) * 2019-12-23 2020-10-02 北京沃东天骏信息技术有限公司 Interactive music generation method and device, intelligent sound box and storage medium
CN111259192A (en) * 2020-01-15 2020-06-09 腾讯科技(深圳)有限公司 Audio recommendation method and device
CN111259192B (en) * 2020-01-15 2023-12-01 腾讯科技(深圳)有限公司 Audio recommendation method and device
CN111800650A (en) * 2020-06-05 2020-10-20 腾讯科技(深圳)有限公司 Video dubbing method and device, electronic equipment and computer readable medium
WO2022005442A1 (en) * 2020-07-03 2022-01-06 Назар Юрьевич ПОНОЧЕВНЫЙ System (embodiments) for harmoniously combining video files and audio files and corresponding method
US20220366881A1 (en) * 2021-05-13 2022-11-17 Microsoft Technology Licensing, Llc Artificial intelligence models for composing audio scores
WO2022240525A1 (en) * 2021-05-13 2022-11-17 Microsoft Technology Licensing, Llc Artificial intelligence models for composing audio scores
CN113923517A (en) * 2021-09-30 2022-01-11 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
WO2023197749A1 (en) * 2022-04-15 2023-10-19 腾讯科技(深圳)有限公司 Background music insertion time point determining method and apparatus, device, and storage medium
CN115174959A (en) * 2022-06-21 2022-10-11 咪咕文化科技有限公司 Video 3D sound effect setting method and device
CN115174959B (en) * 2022-06-21 2024-01-30 咪咕文化科技有限公司 Video 3D sound effect setting method and device

Also Published As

Publication number Publication date
CN109862393B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN109862393A Method, system, equipment and storage medium for dubbing music of video file
CN107918653A Intelligent playback method and device based on preference feedback
US8963926B2 User customized animated video and method for making the same
US8442389B2 Electronic apparatus, reproduction control system, reproduction control method, and program therefor
CN110019961A Video processing method and device, and device for video processing
CN109447234A Model training method, talking-expression synthesis method and related apparatus
CN107172485A Method and apparatus for generating short video
US10789972B2 Apparatus for generating relations between feature amounts of audio and scene types and method therefor
CN108924599A Video caption display method and device
CN109147800A Answering method and device
CN108292314A Information processing unit, information processing method and program
CN108241997A Advertisement playing method and device, and computer-readable storage medium
CN107895016A Multimedia playing method and apparatus
CN107872685A Multimedia data playing method, device and computer apparatus
US11756571B2 Apparatus that identifies a scene type and method for identifying a scene type
WO2019047850A1 Identifier displaying method and device, request responding method and device
CN114073854A Game method and system based on multimedia file
JP2019071009A Content display program, content display method, and content display device
CN113538628A Expression package generation method and device, electronic equipment and computer readable storage medium
CN108920585A Music recommendation method and device, and computer-readable storage medium
CN109429077A Video processing method and device, and device for video processing
CN114339076A Video shooting method and device, electronic equipment and storage medium
CN106331525A Implementation method for interactive film
CN115866339A Television program recommendation method and device, intelligent device and readable storage medium
JP7466087B2 Estimation device, estimation method, and estimation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant