CN109862393A - Soundtrack generation method, system, device, and storage medium for a video file - Google Patents
Soundtrack generation method, system, device, and storage medium for a video file
- Publication number: CN109862393A (Application CN201910216297.8A)
- Authority: CN (China)
- Prior art keywords: video file, video, soundtrack, background music
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a soundtrack generation method, system, device, and storage medium for a video file. The method comprises: extracting video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file by combining those video features; generating a test video file based on the initial video file and the soundtrack audio file; and revising the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of viewers of the test video file, to generate a ready-to-use video file. The invention reduces the overall cost of video scoring, and combines video content features with user feedback when generating the soundtrack, so that users obtain a better experience when watching the video.
Description
Technical field
The present invention relates to the technical field of video scoring, and in particular to a soundtrack generation method, system, device, and storage medium for a video file.
Background art
When producing a video file for an audience, the video content is usually produced first, and the soundtrack is then added in post-production according to that content, ultimately forming the video played to users; at present this is especially apparent in advertising video production. In the existing advertising video production process, the advertiser's designers first design the video content according to the client's requirements, and then select existing audio files to score the video in post-production. As a result, the advertising video is not only costly as a whole, but the process also fails to account for the audience's preferences regarding the video's soundtrack. Automatic music generation algorithms do exist, but the existing ones cannot combine the music with the video's content features, so the resulting video soundtracks are mediocre.
Summary of the invention
The main purpose of the present invention is to provide a soundtrack generation method, system, device, and storage medium for a video file, aiming to improve the quality of newly generated advertising video soundtracks, reduce scoring costs, and combine advertising video content features with user feedback to optimize and adjust the advertising video's soundtrack, so that users obtain a better viewing experience when watching the advertising video.
To achieve the above object, the present invention provides a soundtrack generation method for a video file, comprising the following steps:
extracting video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file by combining the video features;
generating a test video file based on the initial video file and the soundtrack audio file;
revising the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of viewers of the test video file, to generate a ready-to-use video file.
Optionally, the video features include an optical flow strength feature, a chroma histogram feature, and a shot boundary feature, and the step of extracting the video features from the initial video file to be scored comprises:
extracting, in the initial video file, the optical flow map corresponding to each video image and the chroma histogram of the video image;
taking the average optical flow strength of the optical flow maps as the optical flow strength feature of the initial video file;
normalizing the chroma histograms, and taking the normalized result as the chroma histogram feature of the initial video file;
detecting the shot boundaries of the video images, and taking the detected shot boundaries as the shot boundary feature of the initial video file.
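As a minimal sketch of the two numeric features above (assuming per-frame flow vectors and raw histogram bins are already available from an upstream decoder; the helper names are illustrative, not from the patent):

```python
import math

def optical_flow_strength(flow_maps):
    """Mean optical flow magnitude over all frames and pixels.

    flow_maps: list of per-frame flow fields, each a list of (dx, dy)
    vectors (in practice produced by a dense optical flow algorithm).
    """
    mags = [math.hypot(dx, dy) for frame in flow_maps for dx, dy in frame]
    return sum(mags) / len(mags)

def normalized_chroma_histogram(hist):
    """L1-normalize a chroma histogram so its bins sum to 1."""
    total = sum(hist)
    return [b / total for b in hist]

# Toy input: two flow fields of two vectors each.
flows = [[(3.0, 4.0), (0.0, 0.0)], [(6.0, 8.0), (0.0, 0.0)]]
print(optical_flow_strength(flows))            # mean of 5, 0, 10, 0 -> 3.75
print(normalized_chroma_histogram([2, 2, 4]))  # -> [0.25, 0.25, 0.5]
```

The per-frame averages collapse into a single scalar feature, which is what the soundtrack model consumes alongside the histogram vectors.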
Optionally, the video features further include a video emotion score feature, and the step of extracting the video features from the initial video file to be scored further comprises:
reading the video content of the initial video file, and detecting and counting the emotion data that marks the video emotion in the video content;
inputting the emotion data into a preset sentiment analysis model, so that the preset sentiment analysis model predicts from the emotion data an emotion score of the video content;
taking the emotion score as the video emotion score feature of the initial video file.
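A toy sketch of the counting step, with the preset sentiment analysis model replaced by a plain average over per-segment emotion labels (the 1-to-10 scale follows the embodiment; the aggregation rule is an assumption):

```python
def emotion_score_feature(emotion_labels):
    """Aggregate per-segment emotion labels (1-10 scale) into a single
    video emotion score feature. The patent feeds the counted data into
    a preset sentiment analysis model; a plain average stands in here.
    """
    if not emotion_labels:
        raise ValueError("no labeled emotion data in the video content")
    return sum(emotion_labels) / len(emotion_labels)

print(emotion_score_feature([8, 9, 7, 8]))  # -> 8.0
```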
Optionally, the step of generating the soundtrack audio file of the initial video file by combining the video features comprises:
inputting the video features into a preset soundtrack model, the preset soundtrack model having been trained on added preset training samples, the preset training samples including audio-video data and pure audio data;
generating, in the preset soundtrack model, the soundtrack audio file of the initial video file by combining the video features.
Optionally, before the step of inputting the video features into the preset soundtrack model, the method further comprises:
detecting a look-back feature of the initial video file, and inputting the look-back feature into the preset soundtrack model.
Optionally, the preset soundtrack model is a soundtrack model that generates audio files based on a sequence neural network, and the step of generating, in the preset soundtrack model, the soundtrack audio file of the initial video file by combining the video features comprises:
generating a note sequence according to the video features of the initial video file and the look-back feature;
inputting the note sequence into a note duration sequence neural network, so that the note duration neural network outputs a note duration sequence according to the note sequence and the look-back feature;
inputting the note sequence into a drumbeat sequence neural network, so that the drumbeat sequence neural network outputs a drumbeat combination according to the note sequence;
generating the soundtrack audio file of the initial video file according to the note sequence, the note duration sequence, and the drumbeat combination.
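The three-network composition above can be sketched structurally. The "networks" here are stand-in functions, since the patent does not disclose weights or architectures; only the data flow (features to notes, notes to durations and drumbeats, then a merge) mirrors the steps:

```python
import random

def note_network(video_features, lookback):
    """Stand-in for the note sequence network: features -> MIDI pitches."""
    rng = random.Random(sum(video_features) + len(lookback))
    return [60 + rng.randrange(12) for _ in range(8)]

def duration_network(notes, lookback):
    """Stand-in for the note duration network: one duration per note."""
    return [0.5 if n % 2 == 0 else 0.25 for n in notes]

def drum_network(notes):
    """Stand-in for the drumbeat network: a hit under every other note."""
    return [i % 2 == 0 for i, _ in enumerate(notes)]

def generate_soundtrack(video_features, lookback):
    notes = note_network(video_features, lookback)
    durations = duration_network(notes, lookback)
    drums = drum_network(notes)
    # The "soundtrack audio file" here is just the combined event list.
    return list(zip(notes, durations, drums))

events = generate_soundtrack([3.75, 0.25, 2.0], lookback=[])
print(len(events))  # -> 8
```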
Optionally, the step of generating the test video file based on the initial video file and the soundtrack audio file comprises:
reading the playback timelines of the initial video file and the soundtrack audio file;
synthesizing the initial video file and the soundtrack audio file into the test video file based on the playback timelines.
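Synthesis against the playback timeline can be sketched as trimming the generated soundtrack events to the video's duration (the event representation of (start, length) pairs is an assumption, not the patent's file format):

```python
def align_audio_to_video(video_duration, audio_events):
    """Trim a soundtrack event list to the video's playback timeline.

    Each event is a (start_time, length) pair; events starting past
    the video's end are dropped, the last kept event is clipped, and
    any remainder is implicitly silence.
    """
    aligned = []
    for start, length in audio_events:
        if start >= video_duration:
            break
        aligned.append((start, min(length, video_duration - start)))
    return aligned

events = [(0.0, 2.0), (2.0, 2.0), (4.0, 2.0)]
print(align_audio_to_video(5.0, events))  # -> [(0.0, 2.0), (2.0, 2.0), (4.0, 1.0)]
```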
Optionally, the step of revising the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of viewers of the test video file, to generate the ready-to-use video file, comprises:
detecting the publishing platform of the test video file, and obtaining from the publishing platform the user profile models and evaluation parameters of viewers of the test video file;
reading the evaluation parameters with which the users of the same user profile model watched the test video file within a predetermined period, and constructing user behavior feature sequences from the evaluation parameters;
calculating, from the user behavior feature sequences, the users' preference probability distribution data for the soundtrack audio file when watching the test video file;
guiding, with the preference probability distribution data, the preset soundtrack model that generated the soundtrack audio file, so as to revise the soundtrack audio file in the test video file and generate the ready-to-use video file.
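One hedged reading of the preference calculation: aggregate viewers' engagement scores per soundtrack variant and normalize them into a probability distribution. The scoring scheme and names below are illustrative only; the patent does not specify how the behavior sequences are reduced:

```python
from collections import defaultdict

def preference_distribution(behavior_sequences):
    """behavior_sequences: (soundtrack_id, engagement_score) pairs built
    from viewers' evaluation parameters within the predetermined period.
    Returns a probability distribution over soundtrack variants.
    """
    totals = defaultdict(float)
    for soundtrack_id, score in behavior_sequences:
        totals[soundtrack_id] += score
    grand = sum(totals.values())
    return {k: v / grand for k, v in totals.items()}

seq = [("uptempo", 3.0), ("calm", 1.0), ("uptempo", 4.0)]
print(preference_distribution(seq))  # -> {'uptempo': 0.875, 'calm': 0.125}
```

The resulting distribution is what would "guide" the soundtrack model, e.g. by weighting which variant it revises toward.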
In addition, the present invention also provides a soundtrack generation system for a video file. The system generates the soundtrack audio of a video file based on a sequence neural network, and comprises:
a soundtrack audio generation module, configured to extract the video features from an initial video file to be scored, and to generate a soundtrack audio file for the initial video file by combining the video features;
a test video generation module, configured to generate a test video file based on the initial video file and the soundtrack audio file;
a soundtrack audio revision module, configured to revise the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of viewers of the test video file, and to generate a ready-to-use video file.
Optionally, the soundtrack generation system for a video file further comprises:
a learning and training module, configured to add preset training samples to the preset soundtrack model that generates the soundtrack audio file and to train it, the preset training samples including audio-video data and pure audio data.
In addition, the present invention also provides a soundtrack generation device for a video file, comprising: a memory, a processor, and a video file soundtrack program stored in the memory and executable on the processor, the video file soundtrack program, when executed by the processor, implementing the steps of the soundtrack generation method for a video file described above.
In addition, the present invention also provides a storage medium applied to a computer, the storage medium storing a video file soundtrack program which, when executed by a processor, implements the steps of the soundtrack generation method for a video file described above.
The present invention extracts video features from an initial video file to be scored and generates a soundtrack audio file for the initial video file by combining those video features; generates a test video file based on the initial video file and the soundtrack audio file; and revises the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of viewers of the test video file, generating a ready-to-use video file. Thus, by combining the video features extracted from the content of the initial video file, performing transfer learning with the added audio-video data and pure audio data, and using the collected feature data of users who watch the advertising video file to guide and optimize the soundtrack model, a ready-to-use, scored video file is generated from the current initial video file. The automatic scoring algorithm not only scores automatically and eliminates the high cost of scoring a video file; combining the video content features further improves the overall quality of the soundtrack; and the soundtrack audio file is additionally optimized and adjusted based on the feedback evaluations of the video file's viewers, meeting users' preferences for the soundtrack content and improving their experience of watching the video file.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a schematic flow diagram of a first embodiment of the soundtrack generation method for a video file of the present invention;
Fig. 3 is a schematic diagram of the refined sub-steps of step S100 in Fig. 2;
Fig. 4 is a schematic flow diagram of a second embodiment of the soundtrack generation method for a video file of the present invention;
Fig. 5 is a schematic flow diagram of a third embodiment of the soundtrack generation method for a video file of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the hardware operating environment involved in the embodiments of the present invention.
It should be noted that Fig. 1 may be the schematic structural diagram of the hardware operating environment of the soundtrack generation device for a video file. The soundtrack generation device of the embodiments of the present invention may be a terminal device such as a PC or a portable computer.
As shown in Fig. 1, the soundtrack generation device may include: a processor 1001, such as a CPU; a network interface 1004; a user interface 1003; a memory 1005; and a communication bus 1002. The communication bus 1002 realizes connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a stable non-volatile memory such as a disk memory. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure of the soundtrack generation device shown in Fig. 1 does not limit the device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a video file soundtrack program. The operating system is a program that manages and controls the hardware and software resources of the soundtrack generation device and supports the running of the video file soundtrack program and other software or programs.
In the soundtrack generation device shown in Fig. 1, the user interface 1003 is mainly used for data communication with each terminal; the network interface 1004 is mainly used for connecting to a background server and communicating data with it; and the processor 1001 may be used to call the video file soundtrack program stored in the memory 1005 and perform the following operations:
extracting video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file by combining the video features;
generating a test video file based on the initial video file and the soundtrack audio file;
revising the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of viewers of the test video file, to generate a ready-to-use video file.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and perform the following steps:
extracting, in the initial video file, the optical flow map corresponding to each video image and the chroma histogram of the video image;
taking the average optical flow strength of the optical flow maps as the optical flow strength feature of the initial video file;
normalizing the chroma histograms, and taking the normalized result as the chroma histogram feature of the initial video file;
detecting the shot boundaries of the video images, and taking the detected shot boundaries as the shot boundary feature of the initial video file.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and perform the following steps:
the video features further include a video emotion score feature, and the step of extracting the video features from the initial video file to be scored further comprises:
reading the video content of the initial video file, and detecting and counting the emotion data that marks the video emotion in the video content;
inputting the emotion data into a preset sentiment analysis model, so that the preset sentiment analysis model predicts from the emotion data an emotion score of the video content;
taking the emotion score as the video emotion score feature of the initial video file.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and perform the following steps:
inputting the video features into a preset soundtrack model, the preset soundtrack model having been trained on added preset training samples, the preset training samples including audio-video data and pure audio data;
generating, in the preset soundtrack model, the soundtrack audio file of the initial video file by combining the video features.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and, before the step of inputting the video features into the preset soundtrack model, perform the following steps:
detecting a look-back feature of the initial video file, and inputting the look-back feature into the preset soundtrack model.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and perform the following steps:
generating a note sequence according to the video features of the initial video file and the look-back feature;
inputting the note sequence into a note duration sequence neural network, so that the note duration neural network outputs a note duration sequence according to the note sequence and the look-back feature;
inputting the note sequence into a drumbeat sequence neural network, so that the drumbeat sequence neural network outputs a drumbeat combination according to the note sequence;
generating the soundtrack audio file of the initial video file according to the note sequence, the note duration sequence, and the drumbeat combination.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and perform the following steps:
reading the playback timelines of the initial video file and the soundtrack audio file;
synthesizing the initial video file and the soundtrack audio file into the test video file based on the playback timelines.
Further, the processor 1001 may also be used to call the video file soundtrack program stored in the memory 1005 and perform the following steps:
detecting the publishing platform of the test video file, and obtaining from the publishing platform the user profile models and evaluation parameters of viewers of the test video file;
reading the evaluation parameters with which the users of the same user profile model watched the test video file within a predetermined period, and constructing user behavior feature sequences from the evaluation parameters;
calculating, from the user behavior feature sequences, the users' preference probability distribution data for the soundtrack audio file when watching the test video file;
guiding, with the preference probability distribution data, the preset soundtrack model that generated the soundtrack audio file, so as to revise the soundtrack audio file in the test video file and generate the ready-to-use video file.
Based on the above structure, embodiments of the soundtrack generation method for a video file of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flow diagram of a first embodiment of the soundtrack generation method for a video file of the present invention.
The embodiments of the present invention provide embodiments of the soundtrack generation method for a video file. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps may be performed in an order different from that shown or described herein.
The soundtrack generation method of the embodiments of the present invention is applied to a soundtrack generation device for a video file, which may be a terminal device such as a PC or a portable computer, and is not specifically limited here.
The soundtrack generation method for a video file of this embodiment comprises:
Step S100: extracting video features from an initial video file to be scored, and generating a soundtrack audio file for the initial video file by combining the video features.
When the start of playback of the initial video file is detected, preset algorithms and preset sequence neural network models are called to extract the video features from the initial video file being played; the extracted video features are sent to a preset soundtrack model that scores automatically based on a sequence neural network; and the preset soundtrack model, combining the current video features, sequentially generates the soundtrack audio file of the initial video file according to the playback timing of the current initial video file.
In this embodiment, the video file may specifically be an advertising video whose content has been produced by the advertiser's designers according to the client's requirements; the preset algorithms and preset sequence neural network models may specifically be the Gunnar Farneback optical flow algorithm, a chroma histogram algorithm, a shot boundary detection model, and a video classification training prediction model; and the preset soundtrack model may specifically be an automatic soundtrack model based on a sequence neural network.
Specifically, for example, in this embodiment, after the above preset algorithms and preset sequence neural network models are called to extract the video features from the advertising video content being played, the video features are passed into the preset soundtrack model based on a sequence neural network, so that the preset soundtrack model combines the video features of the advertising video to score it automatically.
Further, referring to Fig. 3, Fig. 3 is a schematic diagram of the refined sub-steps of step S100 in Fig. 2. The video features of the initial video file to be scored include an optical flow strength feature, a chroma histogram feature, and a shot boundary feature, and in step S100 the step of extracting the video features from the initial video file to be scored comprises:
Step S101: extracting, in the initial video file, the optical flow map corresponding to each video image and the chroma histogram of the video image.
The preset algorithms are called to analyze and extract the optical flow map and the chroma histogram corresponding to each frame of the initial video file being played.
Specifically, for example, in this embodiment, throughout the playback of the current advertising video, from start to finish, the Gunnar Farneback optical flow algorithm and the chroma histogram algorithm are called, separately or simultaneously, to analyze and extract, frame by frame, the optical flow maps and chroma histograms corresponding to the video images of the current advertising video.
In this embodiment, the Gunnar Farneback optical flow algorithm is called to extract the dense optical flow of each frame of the current advertising video and to form the optical flow map corresponding to each video image.
Step S102: taking the average optical flow strength of the optical flow maps as the optical flow strength feature of the initial video file.
Specifically, for example, in this embodiment, the Gunnar Farneback optical flow algorithm is called to calculate the average optical flow strength of the optical flow maps corresponding to the frames of the current advertising video, and this average optical flow strength is taken as the optical flow strength feature of the current advertising video file.
Step S103: normalizing the chroma histograms, and taking the result as the chroma histogram feature of the initial video file.
Specifically, for example, in this embodiment, the chroma histogram algorithm is called to further normalize the extracted chroma histogram of each frame of the current advertising video, and the normalized chroma histogram vectors are taken as the chroma histogram feature of the current advertising video file.
Step S104: detecting the shot boundaries of the video images, and taking them as the shot boundary feature of the initial video file.
A preset shot boundary detection model is called to detect how each segment of video content changes in the current initial video file, and the shot boundary detection result is taken as the shot boundary feature of the current initial video file.
Specifically, for example, in this embodiment, throughout the playback of the current advertising video, from start to finish, the preset shot boundary detection model is called to detect the segment changes of the advertising video being played, and the detection result of the shot boundary detection model is taken as the shot boundary feature of the advertising video file being played.
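A common stand-in for such a shot boundary detector (not the patent's model, whose internals are undisclosed) thresholds the distance between consecutive frame histograms; a boundary is declared wherever the frame content changes abruptly:

```python
def detect_shot_boundaries(frame_histograms, threshold=0.5):
    """Mark a shot boundary wherever the L1 distance between
    consecutive normalized frame histograms exceeds a threshold.
    Returns the frame indices at which new shots begin.
    """
    boundaries = []
    for i in range(1, len(frame_histograms)):
        prev, cur = frame_histograms[i - 1], frame_histograms[i]
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            boundaries.append(i)
    return boundaries

hists = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
print(detect_shot_boundaries(hists))  # -> [2]
```

The threshold value is a tunable assumption; learned detectors replace this heuristic with a trained classifier over the same per-frame features.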
Further, the video features of the initial video file to be scored also include a video emotion score feature, and in step S100 the step of extracting the video features from the initial video file to be scored further comprises:
Step S105: reading the video content of the initial video file, and detecting and counting the emotion data that marks the video emotion in the video content.
The video content of the initial video file being played is read and detected, and the emotion data in the video content that identifies the video emotion is counted.
Specifically, for example, in this embodiment, the video content of the advertising video being played is read, and the video data in that content is labeled, so that, according to the labels, the emotion score of the current advertising video's content is analyzed and counted on a scale of 1 to 10 (the higher the score, the more passionate or joyful the content of the current advertising video; the lower the score, the calmer the content).
Step S106: inputting the emotion data into a preset sentiment analysis model, so that the preset sentiment analysis model predicts from the emotion data the emotion score of the video content.
Specifically, for example, in this embodiment, after the labeled data is obtained, it is input into a preset video classification training prediction model, and the video classification training prediction model based on a sequence neural network is called to further predict the emotion score of the current advertising video at the next moment.
In this embodiment, the preset video classification training prediction model used may specifically be a TSN (Temporal Segment Network) video classification model based on action recognition, or a trend prediction (Stream) video classification model.
Step S107: taking the emotion score as the video emotion score feature of the initial video file.
Specifically, for example, in this embodiment, the emotion score of the current advertising video content predicted by the video classification training prediction model is taken as the emotion score feature of the current advertising video file.
Further, in step S100, the step of generating the soundtrack audio file of the initial video file by combining the video features comprises:
Step S108: inputting the video features into the preset soundtrack model.
Specifically, for example, in this embodiment, the video features of the current advertising video, namely the optical flow strength feature, chroma histogram feature, shot boundary feature, and emotion score feature extracted by the Gunnar Farneback optical flow algorithm, the chroma histogram algorithm, the shot boundary detection model, and the video classification training prediction model, are passed into the preset soundtrack model based on a sequence neural network.
In this embodiment, the preset soundtrack model used may specifically be an automatic soundtrack model based on a temporally recursive sequence neural network (an LSTM sequence neural network). Before the preset soundtrack model combines the video features to generate the soundtrack audio file of the video file, the preset soundtrack model undergoes learning and training through added preset training samples, the preset training samples including audio-video data and pure audio data; training the automatic soundtrack model based on a sequence neural network with the added training samples enables it to obtain better results when automatically scoring a video file.
Specifically, for example, using a transfer learning approach, the automatic scoring model based on the sequence neural network is trained with two different classes of samples: audio-video data (e.g., MTV clips) and various pure audio data. Training follows the generalization problem definition of transfer learning. The source task is trained with the second class of samples (pure audio data) using an encoder-decoder model structure: the encoder maps the input music samples into a feature space, and the decoder decodes the embedding features in the feature space back into music, realizing the mapping from the feature space to music; through the training of the encoder and decoder, the source-task model obtains the model weights from the feature space to music. The target task is trained with the first class of samples (audio-video data): a feature extraction module first maps the audio-video data into the feature space of the source task, and the decoder model from the source task then maps the embedding features to music, realizing end-to-end learning and synchronous model updating.
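The source-task/target-task weight reuse described above can be sketched as follows. This is an illustrative Python sketch only: the layer sizes, class names, and plain linear maps are assumptions made for clarity, not details from the patent, which specifies only an encoder-decoder trained on pure audio whose decoder is reused for the audio-video target task.

```python
import numpy as np

rng = np.random.default_rng(0)

class Encoder:
    """Maps a music sample (here a flat feature vector) into the shared feature space."""
    def __init__(self, in_dim, latent_dim):
        self.W = rng.normal(0, 0.1, (in_dim, latent_dim))
    def __call__(self, x):
        return np.tanh(x @ self.W)

class Decoder:
    """Maps an embedding in the feature space back to a music representation."""
    def __init__(self, latent_dim, out_dim):
        self.W = rng.normal(0, 0.1, (latent_dim, out_dim))
    def __call__(self, z):
        return z @ self.W

# --- Source task: pure audio data, encoder-decoder autoencoding ---
enc = Encoder(in_dim=64, latent_dim=16)
dec = Decoder(latent_dim=16, out_dim=64)   # learns weights "feature space -> music"
audio_batch = rng.normal(size=(8, 64))
recon = dec(enc(audio_batch))              # music -> feature space -> music

# --- Target task: audio-video data, new feature extractor + reused decoder ---
class VideoFeatureExtractor:
    """Maps video features into the *same* feature space as the source task."""
    def __init__(self, in_dim, latent_dim):
        self.W = rng.normal(0, 0.1, (in_dim, latent_dim))
    def __call__(self, v):
        return np.tanh(v @ self.W)

video_extractor = VideoFeatureExtractor(in_dim=32, latent_dim=16)
video_batch = rng.normal(size=(8, 32))
music_out = dec(video_extractor(video_batch))  # reuses source-task decoder weights
```

The key design point is that `dec` appears in both tasks: the feature-space-to-music mapping is learned once on abundant pure audio and transferred, so the target task only has to learn the video-to-feature-space mapping.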
Further, before the step of inputting the video features into the preset scoring model in step S108, the background-music scoring method for a video file of the present invention further includes:
detecting the lookback feature of the initial video file, and inputting the lookback feature into the preset scoring model.
In this embodiment, in order to enable the above preset scoring model based on the sequence neural network to better learn the soundtrack audio of the advertisement video file, the lookback features produced during generation (i.e., the outputs one to two bars earlier, whether the latest output is the same as the output one to two bars earlier, and the position of the current output within the current bar) are detected, and the lookback features, together with the other video features of the current video file, are input into the preset scoring model based on the sequence neural network, so that the scoring model can better recognize and learn the repeated and similar melodies in the score of the current video file.
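The lookback signals just described (the outputs one and two bars earlier, whether the latest output repeats the output one bar earlier, and the position within the current bar) can be sketched as a small helper. The note encoding and the assumption of sixteen steps per bar are illustrative, not from the patent.

```python
def lookback_features(history, steps_per_bar=16):
    """Build the lookback signals from the note history generated so far.

    history: list of note ids, one per generation time step (hypothetical encoding).
    Returns (note one bar ago, note two bars ago,
             latest output equals output one bar ago, position within current bar).
    """
    t = len(history)
    one_bar = history[t - steps_per_bar] if t >= steps_per_bar else None
    two_bars = history[t - 2 * steps_per_bar] if t >= 2 * steps_per_bar else None
    same = bool(history) and one_bar is not None and history[-1] == one_bar
    pos_in_bar = t % steps_per_bar
    return one_bar, two_bars, same, pos_in_bar
```

At each step these values would be concatenated with the frame's video features before being fed to the scoring model.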
Step S109: in the preset scoring model, generating the soundtrack audio file of the initial video file in conjunction with the video features.
After the preset scoring model based on the sequence neural network receives the video features of the currently playing initial video file and the lookback feature of the current initial video file, it combines the current video features and, following the playback timing of the current initial video file, sequentially generates the soundtrack audio file of the initial video file.
Specifically, for example, in this embodiment, after the preset scoring model based on the sequence neural network receives the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature of the current advertisement video extracted by the Gunnar Farnebäck optical flow algorithm, the chroma histogram algorithm, the shot boundary detection model and the video-classification training prediction model, together with the lookback feature of the current video file, it combines the video features and lookback feature of the advertisement video at the current playback moment with the soundtrack audio already generated up to the moment before the current playback moment, and the computed prediction of the sequence neural network automatically generates the soundtrack audio of the next playback moment of the current advertisement video. Following the playback timing of the current advertisement video, the above scoring operation is repeated to sequentially generate the soundtrack audio file until the current advertisement video finishes.
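The sequential generation described above can be sketched as an autoregressive loop: at each playback step a recurrent state is updated from the frame's video features and the previous note, and the most probable of the 37 outputs is taken as the current note. The cell structure, feature dimension, and random weights here are illustrative assumptions, not the patent's trained LSTM.

```python
import numpy as np

rng = np.random.default_rng(42)
N_NOTES = 37           # 36 pitches (C3-C6 range) + 1 blank position, as in the embodiment
FEAT_DIM = 8           # per-frame video feature vector (hypothetical size)
HID = 32

# hypothetical recurrent scoring cell: h_t = tanh(Wx x_t + Wh h_{t-1})
Wx = rng.normal(0, 0.1, (FEAT_DIM + N_NOTES, HID))
Wh = rng.normal(0, 0.1, (HID, HID))
Wo = rng.normal(0, 0.1, (HID, N_NOTES))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def generate_notes(video_features):
    """Autoregressively pick one note per playback step, conditioned on the
    frame's video features and the previously generated note (one-hot)."""
    h = np.zeros(HID)
    prev = np.zeros(N_NOTES)
    notes = []
    for feat in video_features:            # one feature vector per time step
        x = np.concatenate([feat, prev])
        h = np.tanh(x @ Wx + h @ Wh)
        probs = softmax(h @ Wo)            # distribution over the 37 outputs
        note = int(np.argmax(probs))       # take the most probable note
        notes.append(note)
        prev = np.eye(N_NOTES)[note]
    return notes

melody = generate_notes(rng.normal(size=(20, FEAT_DIM)))
```

The loop terminates when the video's frames run out, mirroring "until the current advertisement video finishes".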
Step S200: generating a test video file based on the initial video file and the soundtrack audio file.
According to the current initial video file, and according to the playback time sequence of the soundtrack audio file generated from the video features and lookback feature of the initial video file, the initial video file and the soundtrack audio file are combined to generate a test video file in which the current initial video file contains the audio content.
Further, step S200 includes:
Step S201: reading the playback time sequences of the initial video file and the soundtrack audio file.
Specifically, for example, in this embodiment, the playback time sequence of the currently playing advertisement video file is read, together with the playback time sequence of the soundtrack audio file generated by the preset scoring model based on the sequence neural network from the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature of the current advertisement video file and the lookback feature of the current video file.
Step S202: synthesizing the initial video file and the soundtrack audio file into a test video file based on the playback time sequences.
Specifically, for example, in this embodiment, according to the playback time sequence of the currently playing advertisement video file that has been read, and the playback time sequence of the corresponding soundtrack audio file, the current soundtrack audio file is combined into the current advertisement video file to generate a test video file in which the current advertisement video file contains the audio content.
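Before the two streams are synthesized, the soundtrack's playback time sequence must match the video's. A minimal sketch of that alignment, assuming the score is a raw list of samples and assuming a sample rate and frame rate (both illustrative; the patent does not specify the muxing mechanics):

```python
def align_audio_to_video(audio, video_frames, sr=44100, fps=25):
    """Trim or zero-pad the generated soundtrack so its playback time sequence
    matches the video's, before muxing the two streams together.
    (The parameter names and defaults are illustrative, not from the patent.)"""
    video_seconds = video_frames / fps
    target_len = int(round(video_seconds * sr))
    if len(audio) >= target_len:
        return audio[:target_len]                     # trim an overlong score
    return audio + [0.0] * (target_len - len(audio))  # pad the tail with silence
```

The aligned audio and the video frames would then be handed to a container muxer to produce the test video file.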
Step S300: correcting the soundtrack audio file in the test video file according to the user portrait models and evaluation parameters of the viewers of the test video file, and generating a stand-by video file.
On the release platform of the current initial video file, the audience users watching the current initial video file are detected, their user portrait models are obtained from the platform, together with their evaluation parameters for the test video file while watching the current test video file. A preset recommendation model is called, and the obtained user portrait models and evaluation parameters are input into the recommendation model to predict the audience users' preferences when watching the current video file. The preset scoring model is optimized according to the prediction result, so as to guide the preset scoring model in correcting the generated soundtrack audio file, finally generating the stand-by video file of the current test video file.
In the present invention, the video features of the initial video file are extracted from the initial video file to be scored, and the soundtrack audio file of the initial video file is generated in conjunction with the video features; a test video file is generated based on the initial video file and the soundtrack audio file; and the soundtrack audio file in the test video file is corrected according to the user portrait models and evaluation parameters of the viewers of the test video file, generating a stand-by video file. Thus, in conjunction with the video features extracted from the video content of the initial video file, transfer-learning training is performed by adding audio-video data and pure audio data, and the scoring model is guided and optimized with the user feature data collected from the audience of the advertisement video file, generating the stand-by video file of the current initial video file after scoring. Not only does the automatic scoring algorithm realize automatic scoring and reduce the high cost of scoring a video file, but scoring in conjunction with the video content features further improves the overall quality of the score; moreover, the soundtrack audio file is also optimized and adjusted based on the feedback evaluations of the video file's audience, meeting the users' preferences for the score content and improving the users' viewing experience of the video file.
Further, a second embodiment of the background-music scoring method for a video file of the present invention is proposed.
Referring to Fig. 4, Fig. 4 is a flow diagram of the second embodiment of the background-music scoring method for a video file of the present invention. Based on the first embodiment of the background-music scoring method for a video file above, in this embodiment, the step S109 above of generating, in the preset scoring model, the soundtrack audio file of the initial video file in conjunction with the video features includes:
Step S1091: generating a note sequence according to the video features of the initial video file and the lookback feature.
After the video features extracted from the currently playing initial video file by calling the preset algorithms and the preset sequence neural network models, together with the detected lookback feature of the current initial video file, are input into the preset scoring model based on the sequence neural network, the preset scoring model first generates the note sequence of the score in conjunction with the video features and the lookback feature.
Specifically, for example, in this embodiment, after the preset scoring model based on the sequence neural network receives the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature of the current advertisement video and the lookback feature of the current advertisement video, the LSTM sequence neural network, at each playback moment t, takes as input the video features and lookback feature of time point t (i.e., the optical flow intensity feature, chroma histogram feature, shot boundary feature and emotion score feature) together with the note output at time point t-1 (i.e., the moment before the current playback moment), and the LSTM sequence neural network outputs at each time point a probability distribution over the candidate notes; the note with the highest probability is taken as the current note.
In this embodiment, to simplify the automatic scoring model and optimize its effect, the range of output notes is limited to the three octaves from C3 to C6, i.e., 36 notes. The output of the model is therefore a 37-dimensional probability distribution, representing the 36 notes plus one blank position (i.e., no note at this moment).
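The 37-way output described above can be decoded into pitches as follows, assuming index 36 is the blank position and indices 0-35 run chromatically from C3 upward (the patent fixes only the C3-C6 range and the count of 36 notes; this exact index layout is an assumption):

```python
# Illustrative decoding of the model's 37-way output index into a pitch name.
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def index_to_pitch(i):
    """Map output index 0..36 to a pitch string; 36 is the rest/blank position."""
    if i == 36:
        return "rest"
    octave = 3 + i // 12          # 36 chromatic notes starting at C3
    return f"{NAMES[i % 12]}{octave}"
```

Taking the argmax of the 37-dimensional distribution and passing it through this mapping yields either a concrete note or a rest for the current time point.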
Step S1092: inputting the note sequence into the note duration sequence neural network, so that the note duration neural network outputs a note duration sequence according to the note sequence and the lookback feature.
In this embodiment, the note sequence generated by the preset scoring model in conjunction with the video features and lookback feature of the current video file is taken as input and fed into the note duration sequence neural network in the current preset scoring model, and the note duration neural network, combining the lookback features corresponding to the note sequence and to each playback moment of the video file, outputs the note duration sequence of the current note sequence.
Step S1093: inputting the note sequence into the drum sequence neural network, so that the drum sequence neural network outputs drumbeat combinations according to the note sequence.
In this embodiment, the preset scoring model takes the video features of the current video file and the note sequence as input and feeds them into the drum sequence neural network in the current preset scoring model. For each bar of the note sequence, according to the note sequence of the current bar and the drumbeat combination of the bar before the current bar, the drum sequence neural network selects from the existing drumbeat combination patterns of the current drum sequence neural network and outputs the drumbeat combination of the current bar.
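The per-bar selection described above, choosing from existing drumbeat combination patterns given the current bar's notes and the previous bar's pattern, can be sketched as a simple scoring rule. The pattern set, the matching score, and the continuity bonus are illustrative assumptions standing in for the drum sequence neural network:

```python
# Illustrative 8-step hit patterns (1 = drum hit on that step).
PATTERNS = {
    "four_on_floor": [1, 0, 1, 0, 1, 0, 1, 0],
    "backbeat":      [1, 0, 0, 0, 1, 0, 0, 0],
    "sparse":        [1, 0, 0, 0, 0, 0, 0, 0],
}

def pick_drum_pattern(bar_notes, prev_pattern):
    """Choose the pattern whose hits best coincide with sounding notes in the
    current bar, with a small bonus for keeping the previous bar's pattern."""
    def score(name, hits):
        onsets = [1 if n != 36 else 0 for n in bar_notes]  # 36 = rest index
        match = sum(h * o for h, o in zip(hits, onsets))
        return match + (0.5 if name == prev_pattern else 0.0)
    return max(PATTERNS, key=lambda name: score(name, PATTERNS[name]))
```

A real drum network would learn this choice; the sketch only shows the interface (current bar's notes in, one pattern out, conditioned on the previous bar).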
Step S1094: generating the soundtrack audio file of the initial video file according to the note sequence, the note duration sequence and the drumbeat combinations.
According to the playback time sequence of the current initial video file, the note sequences generated by the preset scoring model based on the sequence neural network from the video features and lookback feature of the current video file, together with the note duration sequence of each note sequence and the drumbeat combination of each note sequence, are synthesized into the soundtrack audio file of the current initial video file.
In the present invention, the note sequence is input into the note duration sequence neural network so that the note duration neural network outputs the note duration sequence according to the note sequence and the lookback feature; the note sequence is input into the drum sequence neural network so that the drum sequence neural network outputs drumbeat combinations according to the note sequence; and the soundtrack audio file of the initial video file is generated according to the note sequence, the note duration sequence and the drumbeat combinations. Thus, taking the video content of the video file as the basis, mature sequence neural networks are called in conjunction with the video features of the video file to automatically generate, layer by layer and in order, the soundtrack audio file of the current video file. This reduces the overall cost of producing advertisement video scores as previously incurred, improves the overall quality of advertisement video scores, and gives the soundtrack audio the good effect of combining organically with the video features, thereby providing the audience of the advertisement video with a better viewing experience.
Further, a third embodiment of the background-music scoring method for a video file of the present invention is proposed.
Referring to Fig. 5, Fig. 5 is a flow diagram of the third embodiment of the background-music scoring method for a video file of the present invention. Based on the first and second embodiments of the background-music scoring method for a video file above, in this embodiment, the step S300 of correcting the soundtrack audio file in the test video file according to the user portrait models and evaluation parameters of the viewers of the test video file, and generating a stand-by video file, includes:
Step S301: detecting the release platform of the test video file, and obtaining from the release platform the user portrait models and evaluation parameters of the viewers of the test video file.
Specifically, for example, in this embodiment, the release platform of the current advertisement video, a DSP (demand-side platform), is detected; the audience users watching the current advertisement video are detected on the DSP, and the user portrait models of some of the audience users are extracted, together with those audience users' evaluation parameters for the test video file while watching the current advertisement's test video file.
In this embodiment, the user portrait model includes: age, gender, region, client type, etc.; the audience users' evaluation parameters for the test video file include: clicks, playback duration, playback times, drumbeat type, score style, etc.
Step S302: reading the evaluation parameters with which each user of the same user portrait model watched the test video file within a predetermined period, and constructing a user behavior feature sequence according to the evaluation parameters.
In this embodiment, a preset recommendation model is called, and the user portrait models are input into the recommendation model to predict the audience users' preferences when watching the current test video file; the preset scoring model is optimized according to the prediction result, so as to guide the preset scoring model in correcting the generated soundtrack audio file, finally generating the stand-by video file of the current test video file.
Specifically, for example, in this embodiment, the preset recommendation model may specifically be a session-based recommendation model. In the session-based recommendation model, the portrait models of a class of audience users having the same age, gender, region, client type, etc., are read, and their evaluation parameters while watching the current advertisement video within a certain predetermined time period, for example one to two weeks, such as clicks, playback duration, playback times, drumbeat type, score style, etc., are ordered chronologically within the one to two weeks to construct the user behavior feature sequence of the current class of audience users with identical portrait models.
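The construction of the user behavior feature sequence, keeping one portrait cohort's evaluation parameters inside the predetermined period and ordering them chronologically, can be sketched as follows (the dates and evaluation fields are illustrative, not from the patent):

```python
from datetime import date

# Hypothetical event records for one portrait cohort: (date, evaluation parameters).
events = [
    (date(2019, 3, 4), {"clicked": 1, "play_seconds": 12, "drum": "backbeat"}),
    (date(2019, 3, 1), {"clicked": 0, "play_seconds": 3,  "drum": "sparse"}),
    (date(2019, 3, 9), {"clicked": 1, "play_seconds": 30, "drum": "backbeat"}),
]

def build_behavior_sequence(events, start, end):
    """Keep one cohort's evaluations inside the predetermined period and
    order them chronologically, as the embodiment describes."""
    kept = [(d, e) for d, e in events if start <= d <= end]
    kept.sort(key=lambda de: de[0])
    return [e for _, e in kept]

seq = build_behavior_sequence(events, date(2019, 3, 1), date(2019, 3, 14))
```

The resulting ordered list is what the session-based recommendation model consumes as its input sequence.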
Step S303: calculating, according to the user behavior feature sequence, the users' preference probability distribution data for the soundtrack audio file when watching the test video file.
Specifically, for example, in this embodiment, the constructed user behavior feature sequence is taken as input and fed into the sequence neural network of the current session-based recommendation model, and the output of the state layer of the sequence neural network is passed to a fully connected layer. For the current class of audience users with the same attribute data in the sequence neural network, the fully connected layer predicts the preference probability distribution data over soundtrack audio styles at the next moment of the current advertisement's test video, and finally outputs the preference probability distribution data.
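The state layer plus fully connected layer described above can be sketched as a minimal recurrent pass over the behavior feature sequence followed by a softmax over score styles. The style set, dimensions, and random weights are illustrative assumptions, not the trained session-based model:

```python
import numpy as np

rng = np.random.default_rng(7)
STYLES = ["pop", "rock", "ambient", "electronic"]   # illustrative style set
EVT_DIM, HID = 4, 16

Wx = rng.normal(0, 0.1, (EVT_DIM, HID))
Wh = rng.normal(0, 0.1, (HID, HID))
Wfc = rng.normal(0, 0.1, (HID, len(STYLES)))        # fully connected head

def preference_distribution(behavior_seq):
    """Run the cohort's behavior feature sequence through a recurrent state
    layer, then a fully connected layer, to get next-moment style preferences."""
    h = np.zeros(HID)
    for evt in behavior_seq:                         # one feature vector per event
        h = np.tanh(evt @ Wx + h @ Wh)
    logits = h @ Wfc
    e = np.exp(logits - logits.max())
    return e / e.sum()                               # preference probability distribution

prefs = preference_distribution(rng.normal(size=(5, EVT_DIM)))
```

The final state summarizes the cohort's recent behavior, and the softmax output is the preference probability distribution data used in step S304.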
Step S304: guiding, with the preference probability distribution data, the preset scoring model that generates the soundtrack audio file, so as to correct the soundtrack audio file in the test video file and generate a stand-by video file.
According to the preference probability distribution prediction result output by the fully connected layer of the sequence neural network of the current session-based recommendation model, guided optimization is performed on the current preset scoring model based on the sequence neural network, so that the preset scoring model corrects the soundtrack audio file of the currently playing test video file, finally generating the stand-by video file of the test video file.
Specifically, for example, in this embodiment, during playback of the current advertisement video, when the drum sequence neural network in the automatic scoring model based on the LSTM sequence neural network selects, according to the note sequence, the drumbeat combination pattern for the score of the current advertisement video, the drumbeat combination prediction result of the current bar predicted by the drum sequence neural network is weighted with the preference probability distribution prediction result output by the fully connected layer of the sequence neural network of the session-based recommendation model, so as to select the drumbeat combination that better matches the preferences of the audience users watching the current advertisement video. Finally, according to this drumbeat combination that better matches the preferences of the audience users watching the current advertisement video, a soundtrack audio file of the current advertisement video that better matches the audience users is generated, which is then combined with the initial advertisement video file to form the final stand-by video file of the advertisement.
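The weighting of the two prediction results described above can be sketched as a convex combination over a shared set of drumbeat patterns; the mixing weight `alpha` is an assumed hyperparameter not specified by the patent:

```python
import numpy as np

def blend_drum_choice(drum_probs, pref_probs, alpha=0.7):
    """Weight the drum network's per-pattern probabilities with the audience's
    preference distribution over the same patterns, then pick the argmax.
    alpha is an assumed mixing weight, not a value from the patent."""
    drum_probs = np.asarray(drum_probs, dtype=float)
    pref_probs = np.asarray(pref_probs, dtype=float)
    blended = alpha * drum_probs + (1 - alpha) * pref_probs
    return int(np.argmax(blended)), blended / blended.sum()

choice, blended = blend_drum_choice([0.5, 0.3, 0.2], [0.1, 0.2, 0.7])
```

With a large `alpha` the drum network dominates; lowering `alpha` lets audience preference override the network's first choice.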
In the present invention, the release platform of the test video file is detected, and the user portrait models and evaluation parameters of the viewers of the test video file are obtained from the release platform; the evaluation parameters with which each user of the same user portrait model watched the test video file within a predetermined period are read, and a user behavior feature sequence is constructed according to the evaluation parameters; the users' preference probability distribution data for the soundtrack audio file when watching the test video file is calculated according to the user behavior feature sequence; and the preset scoring model is guided with the preference probability distribution data so as to correct the soundtrack audio file in the test video file and generate a stand-by video file. Thus, the score is optimized and adjusted based on the feedback of the advertisement video's audience, meeting the users' preferences for the score content and further improving the users' viewing experience of the advertisement video.
In addition, an embodiment of the present invention also proposes a scoring system for a video file, the scoring system for a video file including:
a soundtrack audio generation module, for extracting the video features of the initial video file from an initial video file to be scored, and generating the soundtrack audio file of the initial video file in conjunction with the video features;
a test video generation module, for generating a test video file based on the initial video file and the soundtrack audio file;
a soundtrack audio correction module, for correcting the soundtrack audio file in the test video file according to the user portrait models and evaluation parameters of the viewers of the test video file, and generating a stand-by video file.
Preferably, the scoring system for a video file further includes:
a learning training module, for adding preset training samples to perform learning training on the preset scoring model that generates the soundtrack audio file, the preset training samples including audio-video data and pure audio data.
When the modules of the scoring system for a video file proposed in this embodiment run, the steps of the background-music scoring method for a video file as described above are realized, and details are not repeated here.
In addition, an embodiment of the present invention also proposes a storage medium applied to a computer, i.e., the storage medium is a computer-readable storage medium on which a scoring program for a video file is stored; when the scoring program for a video file is executed by a processor, the steps of the background-music scoring method for a video file as described above are realized.
For the methods realized when the scoring program for a video file running on the processor is executed, reference may be made to the embodiments of the background-music scoring method for a video file of the present invention, and details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device including that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by means of software plus a necessary general hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disc), including several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (12)
1. A background-music scoring method for a video file, characterized in that the background-music scoring method for a video file comprises the following steps:
extracting video features of the initial video file from an initial video file to be scored, and generating a soundtrack audio file of the initial video file in conjunction with the video features;
generating a test video file based on the initial video file and the soundtrack audio file;
correcting the soundtrack audio file in the test video file according to user portrait models and evaluation parameters of viewers of the test video file, and generating a stand-by video file.
2. the method for dubbing in background music of video file as described in claim 1, which is characterized in that the video features include: that light stream is strong
Feature, chroma histogram feature, shot boundary characteristic are spent,
Described the step of extracting every video features of the video file from initial video file to be dubbed in background music includes:
Extract the coloration histogram of each video image corresponding each light stream figure and the video image in the initial video file
Figure;
Using the average light intensity of flow of each light stream figure as the light stream strength characteristic of the initial video file;
Chroma histogram feature after the chroma histogram is normalized, as the initial video file;
The boundary shot for detecting the video image, by the shot boundary characteristic of initial video file described in the boundary shot.
3. the method for dubbing in background music of video file as described in claim 1, which is characterized in that the video features further include: video
Emotion score feature,
Described the step of extracting every video features of the video file from initial video file to be dubbed in background music further include:
The video content for reading the initial video file detects and counts the emotion for identifying video feeling in the video content
Data;
The affection data is input to default sentiment analysis model, so that the default sentiment analysis model is to the emotion number
According to being predicted to obtain the emotion score of the video content;
Using the emotion score as the video feeling score feature of the initial video file.
4. the method for dubbing in background music of video file as described in any one of claims 1 to 3, which is characterized in that in conjunction with every view
Frequency feature generates the step of soundtrack audio file of the initial video file and includes:
Every video features are input to default model of dubbing in background music, the default trained sample that the preset configuration model passes through addition
This progress learning training, the default training sample includes: audio, video data and pure audio data;
In the default model of dubbing in background music, the soundtrack audio text of the initial video file is generated in conjunction with every video features
Part.
5. the method for dubbing in background music of video file as claimed in claim 4, which is characterized in that described by every video features
It is input to before presetting the step of dubbing in background music model, the method also includes:
The lookback feature of the initial video file is detected, and the lookback feature is input to described preset and is dubbed in background music
Model.
6. the method for dubbing in background music of video file as claimed in claim 4, which is characterized in that the default model of dubbing in background music is based on sequence
Column neural network generates the model of dubbing in background music of audio file,
In the default model of dubbing in background music, the soundtrack audio text of the initial video file is generated in conjunction with every video features
The step of part includes:
According to every video features of the initial video file and the lookback feature, sequence of notes is generated;
The sequence of notes is inputted into note duration sequence neural network, so that the note duration neural network is according to the sound
It accords with sequence and the lookback feature exports note duration sequence;
The sequence of notes is inputted into drum sequence neural network, so that the drum sequence neural network is according to the note sequence
Column output drumbeat combination;
It is combined according to the sequence of notes, note duration sequence and the drumbeat, generate the initial video file matches musical sound
Frequency file.
7. the method for dubbing in background music of video file as described in claim 1, which is characterized in that based on the initial video file and match
Musical sound frequency file, generate test video file the step of include:
Read the play time sequence of the initial video file and the soundtrack audio file;
It is test video text by the initial video file and the soundtrack audio file synthesis based on the play time sequence
Part.
8. the method for dubbing in background music of video file as described in claim 1, which is characterized in that described according to the test video file
The user's portrait model and evaluation parameter for watching object, are modified soundtrack audio file in the test video file, raw
Include: at the step of stand-by video file
The release platform for detecting the test video file obtains the test video file from the release platform and watches pair
The user's portrait model and evaluation parameter of elephant;
Each user for reading same subscriber portrait model watches the evaluation parameter of the test video file in predetermined period, and
User behavior characteristics sequence is constructed according to the evaluation parameter;
When watching the test video file according to user behavior characteristics sequence calculating user, to the soundtrack audio file
Preference probability distribution data;
The default model of dubbing in background music that the soundtrack audio file is generated with preference probability distribution data guidance, to the test
Soundtrack audio file is modified in video file, generates stand-by video file.
9. A system for dubbing music to a video file, wherein the system generates the soundtrack audio of a video file based on a sequential neural network, the system comprising:
a soundtrack audio generation module, configured to extract video features of an initial video file to be dubbed, and to generate a soundtrack audio file for the initial video file by combining the video features;
a test video generation module, configured to generate a test video file based on the initial video file and the soundtrack audio file; and
a soundtrack audio correction module, configured to modify the soundtrack audio file in the test video file according to the user profile models and evaluation parameters of the viewers of the test video file, and to generate a ready-to-use video file.
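A minimal, hypothetical sketch of the three modules of claim 9 wired as a pipeline. The class names, data shapes, and placeholder method bodies are inventions standing in for the neural models the claim describes.

```python
# Hypothetical sketch of the claimed module structure as a three-stage pipeline.

class SoundtrackGenerator:
    """Stands in for the feature-extraction + soundtrack-generation module."""
    def generate(self, video):
        features = {"scene": video["scene"], "tempo": len(video["frames"])}
        return {"audio_for": video["name"], "features": features}

class TestVideoBuilder:
    """Stands in for the test-video generation module."""
    def build(self, video, audio):
        return {"video": video["name"], "audio": audio["audio_for"]}

class SoundtrackCorrector:
    """Stands in for the viewer-feedback correction module."""
    def correct(self, test_video, feedback):
        # Revise the soundtrack only when viewer feedback is poor.
        test_video["revised"] = feedback["score"] < 0.5
        return test_video

video = {"name": "clip.mp4", "scene": "city", "frames": [0, 1, 2]}
audio = SoundtrackGenerator().generate(video)
final = SoundtrackCorrector().correct(TestVideoBuilder().build(video, audio),
                                      {"score": 0.8})
```

The point of the sketch is the data flow between the three modules, not any particular model.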
10. The system for dubbing music to a video file according to claim 9, further comprising:
a learning and training module, configured to add preset training samples to train the preset dubbing model that generates the soundtrack audio file, the preset training samples comprising audio-video data and pure audio data.
11. A device for dubbing music to a video file, comprising a memory, a processor, and a music dubbing program for a video file that is stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for dubbing music to a video file according to any one of claims 1 to 8.
12. A storage medium applied to a computer, wherein a music dubbing program for a video file is stored on the storage medium, and the program, when executed by a processor, implements the steps of the method for dubbing music to a video file according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910216297.8A CN109862393B (en) | 2019-03-20 | 2019-03-20 | Method, system, equipment and storage medium for dubbing music of video file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109862393A true CN109862393A (en) | 2019-06-07 |
CN109862393B CN109862393B (en) | 2022-06-14 |
Family
ID=66901380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910216297.8A Active CN109862393B (en) | 2019-03-20 | 2019-03-20 | Method, system, equipment and storage medium for dubbing music of video file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109862393B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040100487A1 (en) * | 2002-11-25 | 2004-05-27 | Yasuhiro Mori | Short film generation/reproduction apparatus and method thereof |
US20060122842A1 (en) * | 2004-12-03 | 2006-06-08 | Magix Ag | System and method of automatically creating an emotional controlled soundtrack |
CN102403011A (en) * | 2010-09-14 | 2012-04-04 | 北京中星微电子有限公司 | Music output method and device |
US8737817B1 (en) * | 2011-02-08 | 2014-05-27 | Google Inc. | Music soundtrack recommendation engine for videos |
CN104182413A (en) * | 2013-05-24 | 2014-12-03 | 福建星网视易信息系统有限公司 | Method and system for recommending multimedia content |
US20140376888A1 (en) * | 2008-10-10 | 2014-12-25 | Sony Corporation | Information processing apparatus, program and information processing method |
CN105261374A (en) * | 2015-09-23 | 2016-01-20 | 海信集团有限公司 | Cross-media emotion correlation method and system |
CN107170432A (en) * | 2017-03-31 | 2017-09-15 | 珠海市魅族科技有限公司 | A kind of music generating method and device |
CN108712574A (en) * | 2018-05-31 | 2018-10-26 | 维沃移动通信有限公司 | A kind of method and device playing music based on image |
CN109063163A (en) * | 2018-08-14 | 2018-12-21 | 腾讯科技(深圳)有限公司 | A kind of method, apparatus, terminal device and medium that music is recommended |
CN109492128A (en) * | 2018-10-30 | 2019-03-19 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating model |
Non-Patent Citations (2)
Title |
---|
FANG-FEI KUO et al.: "Background music recommendation for video based on multimodal latent semantic analysis", 2013 IEEE International Conference on Multimedia and Expo (ICME) * |
QIE ZIHAN et al.: "An artificial neural network model for matching background music to video", Computer Knowledge and Technology * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112231499A (en) * | 2019-07-15 | 2021-01-15 | 李姿慧 | Intelligent video music distribution system |
CN110446057A (en) * | 2019-08-30 | 2019-11-12 | 北京字节跳动网络技术有限公司 | Providing method, device, equipment and the readable medium of auxiliary data is broadcast live |
CN110781835A (en) * | 2019-10-28 | 2020-02-11 | 中国传媒大学 | Data processing method and device, electronic equipment and storage medium |
CN110781835B (en) * | 2019-10-28 | 2022-08-23 | 中国传媒大学 | Data processing method and device, electronic equipment and storage medium |
CN110753238A (en) * | 2019-10-29 | 2020-02-04 | 北京字节跳动网络技术有限公司 | Video processing method, device, terminal and storage medium |
CN110933406B (en) * | 2019-12-10 | 2021-05-14 | 央视国际网络无锡有限公司 | Objective evaluation method for short video music matching quality |
CN110933406A (en) * | 2019-12-10 | 2020-03-27 | 央视国际网络无锡有限公司 | Objective evaluation method for short video music matching quality |
CN111737516A (en) * | 2019-12-23 | 2020-10-02 | 北京沃东天骏信息技术有限公司 | Interactive music generation method and device, intelligent sound box and storage medium |
CN111259192A (en) * | 2020-01-15 | 2020-06-09 | 腾讯科技(深圳)有限公司 | Audio recommendation method and device |
CN111259192B (en) * | 2020-01-15 | 2023-12-01 | 腾讯科技(深圳)有限公司 | Audio recommendation method and device |
CN111800650A (en) * | 2020-06-05 | 2020-10-20 | 腾讯科技(深圳)有限公司 | Video dubbing method and device, electronic equipment and computer readable medium |
WO2022005442A1 (en) * | 2020-07-03 | 2022-01-06 | Назар Юрьевич ПОНОЧЕВНЫЙ | System (embodiments) for harmoniously combining video files and audio files and corresponding method |
US20220366881A1 (en) * | 2021-05-13 | 2022-11-17 | Microsoft Technology Licensing, Llc | Artificial intelligence models for composing audio scores |
WO2022240525A1 (en) * | 2021-05-13 | 2022-11-17 | Microsoft Technology Licensing, Llc | Artificial intelligence models for composing audio scores |
CN113923517A (en) * | 2021-09-30 | 2022-01-11 | 北京搜狗科技发展有限公司 | Background music generation method and device and electronic equipment |
WO2023197749A1 (en) * | 2022-04-15 | 2023-10-19 | 腾讯科技(深圳)有限公司 | Background music insertion time point determining method and apparatus, device, and storage medium |
CN115174959A (en) * | 2022-06-21 | 2022-10-11 | 咪咕文化科技有限公司 | Video 3D sound effect setting method and device |
CN115174959B (en) * | 2022-06-21 | 2024-01-30 | 咪咕文化科技有限公司 | Video 3D sound effect setting method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109862393B (en) | 2022-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109862393A (en) | Method of dubbing in background music, system, equipment and the storage medium of video file | |
CN107918653A (en) | A kind of intelligent playing method and device based on hobby feedback | |
US8963926B2 (en) | User customized animated video and method for making the same | |
US8442389B2 (en) | Electronic apparatus, reproduction control system, reproduction control method, and program therefor | |
CN110019961A (en) | Method for processing video frequency and device, for the device of video processing | |
CN109447234A (en) | A kind of model training method, synthesis are spoken the method and relevant apparatus of expression | |
CN107172485A (en) | A method and apparatus for generating short videos | |
US10789972B2 (en) | Apparatus for generating relations between feature amounts of audio and scene types and method therefor | |
CN108924599A (en) | Video caption display methods and device | |
CN109147800A (en) | Answer method and device | |
CN108292314A (en) | Information processing unit, information processing method and program | |
CN108241997A (en) | Advertisement broadcast method, device and computer readable storage medium | |
CN107895016A (en) | One kind plays multimedia method and apparatus | |
CN107872685A (en) | A kind of player method of multi-medium data, device and computer installation | |
US11756571B2 (en) | Apparatus that identifies a scene type and method for identifying a scene type | |
WO2019047850A1 (en) | Identifier displaying method and device, request responding method and device | |
CN114073854A (en) | Game method and system based on multimedia file | |
JP2019071009A (en) | Content display program, content display method, and content display device | |
CN113538628A (en) | Expression package generation method and device, electronic equipment and computer readable storage medium | |
CN108920585A (en) | The method and device of music recommendation, computer readable storage medium | |
CN109429077A (en) | Method for processing video frequency and device, for the device of video processing | |
CN114339076A (en) | Video shooting method and device, electronic equipment and storage medium | |
CN106331525A (en) | Realization method for interactive film | |
CN115866339A (en) | Television program recommendation method and device, intelligent device and readable storage medium | |
JP7466087B2 (en) | Estimation device, estimation method, and estimation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||