Specific Embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating a model or the apparatus for generating a model of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as video recording applications, video playback applications, voice interaction applications, search applications, instant messaging tools, mailbox clients and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, laptop portable computers and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
When the terminal devices 101, 102, 103 are hardware, an image capture device may also be mounted on them. The image capture device may be any device capable of capturing images, such as a camera or a sensor. The user may use the image capture device on the terminal devices 101, 102, 103 to capture video.
The server 105 may be a server providing various services, for example a video processing server that stores, manages or analyzes the videos uploaded by the terminal devices 101, 102, 103. The video processing server may obtain a sample set containing a large number of samples. A sample in the sample set may include a first video, a second video and a third video, where the first video and the second video have the same soundtrack and the same soundtrack style label, while the first video and the third video have different soundtracks and different soundtrack style labels. In addition, the video processing server may use the samples in the sample set to train an initial model, and may store the training result (for example, the generated video feature extraction model). In this way, after a user uploads a video through the terminal devices 101, 102, 103, the server 105 can determine the feature information of the uploaded video, and then perform operations such as selecting and pushing soundtrack information for the video.
It should be noted that the server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for generating a model provided by the embodiments of the present application is generally performed by the server 105, and accordingly the apparatus for generating a model is generally disposed in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating a model according to the present application is shown. The method for generating a model includes the following steps:
Step 201: obtain a sample set.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in Fig. 1) may obtain the sample set in several ways. For example, the execution body may obtain, through a wired or wireless connection, an existing sample set stored in another server that stores samples (for example, a database server). As another example, a user may collect samples through terminal devices (for example, the terminal devices 101, 102, 103 shown in Fig. 1); the execution body may then receive the samples collected by the terminals and store them locally, thereby generating the sample set. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other wireless connections now known or developed in the future.
Here, the sample set may contain a large number of samples. A sample may include a first video, a second video and a third video. It should be noted that the first video and the second video have the same soundtrack and the same soundtrack style label, while the first video and the third video have different soundtracks and different soundtrack style labels. A soundtrack style label may be information used to indicate and distinguish the style of a soundtrack. Soundtrack styles may be divided in advance into multiple categories, such as sad, cheerful and soothing.
In some optional implementations of this embodiment, a sample in the sample set may be generated as follows. First, a video may be extracted at random from a preset video library as the first video, where each video in the video library carries a soundtrack label and a soundtrack style label. In practice, the soundtrack label may be used to indicate and distinguish soundtracks; for example, it may be the title of the soundtrack. Then, a video that has the same soundtrack label and the same soundtrack style label as the first video may be extracted at random from the video library as the second video. Next, a video whose soundtrack label and soundtrack style label both differ from those of the first video may be selected at random from the video library as the third video. Finally, the first video, the second video and the third video may be assembled into a sample.
It should be noted that the samples in the sample set may also be generated in other ways, such as manual selection, which is not described in detail here.
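The sample-generation procedure described above can be sketched as follows. This is a minimal illustration only: the video library is represented as a list of plain records, and the `soundtrack` and `style` field names are assumptions for the sketch, not taken from the application.

```python
import random

def build_sample(video_library, rng=random):
    """Assemble one (first, second, third) training sample from a library of
    videos, each carrying a soundtrack label and a soundtrack style label."""
    first = rng.choice(video_library)
    # Second video: same soundtrack label and same soundtrack style label.
    positives = [v for v in video_library
                 if v is not first
                 and v["soundtrack"] == first["soundtrack"]
                 and v["style"] == first["style"]]
    # Third video: different soundtrack label and different style label.
    negatives = [v for v in video_library
                 if v["soundtrack"] != first["soundtrack"]
                 and v["style"] != first["style"]]
    if not positives or not negatives:
        return None  # this first video cannot anchor a valid sample
    return first, rng.choice(positives), rng.choice(negatives)
```

In practice a library entry would also carry the video data itself; here only the labels matter for forming the triple.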
Step 202: extract a sample from the sample set.
In this embodiment, the execution body may extract a sample from the sample set obtained in step 201 and perform the training steps of steps 203 to 206. The manner of extraction and the number of samples extracted are not limited in this application. For example, at least one sample may be extracted at random, or samples whose videos have better clarity (i.e., whose video frames have higher resolution) may be extracted.
Step 203: input frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample.
In this embodiment, the execution body may input the frames of the videos in the sample extracted in step 202 into the initial model. Since the extracted sample includes the first video, the second video and the third video, the feature information of the first video, the feature information of the second video and the feature information of the third video can be obtained respectively. The initial model may analyze the frames of a video and output the feature information of the video. In practice, feature information may be represented in the form of a vector or a matrix.
It should be noted that the input frames of a video may be one or more frames selected at random, or may be multiple frames extracted from the video at a specified time interval (for example, 1s or 2s). No limitation is imposed here.
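As a small illustration of the interval-based option, the indices of the frames to extract can be computed from the frame rate and the interval. This is a sketch; the frame rate and interval values used below are example numbers, not prescribed by the application.

```python
def frame_indices(total_frames, fps, interval_s):
    """Return the indices of frames sampled every `interval_s` seconds
    from a video with `total_frames` frames at `fps` frames per second."""
    step = max(1, int(round(fps * interval_s)))
    return list(range(0, total_frames, step))
```

The selected indices would then be used to read the corresponding frames from the video before feeding them to the model.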
In this embodiment, the initial model may be any model with an image feature extraction function created based on machine learning techniques. The initial model may perform feature extraction on the frames of a video, then fuse and analyze the features of the extracted frames, and finally output the feature information of the video.
As an example, the initial model may be a convolutional neural network using any of various existing structures (such as DenseBox, VGGNet, ResNet or SegNet). In practice, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area; it performs outstandingly on image processing, so a convolutional neural network may be used to extract the features of the frames of a video.
In this example, the established convolutional neural network may include a convolutional layer, a pooling layer, a feature fusion layer, a fully connected layer and the like. The convolutional layer may be used to extract image features. The pooling layer may be used to downsample the input information. The feature fusion layer may be used to fuse the image features obtained for each frame (which may, for example, take the form of feature matrices or feature vectors). For instance, the feature values at the same position in the feature matrices corresponding to different frames may be averaged, thereby performing feature fusion and generating a single fused feature matrix. The fully connected layer may be used to further process the fused features.
It should be noted that the initial model may also be another model with an image feature extraction function and is not limited to the above example; the specific model structure is not limited here.
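The averaging-based feature fusion described above can be sketched as follows, a minimal NumPy illustration. The per-frame feature matrices here are toy values; in the described model they would come from the convolutional and pooling layers.

```python
import numpy as np

def fuse_frame_features(frame_features):
    """Fuse per-frame feature matrices by averaging the feature values at
    the same position across frames, yielding one fused feature matrix."""
    stacked = np.stack(frame_features, axis=0)  # shape: (num_frames, H, W)
    return stacked.mean(axis=0)
```

Averaging makes the fused feature independent of the number of frames sampled, which is one reason this simple scheme is workable for videos of varying length.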
Step 204: determine a loss value of the sample based on the obtained feature information and the soundtrack style labels in the sample.
In this embodiment, the goal of training the initial model is to make the difference between the feature information extracted from the frames of videos that have the same soundtrack and the same soundtrack style label as small as possible, while making the difference between the feature information extracted from the frames of videos that have different soundtracks and different soundtrack style labels as large as possible. Accordingly, the difference between the feature information extracted from the frames of videos with the same soundtrack and the same soundtrack style label may be called the first difference, and the difference between the feature information extracted from the frames of videos with different soundtracks and different soundtrack style labels may be called the second difference. The execution body may first determine the values of the first difference and the second difference from the feature information. In practice, since the first video and the second video in the extracted sample have the same soundtrack and the same soundtrack style label, the first difference may be the difference between the feature information of the first video and that of the second video. Since the first video and the third video have different soundtracks and different soundtrack style labels, and the second video and the third video likewise have different soundtracks and different soundtrack style labels, the second difference may be the difference between the feature information of the first video and that of the third video, or the difference between the feature information of the second video and that of the third video. Here, the difference between pieces of feature information may be determined using measures such as Euclidean distance or cosine similarity: the more similar the feature information, the smaller the difference; the less similar, the larger the difference.
After obtaining the first difference and the second difference, the execution body may input them into a pre-established loss function to determine the loss value of the sample. A loss function is a non-negative real-valued function; in general, the smaller its value (the loss value), the more robust the model. The loss function may be set according to actual needs. As an example, the loss function may be a function characterizing the degree of difference between the second difference and the first difference, such as a triplet loss function (triplet loss).
In some optional implementations of this embodiment, the execution body may determine the loss value of the extracted sample as follows:
First, determine the Euclidean distance between the feature information of the videos with the same soundtrack style label, and take this distance as the first Euclidean distance; and determine the Euclidean distance between the feature information of the videos with different soundtrack style labels, and take this distance as the second Euclidean distance.
Second step determines the difference of above-mentioned second Euclidean distance Yu above-mentioned first Euclidean distance.
Third step determines the penalty values of sample based on above-mentioned difference compared with the first default value, wherein above-mentioned
One default value is positive number (such as 0.2).Herein, the first default value can be technical staff by mass data statistics and based on
It calculates and preassigned numerical value.
Optionally, in response to determining that the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the execution body may determine a second preset value (for example, 0) as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the numerical relationship between the two distances can be considered to meet expectations. In this case, the loss value may be set to a small number so that the loss value of this sample has little influence on gradient descent or does not participate in gradient descent. Since the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value here, the difference between that difference and the first preset value (which may be called the target figure here) is greater than 0, so the second preset value may be set to a number smaller than this target figure, such as 0.
Optionally, in response to determining that the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the execution body may determine the difference between the first preset value and that difference as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the numerical relationship between the two distances can be considered not to meet expectations. In this case, since the difference between the first preset value and that difference is greater than zero, it may be determined as the loss value of the sample and participate in gradient descent.
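The loss computation described above amounts to a triplet-style hinge: with d1 the first Euclidean distance, d2 the second, and m the first preset value, the loss is 0 when d2 - d1 exceeds m and m - (d2 - d1) otherwise. A minimal sketch follows; the margin 0.2 and second preset value 0 are the example values from the text, and the feature vectors are toy inputs.

```python
import numpy as np

def sample_loss(feat_first, feat_second, feat_third, margin=0.2):
    """Loss value of one sample computed from the three videos' feature vectors."""
    # Same soundtrack style: first Euclidean distance.
    d1 = np.linalg.norm(feat_first - feat_second)
    # Different soundtrack style: second Euclidean distance.
    d2 = np.linalg.norm(feat_first - feat_third)
    diff = d2 - d1
    if diff > margin:
        return 0.0            # relationship meets expectations: second preset value
    return margin - diff      # participates in gradient descent
```

Note this is equivalent to max(0, d1 - d2 + margin), the standard triplet-loss form, except that the boundary case diff == margin yields a zero-valued loss term rather than being excluded.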
Step 205: determine whether training of the initial model is complete based on the loss value.
In this embodiment, the execution body determines whether training of the initial model is complete based on the loss value determined in step 204. As an example, the execution body may determine whether the loss values have converged; when the loss values are determined to have converged, it may be determined that the initial model at that point has been trained. As another example, the execution body may first compare the loss value with a target value. In response to determining that the loss value is less than or equal to the target value, it may count, among the loss values determined in a preset number (for example, 100) of the most recently executed training steps, the proportion of loss values that are less than or equal to the target value. When this proportion is greater than a preset ratio (for example, 95%), it may be determined that training of the initial model is complete. It should be noted that the target value can generally be used to represent the ideal degree of inconsistency between the predicted value and the true value; that is, when the loss value is less than or equal to the target value, the predicted value can be considered close or approximately equal to the true value. The target value may be set according to actual needs.
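The second completion criterion above can be sketched as follows, a minimal illustration; the window of 100 steps and ratio of 95% are the example values from the text, and the class name is chosen for the sketch.

```python
from collections import deque

class CompletionChecker:
    """Track recent loss values and decide whether training is complete:
    complete when the current loss is at most `target` and, among the most
    recent `window` losses, the proportion not exceeding `target` is
    greater than `ratio`."""

    def __init__(self, target, window=100, ratio=0.95):
        self.target = target
        self.ratio = ratio
        self.recent = deque(maxlen=window)

    def update(self, loss):
        self.recent.append(loss)
        if loss > self.target or len(self.recent) < self.recent.maxlen:
            return False
        hits = sum(1 for v in self.recent if v <= self.target)
        return hits / len(self.recent) > self.ratio
```

A checker like this would be consulted once per training step, with training stopping the first time `update` returns True.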
It should be noted that, in response to determining that training of the initial model is complete, step 206 may then be executed. In response to determining that training of the initial model is not complete, the parameters of the initial model may be updated based on the determined loss value, a sample may be extracted again from the sample set, and the training steps may be continued using the initial model with updated parameters as the initial model. Here, a back-propagation algorithm may be used to compute the gradient of the loss value with respect to the model parameters, and a gradient descent algorithm may then be used to update the model parameters based on the gradient. It should be noted that the back-propagation algorithm, the gradient descent algorithm and the machine learning methods mentioned above are well-known techniques that are currently widely researched and applied, and are not described in detail here. It should also be pointed out that the manner of re-extraction here is not limited in this application; for example, when the sample set contains a large number of samples, the execution body may extract samples that have not been extracted before.
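Because the margin-form loss above has a closed-form gradient with respect to the feature vectors, a single gradient-descent update can be illustrated directly. This is a toy sketch that updates the anchor feature vector itself as a stand-in for model parameters; in the actual method the gradient would be back-propagated further into the network.

```python
import numpy as np

def triplet_grad_step(a, p, n, margin=0.2, lr=0.1):
    """One gradient-descent step of loss = max(0, margin - (||a-n|| - ||a-p||))
    with respect to the anchor vector `a`. Returns (updated_a, loss_at_a)."""
    d_p = np.linalg.norm(a - p)   # distance to same-style feature
    d_n = np.linalg.norm(a - n)   # distance to different-style feature
    loss = margin - (d_n - d_p)
    if loss <= 0:
        return a, 0.0  # expectations met; sample does not drive the update
    # d||a-p||/da = (a-p)/||a-p||, so d loss/da = (a-p)/d_p + (n-a)/d_n.
    grad = (a - p) / d_p + (n - a) / d_n
    return a - lr * grad, loss
```

Repeating this step pulls the anchor toward the same-style feature and away from the different-style one, which is exactly the training goal stated for the initial model.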
Step 206: in response to determining that training of the initial model is complete, determine the trained initial model as a video feature extraction model.
In this embodiment, in response to determining that training of the initial model is complete, the execution body may determine the trained initial model as the video feature extraction model.
In some optional implementations of this embodiment, after the video feature extraction model is obtained by training, in response to receiving a target video, the frames of the target video may be input into the video feature extraction model to obtain target feature information of the target video. Then, similarity calculations may be performed between the target feature information and the feature information of the videos in a preset video library, and a preset number of videos may be selected from the video library as candidate videos in descending order of similarity. Finally, the soundtrack information of the candidate videos may be obtained, and the selected soundtrack information may be pushed.
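The retrieval-and-push step can be sketched as follows, a minimal illustration using cosine similarity (one of the similarity measures mentioned earlier) with NumPy. The library entries and the `feat`/`soundtrack` field names are assumptions for the sketch.

```python
import numpy as np

def pick_candidate_soundtracks(target_feat, library, k=2):
    """Select the soundtrack information of the `k` library videos whose
    feature information is most similar to the target video's."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    scored = sorted(library, key=lambda e: cos(target_feat, e["feat"]),
                    reverse=True)  # descending order of similarity
    return [e["soundtrack"] for e in scored[:k]]
```

For a large video library, the exhaustive sort here would in practice be replaced by an approximate nearest-neighbor index, but the selection logic is the same.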
With continued reference to the signal that Fig. 3, Fig. 3 are according to the application scenarios of the method for generating model of the present embodiment
Figure.In the application scenarios of Fig. 3, in the application scenarios of Fig. 3, model can be installed on terminal device 301 used by a user
Training class application.When user opens the application, and after uploading the store path of sample set or sample set, backstage is provided to the application
The server 302 of support can run the method for generating model, comprising:
It is possible, firstly, to obtain sample set.Wherein, the sample in above-mentioned sample set includes the first video, the second video and the
Three videos, the first video and the second video are dubbed in background music identical and are marked with identical style of dubbing in background music, and the first video and third regard
Frequency is dubbed in background music different and is marked with different styles of dubbing in background music.Later, sample can be extracted from above-mentioned sample set, executed as follows
Training step: pumping frame is carried out to the video in extracted sample.By frame 303, the second video in the first video extracted
In frame 304, the frame 305 in third video be input to initial model 306, obtain the characteristic information of each video in sample.Base
Style mark 307 of dubbing in background music in obtained characteristic information and sample, determines the penalty values 308 of sample.Based on above-mentioned loss
Value determines whether initial model trains completion.In response to determining that initial model training is completed, the initial model after training is determined
For video feature extraction model 309.
By obtaining a sample set, samples can be extracted from it to train the initial model. A sample in the sample set may include a first video, a second video and a third video, where the first video and the second video have the same soundtrack and the same soundtrack style label, and the first video and the third video have different soundtracks and different soundtrack style labels. In this way, by inputting the frames of the videos in an extracted sample into the initial model, the feature information of each video in the sample can be obtained. Afterwards, the loss value of the sample can be determined based on the obtained feature information and the soundtrack style labels in the sample. Finally, whether training of the initial model is complete can be determined based on the determined loss value. If training of the initial model is complete, the trained initial model can be determined as the video feature extraction model. A model that can be used to extract video features is thus obtained, and the video features extracted by this model facilitate the automatic selection of a soundtrack for a video.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for generating a model is shown. The flow 400 of the method for generating a model includes the following steps:
Step 401: obtain a sample set.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in Fig. 1) may obtain a sample set. The sample set may contain a large number of samples. A sample may include a first video, a second video and a third video. It should be noted that the first video and the second video have the same soundtrack and the same soundtrack style label, while the first video and the third video have different soundtracks and different soundtrack style labels. A soundtrack style label may be information used to indicate and distinguish the style of a soundtrack. Soundtrack styles may be divided in advance into multiple categories, such as sad, cheerful and soothing.
In this embodiment, a sample in the sample set may be generated as follows. First, a video may be extracted at random from a preset video library as the first video, where each video in the video library carries a soundtrack label and a soundtrack style label, the soundtrack style label being used to indicate the style of the soundtrack. Then, a video that has the same soundtrack label and the same soundtrack style label as the first video may be extracted at random from the video library as the second video. Next, a video whose soundtrack label and soundtrack style label both differ from those of the first video may be selected at random from the video library as the third video. Finally, the first video, the second video and the third video may be assembled into a sample.
Step 402: extract a sample from the sample set.
In this embodiment, the execution body may extract a sample from the sample set obtained in step 401 and perform the training steps of steps 403 to 408. The manner of extraction and the number of samples extracted are not limited in this application. For example, at least one sample may be extracted at random, or samples whose videos have better clarity (i.e., whose video frames have higher resolution) may be extracted.
Step 403: input frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample.
In this embodiment, the execution body may input the frames of the videos in the sample extracted in step 402 into the initial model, and respectively obtain the feature information of the first video, the feature information of the second video and the feature information of the third video. In practice, feature information may be represented in the form of a vector or a matrix.
In this embodiment, the initial model may be a convolutional neural network created based on machine learning techniques. The initial model may perform feature extraction on the frames of a video, then fuse and analyze the features of the extracted frames, and finally output the feature information of the video. Here, the established convolutional neural network may include a convolutional layer, a pooling layer, a feature fusion layer, a fully connected layer and the like.
Step 404: determine a first Euclidean distance and a second Euclidean distance.
In this embodiment, the execution body may determine the first Euclidean distance and the second Euclidean distance, where the first Euclidean distance is the Euclidean distance between the feature information of the videos with the same soundtrack style label, and the second Euclidean distance is the Euclidean distance between the feature information of the videos with different soundtrack style labels.
Here, since the first video and the second video in the extracted sample have the same soundtrack and the same soundtrack style label, the first Euclidean distance may be the distance between the feature information of the first video and that of the second video. Since the first video and the third video have different soundtracks and different soundtrack style labels, and the second video and the third video likewise have different soundtrack style labels, the second Euclidean distance may be the distance between the feature information of the first video and that of the third video, or the distance between the feature information of the second video and that of the third video.
Step 405: determine the difference between the second Euclidean distance and the first Euclidean distance.
In this embodiment, the execution body may determine the difference between the second Euclidean distance and the first Euclidean distance.
Step 406: determine a loss value of the sample based on the comparison of the difference with a first preset value.
In this embodiment, the execution body may determine the loss value of the sample based on the comparison of the above difference with a first preset value, where the first preset value is a positive number (for example, 0.2). Here, the first preset value may be a value specified in advance by a technician based on statistics and calculations over a large amount of data.
Specifically, in response to determining that the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the execution body may determine a second preset value (for example, 0) as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the numerical relationship between the two distances can be considered to meet expectations. In this case, the loss value may be set to a small value so that the loss value of this sample has little influence on gradient descent or does not participate in gradient descent. Since the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value here, the difference between that difference and the first preset value (which may be called the target figure here) is greater than 0, so the second preset value may be set to a number smaller than this target figure, such as 0.
In response to determining that the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the execution body may determine the difference between the first preset value and the above difference as the loss value of the sample. It can be understood that, when the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the numerical relationship between the second Euclidean distance and the first Euclidean distance may be considered not to meet expectations. In this case, since the difference between the first preset value and the above difference is greater than or equal to zero, it may be determined as the loss value of the sample and participate in gradient descent.
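The loss described in steps 403 to 406 corresponds to a margin-based triplet loss. As a non-limiting illustration, assuming feature vectors represented as plain Python lists and a margin of 0.2 (the helper names and the choice of which distances form the two terms are illustrative assumptions, not part of the claimed method), the computation may be sketched as follows:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sample_loss(feat_first, feat_second, feat_third, margin=0.2):
    """Loss value of one sample.

    feat_first / feat_second -- videos with the same soundtrack style annotation
    feat_third               -- video with a different soundtrack style annotation
    margin                   -- the first preset value (a positive number)
    """
    first_dist = euclidean(feat_first, feat_second)   # step 403: same-style distance
    second_dist = euclidean(feat_first, feat_third)   # step 404 (one of the alternatives)
    diff = second_dist - first_dist                   # step 405
    if diff > margin:
        # expectation met: use the second preset value (0) as the loss value
        return 0.0
    # expectation not met: loss participates in gradient descent
    return margin - diff
```

With this formulation, samples whose different-style distance already exceeds the same-style distance by the margin contribute nothing to the parameter update, matching the two cases of step 406.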
Step 407: determine, based on the loss value, whether training of the initial model is complete.
In this embodiment, the execution body may determine whether training of the initial model is complete based on the loss value determined in step 406. As an example, the execution body may determine whether the loss values have converged; when it is determined that the loss values have converged, it may be determined that training of the current initial model is complete.
It should be noted that, in response to determining that training of the initial model is complete, step 408 may then be executed. In response to determining that training of the initial model is not complete, the parameters of the initial model may be updated based on the determined loss value, a sample may be extracted from the sample set again, and the training step may be continued using the initial model with updated parameters as the initial model. Here, a back-propagation algorithm may be used to obtain the gradient of the loss value with respect to the model parameters, and a gradient descent algorithm may then be used to update the model parameters based on the gradient. It should be noted that the back-propagation algorithm, the gradient descent algorithm, and machine learning methods are well-known techniques that are currently widely studied and applied, and are not described in detail here. It should also be pointed out that the extraction method is not limited in this application; for example, in the case where the sample set contains a large number of samples, the execution body may extract samples that have not yet been extracted.
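The update-until-convergence logic of step 407 can be sketched in miniature. The toy model below, a single scalar parameter with a hand-derived gradient, is an illustrative assumption standing in for back-propagation through the real initial model; only the control flow (compute loss, check convergence, apply a gradient descent update, repeat) mirrors the step described above:

```python
def train(samples, lr=0.1, tol=1e-6, max_steps=1000):
    """Gradient descent on a toy squared-error loss until the loss values converge.

    samples -- list of (x, y) pairs; the toy model predicts y as w * x
    """
    w = 0.0                      # the single parameter of the toy initial model
    prev_loss = float("inf")
    for _ in range(max_steps):
        # loss over the extracted samples (a real model would use back-propagation)
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if abs(prev_loss - loss) < tol:
            break                # loss values have converged: training is complete
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad           # update the parameter based on the gradient
        prev_loss = loss
    return w
```

Fitting pairs drawn from y = 2x, for example, converges to a parameter close to 2.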
Step 408: in response to determining that training of the initial model is complete, determine the trained initial model as a video feature extraction model.
In this embodiment, in response to determining that training of the initial model is complete, the execution body may determine the trained initial model as the video feature extraction model.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating a model in this embodiment involves one way of determining the loss value of an extracted sample. Accordingly, with the scheme described in this embodiment, the model can be driven to extract feature information whose difference is as small as possible for frames from videos with the same soundtrack and the same soundtrack style annotation, and as large as possible for frames from videos with different soundtracks and different soundtrack style annotations. A model usable for extracting video features can thereby be obtained, and the video features extracted by the model facilitate automatic soundtrack selection for videos.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating a model. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating a model in this embodiment includes: an acquiring unit 501 configured to acquire a sample set, where a sample in the sample set includes a first video, a second video, and a third video, the first video and the second video have the same soundtrack and the same soundtrack style annotation, and the first video and the third video have different soundtracks and different soundtrack style annotations; and a training unit 502 configured to extract a sample from the sample set and execute the following training step: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining the loss value of the sample based on the obtained feature information and the soundtrack style annotations in the sample; determining, based on the loss value, whether training of the initial model is complete; and, in response to determining that training of the initial model is complete, determining the trained initial model as a video feature extraction model.
In some optional implementations of this embodiment, the training unit 502 may be further configured to: determine a first Euclidean distance and a second Euclidean distance, where the first Euclidean distance is the Euclidean distance between the feature information of videos with the same soundtrack style annotation, and the second Euclidean distance is the Euclidean distance between the feature information of videos with different soundtrack style annotations; determine the difference between the second Euclidean distance and the first Euclidean distance; and determine the loss value of the sample based on a comparison of the difference with a first preset value, where the first preset value is a positive number.
In some optional implementations of this embodiment, the training unit 502 may be further configured to: in response to determining that the difference is greater than the first preset value, determine a second preset value as the loss value of the sample, where the second preset value is less than the difference between the above difference and the first preset value.
In some optional implementations of this embodiment, the training unit 502 may be further configured to: in response to determining that the difference is not greater than the first preset value, determine the difference between the first preset value and the above difference as the loss value of the sample.
In some optional implementations of this embodiment, the samples in the sample set are generated as follows: a video is randomly extracted from a preset video library as the first video, where the videos in the video library carry soundtrack annotations and soundtrack style annotations, and a soundtrack style annotation is used to indicate the style of the soundtrack; a video having the same soundtrack annotation and the same soundtrack style annotation as the first video is randomly extracted from the video library as the second video; a video having a different soundtrack annotation and a different soundtrack style annotation from the first video is randomly extracted from the video library as the third video; and the first video, the second video, and the third video are aggregated into a sample.
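A minimal sketch of the sampling procedure just described, assuming each library entry is a dict with hypothetical `soundtrack` and `style` annotation fields (the field names are illustrative, not part of the application):

```python
import random

def make_sample(video_library):
    """Assemble one (first, second, third) training sample from an annotated library."""
    first = random.choice(video_library)
    # same soundtrack annotation and same soundtrack style annotation as the first video
    positives = [v for v in video_library
                 if v is not first
                 and v["soundtrack"] == first["soundtrack"]
                 and v["style"] == first["style"]]
    # different soundtrack annotation and different soundtrack style annotation
    negatives = [v for v in video_library
                 if v["soundtrack"] != first["soundtrack"]
                 and v["style"] != first["style"]]
    second = random.choice(positives)
    third = random.choice(negatives)
    return first, second, third
```

A production variant would need to handle the case where no positive or negative candidate exists for a chosen first video, e.g., by re-drawing.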
In some optional implementations of this embodiment, the apparatus may further include an updating unit (not shown in the figure). The updating unit may be configured to, in response to determining that training of the initial model is not complete, update the parameters of the initial model based on the loss value, extract a sample from the sample set again, and continue to execute the training step using the initial model with updated parameters as the initial model.
With the apparatus provided by the above embodiment of the present application, the acquiring unit 501 acquires a sample set, from which samples can be extracted to train the initial model. A sample in the sample set may include a first video, a second video, and a third video, where the first video and the second video have the same soundtrack and the same soundtrack style annotation, while the first video and the third video have different soundtracks and different soundtrack style annotations. The training unit 502 inputs frames of the videos in the extracted sample into the initial model to obtain feature information of each video in the sample. The loss value of the sample can then be determined based on the obtained feature information and the soundtrack style annotations in the sample. Finally, whether training of the initial model is complete can be determined based on the determined loss value; if training of the initial model is complete, the trained initial model can be determined as the video feature extraction model. A model usable for extracting video features can thereby be obtained, and the video features extracted by the model facilitate automatic soundtrack selection for videos.
Referring to Fig. 6, a flow 600 of an embodiment of a method for pushing information provided by the present application is illustrated. The method for pushing information may include the following steps:
Step 601: in response to receiving a target video, input frames of the target video into a video feature extraction model to obtain target feature information of the target video.
In this embodiment, in response to receiving a target video, the execution body of the method for pushing information (e.g., the server 105 shown in Fig. 1, or another server storing the video feature extraction model) may input frames of the target video into the video feature extraction model to obtain target feature information of the target video. Here, the target video may be a video without a soundtrack uploaded by a terminal device.
In this embodiment, the video feature extraction model may be generated using the method described in the embodiment of Fig. 2 above. For the specific generation process, reference may be made to the related description of the embodiment of Fig. 2, which is not repeated here.
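As one illustrative reading of step 601 in code, assuming a hypothetical frame list and a model callable that maps a frame to a feature vector, per-frame features might be averaged into a single piece of target feature information. The averaging is an assumption for illustration; the application does not fix how per-frame outputs are combined:

```python
def video_feature(frames, model):
    """Combine per-frame feature vectors into one target feature vector by averaging.

    frames -- iterable of frames, in whatever representation the model accepts
    model  -- callable mapping one frame to a feature vector (hypothetical)
    """
    feats = [model(f) for f in frames]
    dim = len(feats[0])
    # element-wise mean over all frame feature vectors
    return [sum(v[i] for v in feats) / len(feats) for i in range(dim)]
```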
Step 602: perform similarity calculation between the target feature information and the feature information of the videos in a preset video library, and select a preset number of videos from the video library as candidate videos in descending order of similarity.
In this embodiment, the execution body may perform similarity calculation between the target feature information and the feature information of the videos in the preset video library, and select a preset number (e.g., 5) of videos from the video library as candidate videos in descending order of similarity.
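Step 602 might be sketched as follows, using cosine similarity as one common choice of similarity measure (the application does not fix the measure) followed by a descending top-k selection:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def candidate_videos(target_feat, library_feats, k=5):
    """Select the k library videos most similar to the target feature information.

    library_feats -- list of (video_id, feature_vector) pairs
    """
    scored = [(cosine(target_feat, feat), vid) for vid, feat in library_feats]
    scored.sort(key=lambda p: p[0], reverse=True)   # descending order of similarity
    return [vid for _, vid in scored[:k]]
```

For a large video library, the linear scan above would in practice be replaced by an approximate nearest-neighbor index, but the selection semantics are the same.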
Step 603: acquire soundtrack information of the candidate videos, and push the selected soundtrack information.
In this embodiment, the execution body may acquire the soundtrack information of the candidate videos. Here, the soundtrack information may include, but is not limited to, at least one of the following: the audio file of the soundtrack, the title of the soundtrack, and the name of the soundtrack style. Finally, the selected soundtrack information may be pushed, for example, to the terminal device for selection by the user.
It should be noted that the method for pushing information in this embodiment may be used to test the video feature extraction models generated by the above embodiments, and the video feature extraction model may then be continuously optimized according to the test results. The method may also be a practical application of the video feature extraction models generated by the above embodiments. By using a video feature extraction model generated by the above embodiments to extract target feature information of a target video, and selecting soundtrack information based on the extracted target feature information, soundtrack recommendation can be performed for videos without soundtracks, realizing highly targeted information pushing.
With continued reference to Fig. 7, as an implementation of the method shown in Fig. 6 above, the present application provides an embodiment of an apparatus for pushing information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 6, and the apparatus may specifically be applied to various electronic devices.
As shown in Fig. 7, the apparatus 700 for pushing information in this embodiment includes: a receiving unit 701 configured to, in response to receiving a target video, input frames of the target video into a video feature extraction model generated by the method described in the embodiment of Fig. 2 above, to obtain target feature information of the target video; a selecting unit 702 configured to perform similarity calculation between the target feature information and the feature information of the videos in a preset video library, and select a preset number of videos from the video library as candidate videos in descending order of similarity; and a pushing unit 703 configured to acquire soundtrack information of the candidate videos and push the selected soundtrack information.
It can be understood that all the units recorded in the apparatus 700 correspond to the respective steps of the method described with reference to Fig. 6. Accordingly, the operations, features, and beneficial effects described above with respect to the method are equally applicable to the apparatus 700 and the units included therein, and are not described in detail here.
Referring now to Fig. 8, a structural schematic diagram of a computer system 800 of an electronic device suitable for implementing the embodiments of the present application is shown. The electronic device shown in Fig. 8 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A driver 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 810 as needed, so that a computer program read therefrom can be installed into the storage portion 808 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained in the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit and a training unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring a sample set."
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist independently without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a sample set; and extract a sample from the sample set and execute the following training step: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining the loss value of the sample based on the obtained feature information and the soundtrack style annotations in the sample; determining, based on the loss value, whether training of the initial model is complete; and, in response to determining that training of the initial model is complete, determining the trained initial model as a video feature extraction model.
The above description is merely a preferred embodiment of the present application and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.