CN109492128A - Method and apparatus for generating model - Google Patents

Method and apparatus for generating model

Info

Publication number
CN109492128A
CN109492128A (application CN201811273701.7A)
Authority
CN
China
Prior art keywords
video
sample
background music
dubbing
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811273701.7A
Other languages
Chinese (zh)
Other versions
CN109492128B (en)
Inventor
袁泽寰
癿春光
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811273701.7A priority Critical patent/CN109492128B/en
Publication of CN109492128A publication Critical patent/CN109492128A/en
Priority to PCT/CN2019/095735 priority patent/WO2020087979A1/en
Application granted granted Critical
Publication of CN109492128B publication Critical patent/CN109492128B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/55: Push-based network services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for generating a model. One specific embodiment of the method includes: acquiring a sample set; extracting a sample from the sample set and performing the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background music style labels in the sample; determining, based on the loss value, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, determining the trained initial model as a video feature extraction model. This embodiment yields a model that can be used to extract video features, and the video features extracted by the model facilitate the automatic selection of background music for videos.

Description

Method and apparatus for generating model
Technical field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for generating a model.
Background technique
With the development of computer technology, short-video applications have emerged. Users can use a short-video application to upload and publish videos. When uploading a video with a short-video application, the user usually needs to select a piece of music as the background music of the video.
In the existing approach, the user typically picks the background music manually from a background music list.
Summary of the invention
The embodiments of the present application propose a method and apparatus for generating a model.
In a first aspect, an embodiment of the present application provides a method for generating a model, the method comprising: acquiring a sample set, where a sample in the sample set includes a first video, a second video and a third video, the first video and the second video have the same background music and the same background music style label, and the first video and the third video have different background music and different background music style labels; extracting a sample from the sample set and performing the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background music style labels in the sample; determining, based on the loss value, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, determining the trained initial model as a video feature extraction model.
In some embodiments, determining the loss value of the sample based on the obtained feature information and the background music style labels in the sample comprises: determining a first Euclidean distance and a second Euclidean distance, where the first Euclidean distance is the Euclidean distance between the feature information of the videos having the same background music style label, and the second Euclidean distance is the Euclidean distance between the feature information of videos having different background music style labels; determining the difference between the second Euclidean distance and the first Euclidean distance; and determining the loss value of the sample based on a comparison of the difference with a first preset value, where the first preset value is a positive number.
In some embodiments, determining the loss value of the sample based on the comparison of the difference with the first preset value comprises: in response to determining that the difference is greater than the first preset value, determining a second preset value as the loss value of the sample, where the second preset value is less than the difference between that difference and the first preset value.
In some embodiments, determining the loss value of the sample based on the comparison of the difference with the first preset value comprises: in response to determining that the difference is not greater than the first preset value, determining the difference between the first preset value and that difference as the loss value of the sample.
In some embodiments, a sample in the sample set is generated as follows: randomly extracting a video from a preset video library as the first video, where the videos in the video library carry background music labels and background music style labels; randomly extracting, from the video library, a video that has the same background music label and the same background music style label as the first video, as the second video; randomly selecting, from the video library, a video that has a different background music label and a different background music style label from the first video, as the third video; and aggregating the first video, the second video and the third video into a sample.
In some embodiments, the method further comprises: in response to determining that training of the initial model is not complete, updating the parameters of the initial model based on the loss value, extracting a sample from the sample set again, and continuing to perform the training steps using the initial model with updated parameters as the initial model.
In a second aspect, an embodiment of the present application provides an apparatus for generating a model, the apparatus comprising: an acquisition unit configured to acquire a sample set, where a sample in the sample set includes a first video, a second video and a third video, the first video and the second video have the same background music and the same background music style label, and the first video and the third video have different background music and different background music style labels; and a training unit configured to extract a sample from the sample set and perform the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background music style labels in the sample; determining, based on the loss value, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, determining the trained initial model as a video feature extraction model.
In some embodiments, the training unit is further configured to: determine a first Euclidean distance and a second Euclidean distance, where the first Euclidean distance is the Euclidean distance between the feature information of the videos having the same background music style label, and the second Euclidean distance is the Euclidean distance between the feature information of videos having different background music style labels; determine the difference between the second Euclidean distance and the first Euclidean distance; and determine the loss value of the sample based on a comparison of the difference with a first preset value, where the first preset value is a positive number.
In some embodiments, the training unit is further configured to: in response to determining that the difference is greater than the first preset value, determine a second preset value as the loss value of the sample, where the second preset value is less than the difference between that difference and the first preset value.
In some embodiments, the training unit is further configured to: in response to determining that the difference is not greater than the first preset value, determine the difference between the first preset value and that difference as the loss value of the sample.
In some embodiments, a sample in the sample set is generated as follows: randomly extracting a video from a preset video library as the first video, where the videos in the video library carry background music labels and background music style labels; randomly extracting, from the video library, a video that has the same background music label and the same background music style label as the first video, as the second video; randomly selecting, from the video library, a video that has a different background music label and a different background music style label from the first video, as the third video; and aggregating the first video, the second video and the third video into a sample.
In some embodiments, the apparatus further comprises: an updating unit configured to, in response to determining that training of the initial model is not complete, update the parameters of the initial model based on the loss value, extract a sample from the sample set again, and continue to perform the training steps using the initial model with updated parameters as the initial model.
In a third aspect, an embodiment of the present application provides a method for pushing information, comprising: in response to receiving a target video, inputting frames of the target video into a video feature extraction model generated by the method described in any embodiment of the first aspect, to obtain target feature information of the target video; performing similarity calculation between the target feature information and the feature information of the videos in a preset video library, and selecting a preset number of videos from the video library as candidate videos in descending order of similarity; and acquiring background music information of the candidate videos and pushing the selected background music information.
In a fourth aspect, an embodiment of the present application provides an apparatus for pushing information, comprising: a receiving unit configured to, in response to receiving a target video, input frames of the target video into a video feature extraction model generated by the method described in any embodiment of the first aspect, to obtain target feature information of the target video; a selecting unit configured to perform similarity calculation between the target feature information and the feature information of the videos in a preset video library, and select a preset number of videos from the video library as candidate videos in descending order of similarity; and a pushing unit configured to acquire background music information of the candidate videos and push the selected background music information.
In a fifth aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the first aspect or the third aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the first aspect or the third aspect.
According to the method and apparatus for generating a model provided by the embodiments of the present application, a sample set is acquired, from which samples can be extracted to train an initial model. A sample in the sample set may include a first video, a second video and a third video. The first video and the second video have the same background music and the same background music style label, while the first video and the third video have different background music and different background music style labels. Frames of the videos in an extracted sample are input into the initial model to obtain feature information of each video in the sample. A loss value of the sample is then determined based on the obtained feature information and the background music style labels in the sample. Finally, whether training of the initial model is complete is determined based on the determined loss value. If training is complete, the trained initial model is determined as a video feature extraction model. A model that can be used to extract video features is thus obtained, and the video features extracted by the model facilitate the automatic selection of background music for videos.
Detailed description of the invention
Other features, objects and advantages of the present application will become more apparent by reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;
Fig. 2 is a flowchart of an embodiment of the method for generating a model according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for generating a model according to the present application;
Fig. 5 is a schematic structural diagram of an embodiment of the apparatus for generating a model according to the present application;
Fig. 6 is a flowchart of an embodiment of the method for pushing information according to the present application;
Fig. 7 is a schematic structural diagram of an embodiment of the apparatus for pushing information according to the present application;
Fig. 8 is a schematic structural diagram of a computer system adapted to implement an electronic device of an embodiment of the present application.
Specific embodiment
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the related invention and are not a limitation of the invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with one another provided that no conflict arises. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating a model or the apparatus for generating a model of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as video recording applications, video playback applications, voice interaction applications, search applications, instant messaging tools, mailbox clients and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. This is not specifically limited here.
When the terminal devices 101, 102, 103 are hardware, they may also be equipped with image capture devices. An image capture device may be any device capable of capturing images, such as a camera or a sensor. The user may use the image capture device on the terminal devices 101, 102, 103 to capture video.
The server 105 may be a server providing various services, for example a video processing server that stores, manages or analyzes the videos uploaded by the terminal devices 101, 102, 103. The video processing server may acquire a sample set containing a large number of samples. A sample in the sample set may include a first video, a second video and a third video; the first video and the second video have the same background music and the same background music style label, while the first video and the third video have different background music and different background music style labels. In addition, the video processing server may use the samples in the sample set to train an initial model and may store the training result (for example, the generated video feature extraction model). In this way, after a user uploads a video using the terminal devices 101, 102, 103, the server 105 can determine the feature information of the uploaded video and then perform operations such as selecting and pushing background music information for that video.
It should be noted that the server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. This is not specifically limited here.
It should be noted that the method for generating a model provided by the embodiments of the present application is generally performed by the server 105; accordingly, the apparatus for generating a model is generally arranged in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs.
With continued reference to Fig. 2, a flow 200 of an embodiment of the method for generating a model according to the present application is shown. The method for generating a model comprises the following steps:
Step 201: acquire a sample set.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in Fig. 1) may acquire the sample set in a variety of ways. For example, the execution body may obtain an existing sample set stored in another server (for example, a database server) via a wired or wireless connection. As another example, a user may collect samples through terminal devices (for example, the terminal devices 101, 102, 103 shown in Fig. 1); the execution body may then receive the samples collected by the terminals and store them locally to generate the sample set. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections and other wireless connections now known or developed in the future.
Here, the sample set may contain a large number of samples. A sample may include a first video, a second video and a third video. It should be noted that the first video and the second video have the same background music and the same background music style label, while the first video and the third video have different background music and different background music style labels. A background music style label is information used to indicate and distinguish the style of the background music. Background music styles may be divided in advance into multiple categories, such as sad, cheerful and soothing.
In some optional implementations of this embodiment, a sample in the sample set may be generated as follows. First, a video may be randomly extracted from a preset video library as the first video, where the videos in the video library carry background music labels and background music style labels. In practice, a background music label is used to indicate and distinguish the background music; for example, it may be the title of the background music. Then, a video that has the same background music label and the same background music style label as the first video may be randomly extracted from the video library as the second video. Next, a video that has a different background music label and a different background music style label from the first video may be randomly selected from the video library as the third video. Finally, the first video, the second video and the third video may be aggregated into a sample.
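For illustration only, and not part of the original disclosure, the sampling procedure described above could be sketched roughly as follows in Python; the class name and the field names music_id and style are assumptions introduced here.

    import random
    from dataclasses import dataclass

    @dataclass
    class LibraryVideo:
        video_id: str
        music_id: str   # background music label, e.g. the title of the background music
        style: str      # background music style label, e.g. "sad", "cheerful", "soothing"

    def sample_triplet(library):
        """Draw (first, second, third) videos as described in this implementation."""
        first = random.choice(library)
        # Second video: same background music label and same style label as the first video.
        positives = [v for v in library
                     if v.video_id != first.video_id
                     and v.music_id == first.music_id
                     and v.style == first.style]
        # Third video: different background music label and different style label.
        negatives = [v for v in library
                     if v.music_id != first.music_id and v.style != first.style]
        if not positives or not negatives:
            return None  # caller may simply resample another first video
        return first, random.choice(positives), random.choice(negatives)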
It should be noted that the samples in the sample set may also be generated in other ways, for example by manual selection, which is not described in detail here.
Step 202: extract a sample from the sample set.
In this embodiment, the execution body may extract a sample from the sample set acquired in step 201 and perform the training steps of steps 203 to 206. The manner of extraction and the number of samples extracted are not limited in the present application. For example, at least one sample may be extracted at random, or samples whose videos have better clarity (that is, whose video frames have higher resolution) may be extracted.
Step 203: input frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample.
In this embodiment, the execution body may input frames of the videos in the sample extracted in step 202 into the initial model. Since the extracted sample contains a first video, a second video and a third video, the feature information of the first video, the feature information of the second video and the feature information of the third video can be obtained separately. The initial model can analyze the frames of a video and output the feature information of the video. In practice, feature information may be represented in the form of a vector or a matrix.
It should be noted that the frames input for a video may be one or more randomly selected frames, or may be multiple frames extracted from the video at a specified time interval (for example, 1 s or 2 s). This is not limited here.
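As a small illustrative sketch (not part of the original disclosure) of extracting frames at a specified time interval, assuming OpenCV is available:

    import cv2

    def sample_frames(video_path, interval_s=1.0):
        """Return the frames taken every interval_s seconds from the video."""
        capture = cv2.VideoCapture(video_path)
        fps = capture.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS metadata is missing
        step = max(int(round(fps * interval_s)), 1)
        frames, index = [], 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % step == 0:
                frames.append(frame)
            index += 1
        capture.release()
        return frames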
In this embodiment, the initial model may be any model with an image feature extraction capability created based on machine learning techniques. The initial model may extract features from the frames of a video, then fuse and analyze the features of the extracted frames, and finally output the feature information of the video.
As an example, the initial model may be a convolutional neural network using any of various existing structures (such as DenseBox, VGGNet, ResNet or SegNet). In practice, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area, and it performs very well in image processing; therefore, a convolutional neural network may be used to extract the features of the frames in a video.
In this example, the constructed convolutional neural network may include convolutional layers, pooling layers, a feature fusion layer, fully connected layers and the like. The convolutional layers may be used to extract image features. The pooling layers may be used to downsample the input information. The feature fusion layer may be used to fuse the image features corresponding to the obtained frames (for example, in the form of feature matrices or feature vectors); for instance, the feature values at the same positions of the feature matrices corresponding to different frames may be averaged to perform feature fusion and generate a single fused feature matrix. The fully connected layers may be used to further process the fused features.
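As an illustrative sketch only (the layer sizes and the 128-dimensional output are assumptions, and PyTorch is used purely for demonstration), such a structure of per-frame convolution, pooling, feature fusion by averaging, and a fully connected layer could look like:

    import torch
    import torch.nn as nn

    class VideoFeatureExtractor(nn.Module):
        """Per-frame CNN features, fused by averaging across frames, then a fully connected head."""
        def __init__(self, embedding_dim=128):
            super().__init__()
            self.frame_cnn = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                      # pooling layer: downsample the input
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),              # one 64-dimensional vector per frame
            )
            self.head = nn.Linear(64, embedding_dim)  # fully connected layer

        def forward(self, frames):
            # frames: (batch, num_frames, 3, height, width)
            b, t, c, h, w = frames.shape
            per_frame = self.frame_cnn(frames.reshape(b * t, c, h, w)).flatten(1)
            fused = per_frame.reshape(b, t, -1).mean(dim=1)  # feature fusion: average over frames
            return self.head(fused)                          # feature information of the video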
It should be noted that the initial model may also be another model with an image feature extraction capability and is not limited to the example above; the specific model structure is not limited here.
Step 204: determine a loss value of the sample based on the obtained feature information and the background music style labels in the sample.
In this embodiment, the goal of training the initial model is to make the difference between the feature information extracted from the frames of videos that have the same background music and the same background music style label as small as possible, while making the difference between the feature information extracted from the frames of videos that have different background music and different background music style labels as large as possible. Accordingly, the difference between the feature information extracted from the frames of videos with the same background music and the same background music style label may be called the first difference, and the difference between the feature information extracted from the frames of videos with different background music and different background music style labels may be called the second difference. The execution body may first determine the values of the first difference and the second difference from the feature information. In practice, since the first video and the second video in the extracted sample have the same background music and the same background music style label, the first difference may be the difference between the feature information of the first video and the feature information of the second video. Since the first video and the third video have different background music and different background music style labels, and the second video and the third video likewise have different background music and different background music style labels, the second difference may be the difference between the feature information of the first video and that of the third video, or the difference between the feature information of the second video and that of the third video. Here, the difference between feature information may be determined using measures such as the Euclidean distance or the cosine similarity: the more similar the feature information, the smaller the difference; the less similar, the larger the difference.
After obtaining the first difference and the second difference, the execution body may input them into a pre-established loss function to determine the loss value of the sample. A loss function is a non-negative real-valued function; in general, the smaller the value of the loss function (the loss value), the more robust the model. The loss function may be set according to actual needs. As an example, the loss function may be a function characterizing the degree of difference between the second difference and the first difference, such as a triplet loss function (triplet loss).
In some optional implementations of this embodiment, the execution body may determine the loss value of the extracted sample as follows:
First, determine the Euclidean distance between the feature information of the videos having the same background music style label, and take this distance as the first Euclidean distance; and determine the Euclidean distance between the feature information of videos having different background music style labels, and take this distance as the second Euclidean distance.
Second, determine the difference between the second Euclidean distance and the first Euclidean distance.
Third, determine the loss value of the sample based on a comparison of the above difference with a first preset value, where the first preset value is a positive number (for example, 0.2). Here, the first preset value may be a value specified in advance by a technician based on statistics and calculation over a large amount of data.
Optionally, in response to determining that the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the execution body may determine a second preset value (for example, 0) as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the numerical relationship between the second Euclidean distance and the first Euclidean distance can be considered to meet expectations. In this case, the loss value can be set to a small number so that the loss value of this sample has little influence on gradient descent or does not participate in gradient descent. Since the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the difference between that difference and the first preset value (referred to here as the target value) is greater than 0; therefore, the second preset value may be set to a number smaller than the target value, for example 0.
Optionally, in response to determining that the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the execution body may determine the difference between the first preset value and that difference as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the numerical relationship between the second Euclidean distance and the first Euclidean distance can be considered not to meet expectations. In this case, since the difference between the first preset value and that difference is non-negative, it may be determined as the loss value of the sample and participate in gradient descent.
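Taken together, the two optional cases above behave like a triplet loss with a margin. A minimal sketch under the example values given above (0.2 as the first preset value and 0 as the second preset value), continuing the PyTorch illustration:

    import torch

    def sample_loss(feat_first, feat_second, feat_third, margin=0.2):
        """Loss value of one (first, second, third) video sample, following the three steps above."""
        d_same = torch.dist(feat_first, feat_second)  # first Euclidean distance (same style label)
        d_diff = torch.dist(feat_first, feat_third)   # second Euclidean distance (different style label)
        gap = d_diff - d_same                         # difference of the two Euclidean distances
        if gap > margin:
            # Expectation met: use the small second preset value (0), so this sample
            # contributes little or nothing to gradient descent.
            return torch.zeros((), device=feat_first.device)
        # Otherwise the loss is (first preset value - difference), which is non-negative.
        return margin - gap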
Step 205: determine, based on the loss value, whether training of the initial model is complete.
In this embodiment, the execution body determines, based on the loss value determined in step 204, whether training of the initial model is complete. As an example, the execution body may determine whether the loss value has converged; when the loss value is determined to have converged, it may be determined that the current initial model has been trained. As another example, the execution body may first compare the loss value with a target value. In response to determining that the loss value is less than or equal to the target value, it may count, among the loss values determined in the most recent preset number (for example, 100) of training steps, the proportion of loss values that are less than or equal to the target value; when this proportion is greater than a preset proportion (for example, 95%), it may be determined that training of the initial model is complete. It should be noted that the target value can generally be used to represent an ideal degree of inconsistency between the predicted value and the true value; that is, when the loss value is less than or equal to the target value, the predicted value can be considered close or approximately equal to the true value. The target value may be set according to actual needs.
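A brief sketch of the second stopping criterion described above (the window of 100 steps and the 95% proportion are the example figures mentioned; the helper name is an assumption):

    from collections import deque

    def training_converged(recent_losses, target, ratio=0.95):
        """True when enough of the most recent loss values are at or below the target value."""
        if recent_losses.maxlen is None or len(recent_losses) < recent_losses.maxlen:
            return False  # not enough recent training steps observed yet
        hits = sum(1 for loss in recent_losses if loss <= target)
        return hits / len(recent_losses) > ratio

    # Usage: recent_losses = deque(maxlen=100); append each step's loss value, then test.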
It should be noted that, in response to determining that training of the initial model is complete, step 206 may be performed. In response to determining that training of the initial model is not complete, the parameters of the initial model may be updated based on the determined loss value, a sample may be extracted from the sample set again, and the above training steps may be continued using the initial model with updated parameters as the initial model. Here, a back-propagation algorithm may be used to compute the gradient of the loss value with respect to the model parameters, and a gradient descent algorithm may then be used to update the model parameters based on the gradient. It should be noted that the back-propagation algorithm, the gradient descent algorithm and machine learning methods are well-known techniques that are currently widely studied and applied, and are not described in detail here. It should also be pointed out that the manner of extraction here is not limited in the present application; for example, when the sample set contains a large number of samples, the execution body may extract samples that have not been extracted before.
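For illustration only, a hedged end-to-end sketch of this training loop, reusing the illustrative helpers introduced above; sample_triplet, VideoFeatureExtractor, sample_loss and training_converged are names assumed in this description rather than names from the patent, frames_of is an assumed callable returning a (num_frames, 3, H, W) tensor of frames for a video, and the target loss value of 0.05 is likewise an assumption.

    import torch
    from collections import deque

    def train(library, frames_of, target_loss=0.05, max_steps=100000):
        model = VideoFeatureExtractor()
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent
        recent_losses = deque(maxlen=100)
        for _ in range(max_steps):
            triplet = sample_triplet(library)       # extract a sample from the sample set
            if triplet is None:
                continue
            feats = [model(frames_of(v).unsqueeze(0)).squeeze(0) for v in triplet]
            loss = sample_loss(*feats)              # loss value of the sample
            if loss.requires_grad:
                optimizer.zero_grad()
                loss.backward()                     # back-propagation
                optimizer.step()                    # update the parameters of the initial model
            recent_losses.append(float(loss))
            if training_converged(recent_losses, target_loss):
                break                               # training of the initial model is complete
        return model                                # the trained video feature extraction model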
Step 206: in response to determining that training of the initial model is complete, determine the trained initial model as the video feature extraction model.
In this embodiment, in response to determining that training of the initial model is complete, the execution body may determine the trained initial model as the video feature extraction model.
In some optional implementations of this embodiment, after the video feature extraction model is obtained through training, in response to receiving a target video, frames of the target video may be input into the video feature extraction model to obtain target feature information of the target video. Then, similarity calculation may be performed between the target feature information and the feature information of the videos in a preset video library, and a preset number of videos may be selected from the video library as candidate videos in descending order of similarity. Finally, background music information of the candidate videos may be acquired, and the selected background music information may be pushed.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to this embodiment. In the application scenario of Fig. 3, a model training application may be installed on the terminal device 301 used by a user. After the user opens the application and uploads a sample set or the storage path of a sample set, the server 302 that provides backend support for the application may run the method for generating a model, comprising:
First, a sample set may be acquired, where a sample in the sample set includes a first video, a second video and a third video; the first video and the second video have the same background music and the same background music style label, and the first video and the third video have different background music and different background music style labels. Then, a sample may be extracted from the sample set and the following training steps performed: frames are extracted from the videos in the extracted sample; the frames 303 of the extracted first video, the frames 304 of the second video and the frames 305 of the third video are input into the initial model 306 to obtain the feature information of each video in the sample; a loss value 308 of the sample is determined based on the obtained feature information and the background music style labels 307 in the sample; whether training of the initial model is complete is determined based on the loss value; and in response to determining that training of the initial model is complete, the trained initial model is determined as the video feature extraction model 309.
By acquiring a sample set, samples can be extracted from it to train the initial model. A sample in the sample set may include a first video, a second video and a third video; the first video and the second video have the same background music and the same background music style label, while the first video and the third video have different background music and different background music style labels. Inputting the frames of the videos in an extracted sample into the initial model yields the feature information of each video in the sample. A loss value of the sample can then be determined based on the obtained feature information and the background music style labels in the sample. Finally, whether training of the initial model is complete can be determined based on the determined loss value; if so, the trained initial model can be determined as the video feature extraction model. A model that can be used to extract video features is thus obtained, and the video features extracted by the model facilitate the automatic selection of background music for videos.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for generating a model is shown. The flow 400 of the method for generating a model comprises the following steps:
Step 401: acquire a sample set.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in Fig. 1) may acquire a sample set. The sample set may contain a large number of samples, each of which may include a first video, a second video and a third video. It should be noted that the first video and the second video have the same background music and the same background music style label, while the first video and the third video have different background music and different background music style labels. A background music style label is information used to indicate and distinguish the style of the background music. Background music styles may be divided in advance into multiple categories, such as sad, cheerful and soothing.
In this embodiment, a sample in the sample set may be generated as follows. First, a video may be randomly extracted from a preset video library as the first video, where the videos in the video library carry background music labels and background music style labels, the background music style label indicating the style of the background music. Then, a video that has the same background music label and the same background music style label as the first video may be randomly extracted from the video library as the second video. Next, a video that has a different background music label and a different background music style label from the first video may be randomly selected from the video library as the third video. Finally, the first video, the second video and the third video may be aggregated into a sample.
Step 402: extract a sample from the sample set.
In this embodiment, the execution body may extract a sample from the sample set acquired in step 401 and perform the training steps of steps 403 to 408. The manner of extraction and the number of samples extracted are not limited in the present application. For example, at least one sample may be extracted at random, or samples whose videos have better clarity (that is, whose video frames have higher resolution) may be extracted.
Step 403: input frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample.
In this embodiment, the execution body may input frames of the videos in the sample extracted in step 402 into the initial model, obtaining the feature information of the first video, the feature information of the second video and the feature information of the third video respectively. In practice, feature information may be represented in the form of a vector or a matrix.
In this embodiment, the initial model may be a convolutional neural network created based on machine learning techniques. The initial model may extract features from the frames of a video, then fuse and analyze the features of the extracted frames, and finally output the feature information of the video. Here, the constructed convolutional neural network may include convolutional layers, pooling layers, a feature fusion layer, fully connected layers and the like.
Step 404: determine a first Euclidean distance and a second Euclidean distance.
In this embodiment, the execution body may determine a first Euclidean distance and a second Euclidean distance, where the first Euclidean distance is the Euclidean distance between the feature information of the videos having the same background music style label, and the second Euclidean distance is the Euclidean distance between the feature information of videos having different background music style labels.
Here, since the first video and the second video in the extracted sample have the same background music and the same background music style label, the first Euclidean distance may be the distance between the feature information of the first video and the feature information of the second video. Since the first video and the third video have different background music and different background music style labels, and the second video and the third video likewise have different background music style labels, the second Euclidean distance may be the distance between the feature information of the first video and that of the third video, or the distance between the feature information of the second video and that of the third video.
Step 405: determine the difference between the second Euclidean distance and the first Euclidean distance.
In this embodiment, the execution body may determine the difference between the second Euclidean distance and the first Euclidean distance.
Step 406: determine the loss value of the sample based on a comparison of the above difference with a first preset value.
In this embodiment, the execution body may determine the loss value of the sample based on a comparison of the above difference with the first preset value, where the first preset value is a positive number (for example, 0.2). Here, the first preset value may be a value specified in advance by a technician based on statistics and calculation over a large amount of data.
Specifically, in response to determining that the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the execution body may determine a second preset value (for example, 0) as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the numerical relationship between the second Euclidean distance and the first Euclidean distance can be considered to meet expectations; the loss value can then be set to a small number so that the loss value of this sample has little influence on gradient descent or does not participate in gradient descent. Since the difference between the second Euclidean distance and the first Euclidean distance is greater than the first preset value, the difference between that difference and the first preset value (referred to here as the target value) is greater than 0; therefore, the second preset value may be set to a number smaller than the target value, for example 0.
In response to determining that the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the execution body may determine the difference between the first preset value and that difference as the loss value of the sample. It can be understood that when the difference between the second Euclidean distance and the first Euclidean distance is not greater than the first preset value, the numerical relationship between the second Euclidean distance and the first Euclidean distance can be considered not to meet expectations; since the difference between the first preset value and that difference is non-negative, it may be determined as the loss value of the sample and participate in gradient descent.
Step 407: determine, based on the loss value, whether training of the initial model is complete.
In this embodiment, the execution body determines, based on the loss value determined in step 406, whether training of the initial model is complete. As an example, the execution body may determine whether the loss value has converged; when the loss value is determined to have converged, it may be determined that the current initial model has been trained.
It should be noted that, in response to determining that training of the initial model is complete, step 408 may be performed. In response to determining that training of the initial model is not complete, the parameters of the initial model may be updated based on the determined loss value, a sample may be extracted from the sample set again, and the above training steps may be continued using the initial model with updated parameters as the initial model. Here, a back-propagation algorithm may be used to compute the gradient of the loss value with respect to the model parameters, and a gradient descent algorithm may then be used to update the model parameters based on the gradient. It should be noted that the back-propagation algorithm, the gradient descent algorithm and machine learning methods are well-known techniques that are currently widely studied and applied, and are not described in detail here. It should also be pointed out that the manner of extraction here is not limited in the present application; for example, when the sample set contains a large number of samples, the execution body may extract samples that have not been extracted before.
Step 408: in response to determining that training of the initial model is complete, determine the trained initial model as the video feature extraction model.
In this embodiment, in response to determining that training of the initial model is complete, the execution body may determine the trained initial model as the video feature extraction model.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating a model in this embodiment involves a specific way of determining the loss value of the extracted sample. The solution described in this embodiment can therefore make the difference between the feature information that the model extracts from the frames of videos with the same background music and the same background music style label as small as possible, while making the difference between the feature information extracted from the frames of videos with different background music and different background music style labels as large as possible. A model that can be used to extract video features is thus obtained, and the video features extracted by the model facilitate the automatic selection of background music for videos.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating a model. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating a model of this embodiment includes: an acquisition unit 501 configured to acquire a sample set, where a sample in the sample set includes a first video, a second video and a third video, the first video and the second video have the same background music and the same background music style label, and the first video and the third video have different background music and different background music style labels; and a training unit 502 configured to extract a sample from the sample set and perform the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background music style labels in the sample; determining, based on the loss value, whether training of the initial model is complete; and in response to determining that training of the initial model is complete, determining the trained initial model as a video feature extraction model.
In some optional implementations of the present embodiment, above-mentioned training unit 502 can be further configured to really Fixed first Euclidean distance and the second Euclidean distance, wherein above-mentioned first Euclidean distance is with identical style mark of dubbing in background music The Euclidean distance of the characteristic information of video, above-mentioned second Euclidean distance are the feature of the video with different style marks of dubbing in background music The Euclidean distance of information;Determine the difference of above-mentioned second Euclidean distance Yu above-mentioned first Euclidean distance;Based on above-mentioned difference and the The comparison of one default value determines the penalty values of sample, wherein above-mentioned first default value is positive number.
In some optional implementations of the present embodiment, above-mentioned training unit 502 can be further configured to: be rung The first default value should be greater than in determining above-mentioned difference, the second default value be determined as to the penalty values of sample, wherein above-mentioned the Two default values are less than the difference of above-mentioned difference and above-mentioned first default value.
In some optional implementations of the present embodiment, above-mentioned training unit 502 can be further configured to: be rung , no more than the first default value, the difference of above-mentioned first default value and above-mentioned difference should be determined as sample in determining above-mentioned difference Penalty values.
In some optional implementations of the present embodiment, the sample in above-mentioned sample set generates as follows: Video is extracted at random from preset video library as the first video, wherein the video in above-mentioned video library is with mark of dubbing in background music With style mark of dubbing in background music, style of dubbing in background music is noted for the style that instruction is dubbed in background music;It is extracted at random and above-mentioned the from above-mentioned video library One video is having the same dub in background music mark and style mark having the same of dubbing in background music video as the second video;From above-mentioned video The video marked from above-mentioned first video with different dubbing in background music and with different style marks of dubbing in background music is randomly selected in library to make For third video;It is sample by above-mentioned first video, above-mentioned second video, above-mentioned third video summary.
In some optional implementations of the present embodiment, the apparatus may further include an updating unit (not shown in the figure). The updating unit may be configured to, in response to determining that training of the initial model is not completed, update the parameters of the initial model based on the loss value, extract a sample from the sample set again, and continue to perform the above training steps using the initial model with updated parameters as the initial model.
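Combining the training unit and the updating unit, the overall loop can be sketched as follows. It reuses the hypothetical build_sample and sample_loss helpers from the sketches above; the choice of SGD, the learning rate, the convergence threshold and the extract_frames helper are likewise assumptions rather than part of the embodiments.

```python
import torch

def train(initial_model, sample_set, extract_frames, build_sample, sample_loss,
          margin=1.0, lr=1e-3, loss_threshold=0.05, max_steps=100_000):
    optimizer = torch.optim.SGD(initial_model.parameters(), lr=lr)
    for _ in range(max_steps):
        first, second, third = build_sample(sample_set)
        # input the frames of each video in the sample into the initial model
        # to obtain the feature information of each video
        feats = [initial_model(extract_frames(v["video"])) for v in (first, second, third)]
        loss = sample_loss(*feats, margin=margin)
        if loss.item() < loss_threshold:
            # training is considered complete; the trained model is the
            # video feature extraction model
            return initial_model
        # training not complete: update the parameters based on the loss value
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return initial_model
```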
In the apparatus provided by the above embodiment of the present application, the acquiring unit 501 acquires a sample set, from which samples can be extracted to train the initial model. A sample in the sample set may include a first video, a second video and a third video, where the first video and the second video have the same background music and the same background-music style annotation, and the first video and the third video have different background music and different background-music style annotations. The training unit 502 inputs frames of the videos in the extracted sample into the initial model to obtain the feature information of each video in the sample, then determines the loss value of the sample based on the obtained feature information and the background-music style annotations in the sample, and finally determines, based on the loss value, whether training of the initial model is completed. If training is completed, the trained initial model is determined as a video feature extraction model. A model usable for extracting video features is thereby obtained, and the video features extracted by the model facilitate the automatic selection of background music for videos.
Referring to Fig. 6, there is shown a flow 600 of one embodiment of the method for pushing information provided by the present application. The method for pushing information may include the following steps:
Step 601: in response to receiving a target video, input frames of the target video into the video feature extraction model to obtain target feature information of the target video.
In the present embodiment, an executing body of the method for pushing information (e.g., the server 105 shown in Fig. 1, or another server storing the video feature extraction model), in response to receiving the target video, may input frames of the target video into the video feature extraction model to obtain the target feature information of the target video. Here, the target video may be a video, not yet provided with background music, uploaded by a terminal device.
In the present embodiment, the video feature extraction model may be generated using the method described in the embodiment of Fig. 2 above. For the specific generating process, reference may be made to the related description of the embodiment of Fig. 2, and details are not repeated here.
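A sketch of step 601 under the same assumptions as above (the extract_frames helper and the use of torch.no_grad for inference are illustrative; the embodiment does not specify how frames are sampled):

```python
import torch

def extract_target_feature(video_feature_extraction_model, target_video, extract_frames):
    # feed the frames of the received target video into the trained model
    # to obtain its target feature information
    with torch.no_grad():
        return video_feature_extraction_model(extract_frames(target_video))
```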
Step 602: perform similarity calculation between the target feature information and the feature information of the videos in a preset video library, and select a preset number of videos from the video library as candidate videos in descending order of similarity.
In the present embodiment, the executing body may perform similarity calculation between the target feature information and the feature information of the videos in the preset video library, and select a preset number (for example, 5) of videos from the video library as candidate videos in descending order of similarity.
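One way to realise this step is sketched below; cosine similarity is used as an example measure (the embodiment only requires some similarity calculation), and the tensor layout is an assumption.

```python
import torch
import torch.nn.functional as F

def select_candidates(target_feature, library_features, preset_quantity=5):
    # library_features: (num_videos, feature_dim) tensor holding the feature
    # information of the videos in the preset video library
    sims = F.cosine_similarity(target_feature.unsqueeze(0), library_features, dim=1)
    # choose the preset number of videos in descending order of similarity
    top = torch.topk(sims, k=preset_quantity)
    return top.indices.tolist()  # indices of the candidate videos
```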
Step 603: acquire soundtrack information of the candidate videos, and push the selected soundtrack information.
In the present embodiment, the executing body may acquire the soundtrack information of the candidate videos. Here, the soundtrack information may include, but is not limited to, at least one of the following: an audio file of the background music, a title of the background music, and a style name of the background music. Finally, the selected soundtrack information may be pushed, for example, to the terminal device, for selection by a user.
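The soundtrack information of the candidates can then be gathered and pushed; a sketch under the same assumptions, where the record fields audio_file, music_title and style_name are hypothetical:

```python
def collect_soundtrack_info(candidate_indices, video_library):
    # gather the soundtrack information (audio file, title, style name)
    # of the candidate videos so it can be pushed to the terminal device
    return [{"audio_file": video_library[i]["audio_file"],
             "music_title": video_library[i]["music_title"],
             "style_name": video_library[i]["style_name"]}
            for i in candidate_indices]
```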
It should be noted that the method for pushing information in the present embodiment may be used to test the video feature extraction model generated by the above embodiments, and the video feature extraction model may then be continuously optimized according to the test results. The method may also be a practical application of the video feature extraction model generated by the above embodiments. By using the video feature extraction model generated by the above embodiments to extract the target feature information of the target video, and selecting soundtrack information based on the extracted feature information of the target video, background music can be recommended for videos that have none, realizing well-targeted information pushing.
With continued reference to Fig. 7, as an implementation of the method shown in Fig. 6, the present application provides an embodiment of an apparatus for pushing information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 6, and the apparatus may be specifically applied to various electronic devices.
As shown in Fig. 7, the apparatus 700 for pushing information in the present embodiment includes: a receiving unit 701, configured to, in response to receiving a target video, input frames of the target video into the video feature extraction model generated using the method described in the embodiment of Fig. 2 above, to obtain target feature information of the target video; a selecting unit 702, configured to perform similarity calculation between the target feature information and the feature information of the videos in a preset video library, and select a preset number of videos from the video library as candidate videos in descending order of similarity; and a pushing unit 703, configured to acquire soundtrack information of the candidate videos and push the selected soundtrack information.
It can be understood that the units recorded in the apparatus 700 correspond to the respective steps of the method described with reference to Fig. 6. Therefore, the operations, features and beneficial effects described above with respect to the method are equally applicable to the apparatus 700 and the units included therein, and details are not repeated here.
Referring now to Fig. 8, there is shown a schematic structural diagram of a computer system 800 of an electronic device suitable for implementing the embodiments of the present application. The electronic device shown in Fig. 8 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present application.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802 and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A driver 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 810 as needed, so that a computer program read therefrom can be installed into the storage portion 808 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the medium can send, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of the systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquiring unit and a training unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring a sample set".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the apparatus, the apparatus is caused to: acquire a sample set; extract a sample from the sample set and perform the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background-music style annotations in the sample; determining, based on the loss value, whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a video feature extraction model.
The above description is only a preferred embodiment of the present application and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (16)

1. A method for generating a model, comprising:
acquiring a sample set, wherein a sample in the sample set comprises a first video, a second video and a third video, the first video and the second video having the same background music and the same background-music style annotation, and the first video and the third video having different background music and different background-music style annotations;
extracting a sample from the sample set, and performing the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background-music style annotations in the sample; determining, based on the loss value, whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a video feature extraction model.
2. The method for generating a model according to claim 1, wherein the determining a loss value of the sample based on the obtained feature information and the background-music style annotations in the sample comprises:
determining a first Euclidean distance and a second Euclidean distance, wherein the first Euclidean distance is a Euclidean distance between the feature information of videos having the same background-music style annotation, and the second Euclidean distance is a Euclidean distance between the feature information of videos having different background-music style annotations;
determining a difference between the second Euclidean distance and the first Euclidean distance; and
determining the loss value of the sample based on a comparison between the difference and a first preset value, wherein the first preset value is a positive number.
3. The method for generating a model according to claim 2, wherein the determining the loss value of the sample based on a comparison between the difference and the first preset value comprises:
in response to determining that the difference is greater than the first preset value, determining a second preset value as the loss value of the sample, wherein the second preset value is smaller than a difference between said difference and the first preset value.
4. The method for generating a model according to claim 2, wherein the determining the loss value of the sample based on a comparison between the difference and the first preset value comprises:
in response to determining that the difference is not greater than the first preset value, determining a difference between the first preset value and said difference as the loss value of the sample.
5. The method for generating a model according to claim 1, wherein a sample in the sample set is generated through the following steps:
randomly extracting a video from a preset video library as the first video, wherein each video in the video library has a background-music annotation and a background-music style annotation;
randomly extracting, from the video library, a video having the same background-music annotation and the same background-music style annotation as the first video, as the second video;
randomly extracting, from the video library, a video having a background-music annotation and a background-music style annotation both different from those of the first video, as the third video; and
aggregating the first video, the second video and the third video into a sample.
6. The method for generating a model according to claim 1, further comprising:
in response to determining that training of the initial model is not completed, updating parameters of the initial model based on the loss value, extracting a sample from the sample set again, and continuing to perform the training steps using the initial model with updated parameters as the initial model.
7. An apparatus for generating a model, comprising:
an acquiring unit, configured to acquire a sample set, wherein a sample in the sample set comprises a first video, a second video and a third video, the first video and the second video having the same background music and the same background-music style annotation, and the first video and the third video having different background music and different background-music style annotations; and
a training unit, configured to extract a sample from the sample set and perform the following training steps: inputting frames of the videos in the extracted sample into an initial model to obtain feature information of each video in the sample; determining a loss value of the sample based on the obtained feature information and the background-music style annotations in the sample; determining, based on the loss value, whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a video feature extraction model.
8. The apparatus for generating a model according to claim 7, wherein the training unit is further configured to:
determine a first Euclidean distance and a second Euclidean distance, wherein the first Euclidean distance is a Euclidean distance between the feature information of videos having the same background-music style annotation, and the second Euclidean distance is a Euclidean distance between the feature information of videos having different background-music style annotations;
determine a difference between the second Euclidean distance and the first Euclidean distance; and
determine the loss value of the sample based on a comparison between the difference and a first preset value, wherein the first preset value is a positive number.
9. The apparatus for generating a model according to claim 8, wherein the training unit is further configured to:
in response to determining that the difference is greater than the first preset value, determine a second preset value as the loss value of the sample, wherein the second preset value is smaller than a difference between said difference and the first preset value.
10. The apparatus for generating a model according to claim 8, wherein the training unit is further configured to:
in response to determining that the difference is not greater than the first preset value, determine a difference between the first preset value and said difference as the loss value of the sample.
11. The apparatus for generating a model according to claim 7, wherein a sample in the sample set is generated through the following steps:
randomly extracting a video from a preset video library as the first video, wherein each video in the video library has a background-music annotation and a background-music style annotation;
randomly extracting, from the video library, a video having the same background-music annotation and the same background-music style annotation as the first video, as the second video;
randomly extracting, from the video library, a video having a background-music annotation and a background-music style annotation both different from those of the first video, as the third video; and
aggregating the first video, the second video and the third video into a sample.
12. The apparatus for generating a model according to claim 7, further comprising:
an updating unit, configured to, in response to determining that training of the initial model is not completed, update parameters of the initial model based on the loss value, extract a sample from the sample set again, and continue to perform the training steps using the initial model with updated parameters as the initial model.
13. A method for pushing information, comprising:
in response to receiving a target video, inputting frames of the target video into a video feature extraction model generated using the method according to any one of claims 1-6, to obtain target feature information of the target video;
performing similarity calculation between the target feature information and feature information of videos in a preset video library, and selecting a preset number of videos from the video library as candidate videos in descending order of similarity; and
acquiring soundtrack information of the candidate videos, and pushing the selected soundtrack information.
14. An apparatus for pushing information, comprising:
a receiving unit, configured to, in response to receiving a target video, input frames of the target video into a video feature extraction model generated using the method according to any one of claims 1-6, to obtain target feature information of the target video;
a selecting unit, configured to perform similarity calculation between the target feature information and feature information of videos in a preset video library, and select a preset number of videos from the video library as candidate videos in descending order of similarity; and
a pushing unit, configured to acquire soundtrack information of the candidate videos, and push the selected soundtrack information.
15. An electronic device, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6 and 13.
16. A computer-readable medium storing a computer program thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6 and 13.
CN201811273701.7A 2018-10-30 2018-10-30 Method and apparatus for generating a model Active CN109492128B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811273701.7A CN109492128B (en) 2018-10-30 2018-10-30 Method and apparatus for generating a model
PCT/CN2019/095735 WO2020087979A1 (en) 2018-10-30 2019-07-12 Method and apparatus for generating model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811273701.7A CN109492128B (en) 2018-10-30 2018-10-30 Method and apparatus for generating a model

Publications (2)

Publication Number Publication Date
CN109492128A true CN109492128A (en) 2019-03-19
CN109492128B CN109492128B (en) 2020-01-21

Family

ID=65693282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811273701.7A Active CN109492128B (en) 2018-10-30 2018-10-30 Method and apparatus for generating a model

Country Status (2)

Country Link
CN (1) CN109492128B (en)
WO (1) WO2020087979A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862393A * 2019-03-20 2019-06-07 深圳前海微众银行股份有限公司 Method, system, device and storage medium for dubbing music of a video file
WO2020087979A1 (en) * 2018-10-30 2020-05-07 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN111314771A (en) * 2020-03-13 2020-06-19 腾讯科技(深圳)有限公司 Video playing method and related equipment
CN111324773A (en) * 2020-02-12 2020-06-23 腾讯科技(深圳)有限公司 Background music construction method and device, electronic equipment and storage medium
CN111831855A (en) * 2020-07-20 2020-10-27 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for matching videos
CN112397195A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating physical examination model
CN112394924A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating question model
CN112397196A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method and device for generating image inspection recommendation model
CN112397194A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, device and electronic equipment for generating patient condition attribution interpretation model
CN112397163A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating case input model
CN113923517A (en) * 2021-09-30 2022-01-11 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
CN114625876A (en) * 2022-03-17 2022-06-14 北京字节跳动网络技术有限公司 Method for generating author characteristic model, method and device for processing author information

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666922A (en) * 2020-07-02 2020-09-15 上海眼控科技股份有限公司 Video matching method and device, computer equipment and storage medium
CN113627354B * 2021-08-12 2023-08-08 北京百度网讯科技有限公司 Model training and video processing method, apparatus, device and storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10381044B2 (en) * 2015-11-02 2019-08-13 Facebook, Inc. Systems and methods for generating videos based on selecting media content items and moods
US20180032611A1 (en) * 2016-07-29 2018-02-01 Paul Charles Cameron Systems and methods for automatic-generation of soundtracks for live speech audio
CN107888843A (en) * 2017-10-13 2018-04-06 深圳市迅雷网络技术有限公司 Sound mixing method, device, storage medium and the terminal device of user's original content
CN109492128B (en) * 2018-10-30 2020-01-21 北京字节跳动网络技术有限公司 Method and apparatus for generating a model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968416A (en) * 2011-09-01 2013-03-13 佳能株式会社 Device and method for identifying implemented recommendation based on user intent
US8860720B1 (en) * 2014-01-02 2014-10-14 Ubitus Inc. System and method for delivering graphics over network
CN105975472A (en) * 2015-12-09 2016-09-28 乐视网信息技术(北京)股份有限公司 Method and device for recommendation
CN108122234A (en) * 2016-11-29 2018-06-05 北京市商汤科技开发有限公司 Convolutional neural networks training and method for processing video frequency, device and electronic equipment
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information sorting technique and device based on deep neural network
CN106851394A (en) * 2017-01-18 2017-06-13 广东小天才科技有限公司 A kind of background music switching method and apparatus
CN107273458A (en) * 2017-06-01 2017-10-20 百度在线网络技术(北京)有限公司 Depth model training method and device, image search method and device
CN108509929A (en) * 2018-04-08 2018-09-07 百度在线网络技术(北京)有限公司 Method, method for detecting human face and device for generating Face datection model
CN108805091A (en) * 2018-06-15 2018-11-13 北京字节跳动网络技术有限公司 Method and apparatus for generating model

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020087979A1 (en) * 2018-10-30 2020-05-07 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN109862393A * 2019-03-20 2019-06-07 深圳前海微众银行股份有限公司 Method, system, device and storage medium for dubbing music of a video file
CN109862393B (en) * 2019-03-20 2022-06-14 深圳前海微众银行股份有限公司 Method, system, equipment and storage medium for dubbing music of video file
CN112397163A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating case input model
CN112397195A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating physical examination model
CN112394924A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating question model
CN112397196A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method and device for generating image inspection recommendation model
CN112397194A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, device and electronic equipment for generating patient condition attribution interpretation model
CN112397163B (en) * 2019-08-16 2024-02-02 北京大数医达科技有限公司 Method, apparatus, electronic device and medium for generating case input model
CN112397195B (en) * 2019-08-16 2024-02-09 北京大数医达科技有限公司 Method, apparatus, electronic device and medium for generating physical examination model
CN112397194B (en) * 2019-08-16 2024-02-06 北京大数医达科技有限公司 Method, device and electronic equipment for generating patient disease attribution interpretation model
CN111324773A (en) * 2020-02-12 2020-06-23 腾讯科技(深圳)有限公司 Background music construction method and device, electronic equipment and storage medium
CN111314771B (en) * 2020-03-13 2021-08-27 腾讯科技(深圳)有限公司 Video playing method and related equipment
CN111314771A (en) * 2020-03-13 2020-06-19 腾讯科技(深圳)有限公司 Video playing method and related equipment
CN111831855A (en) * 2020-07-20 2020-10-27 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for matching videos
CN113923517A (en) * 2021-09-30 2022-01-11 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
CN113923517B (en) * 2021-09-30 2024-05-07 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
CN114625876A (en) * 2022-03-17 2022-06-14 北京字节跳动网络技术有限公司 Method for generating author characteristic model, method and device for processing author information
CN114625876B (en) * 2022-03-17 2024-04-16 北京字节跳动网络技术有限公司 Method for generating author characteristic model, method and device for processing author information

Also Published As

Publication number Publication date
WO2020087979A1 (en) 2020-05-07
CN109492128B (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN109492128A (en) Method and apparatus for generating model
CN109344908A (en) Method and apparatus for generating model
CN109376267A (en) Method and apparatus for generating model
CN109214343B (en) Method and device for generating face key point detection model
CN109447156A (en) Method and apparatus for generating model
CN109308490A (en) Method and apparatus for generating information
CN108446387A (en) Method and apparatus for updating face registration library
CN109002842A (en) Image-recognizing method and device
CN109145828A (en) Method and apparatus for generating video classification detection model
CN109099903B (en) Method and apparatus for generating navigation routine
CN109447246A (en) Method and apparatus for generating model
CN108830235A (en) Method and apparatus for generating information
CN109446990A (en) Method and apparatus for generating information
CN108416323A (en) The method and apparatus of face for identification
CN108898185A (en) Method and apparatus for generating image recognition model
CN108830837A (en) A kind of method and apparatus for detecting ladle corrosion defect
CN108989882A (en) Method and apparatus for exporting the snatch of music in video
CN109086719A (en) Method and apparatus for output data
CN110009059B (en) Method and apparatus for generating a model
CN108960316A (en) Method and apparatus for generating model
CN108196820A (en) For adjusting the method and apparatus of play parameter
CN109410253B (en) For generating method, apparatus, electronic equipment and the computer-readable medium of information
CN109815365A (en) Method and apparatus for handling video
CN111539903B (en) Method and device for training face image synthesis model
CN108182472A (en) For generating the method and apparatus of information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder