CN110263213A - Video pushing method, device, computer equipment and storage medium - Google Patents

Video pushing method, device, computer equipment and storage medium

Info

Publication number
CN110263213A
Authority
CN
China
Prior art keywords
cover
video
candidate
user
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910430442.2A
Other languages
Chinese (zh)
Other versions
CN110263213B (en)
Inventor
苏舟
王良栋
孙振龙
张博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN201910430442.2A, granted as CN110263213B
Publication of CN110263213A
Application granted
Publication of CN110263213B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/74: Browsing; Visualisation therefor
    • G06F 16/743: Browsing or visualisation of a collection of video files or sequences
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/40: Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a video pushing method, the method comprising: obtaining N candidate covers of a first video; processing the N candidate covers through a video cover determination model to obtain a predicted confidence for each candidate cover, the video cover determination model being a convolutional neural network model obtained by reinforcement learning on K candidate covers of a second video and the user operation data of each of the K candidate covers; obtaining the video cover of the first video from the N candidate covers according to their predicted confidences; and pushing the first video to a terminal according to the video cover of the first video. Since the video cover determination model is a convolutional neural network model trained by reinforcement learning on the operations that users perform on the same video when it is shown with different covers, users' cover selection behavior is comprehensively taken into account, which improves the accuracy of subsequently determining video covers with the trained model.

Description

Video pushing method, device, computer equipment and storage medium
Technical field
The present application relates to the field of video applications, and in particular to a video pushing method, apparatus, computer device and storage medium.
Background art
With the continuous development of computer network applications, the video resources in video playback applications keep growing. To help users accurately find the videos they want to watch, video providers need to determine a suitable cover for each video.
In the related art, the server of a video provider selects one image frame from the frames of a video as the video's cover. The cover frame may be chosen by presetting a weight for each of a number of indicators. For example, developers design an image quality classification model in advance and train it on pre-labeled high-quality and low-quality pictures; once training is complete, the model processes each frame of a video to obtain the image quality of each frame, and the frame with the highest image quality is used as the video cover.
However, the scheme in the related art requires the image quality of the training pictures to be labeled manually, so the accuracy of the trained model is subject to the annotators' subjectivity, which results in low accuracy when determining video covers with the trained model.
Summary of the invention
The embodiments of the present application provide a video pushing method, apparatus, computer device and storage medium, which can improve the accuracy of determining video covers with a trained model. The technical solution is as follows:
In one aspect, a video pushing method is provided, the method comprising:
obtaining N candidate covers of a first video, N being an integer greater than or equal to 2;
processing the N candidate covers respectively through a video cover determination model to obtain a predicted confidence for each of the N candidate covers, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover; the video cover determination model being a convolutional neural network model obtained by performing reinforcement learning on K candidate covers of a second video and the user operation data of each of the K candidate covers; the user operation data indicating the user operations received by the second video and the candidate cover corresponding to each user operation, K being an integer greater than or equal to 2;
obtaining the video cover of the first video from the N candidate covers according to the predicted confidences of the N candidate covers; and
pushing the first video to a terminal according to the video cover of the first video.
In another aspect, a training method for a model for determining a video cover is provided, the method comprising:
obtaining K candidate covers of a second video, K being an integer greater than or equal to 2;
extracting an image feature of each of the K candidate covers through a convolutional neural network model, the image feature being the output of a feature extraction component in the convolutional neural network;
pushing the second video with each of the K candidate covers in turn as the video cover of the second video, and obtaining the user operation data of each of the K candidate covers, the user operation data indicating the user operations received by the second video and the candidate cover corresponding to each user operation;
performing reinforcement learning on the network parameters of a confidence output component in the convolutional neural network model according to the image features and the user operation data of the K candidate covers, the confidence output component being configured to output a predicted confidence according to the image feature extracted by the feature extraction component, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover; and
when the output of the confidence output component converges, taking the convolutional neural network model as the video cover determination model for determining video covers.
In yet another aspect, a video cover display method is provided, applied to a terminal, the method comprising:
at a first moment, receiving a first video cover of a first video pushed by a server, the first video cover being any one of N candidate covers, N being an integer greater than or equal to 2;
displaying the video play entrance of the first video according to the first video cover;
at a second moment, receiving a second video cover of the first video pushed by the server; the second video cover being determined from the N candidate covers by a cover determination submodel; the cover determination submodel being a convolutional neural network model obtained by performing reinforcement learning on the N candidate covers and the target user operation data of each of the N candidate covers; the target user operation data indicating the target user operations received by the first video and the candidate cover corresponding to each target user operation; the target user operations being the user operations performed on the first video by the users in a target user group, the target user group being the user group to which the user of the terminal belongs; and
displaying the video play entrance of the first video according to the second video cover.
In one aspect, a video pushing apparatus is provided, the apparatus comprising:
a candidate cover obtaining module, configured to obtain N candidate covers of a first video, N being an integer greater than or equal to 2;
a confidence prediction module, configured to process the N candidate covers respectively through a video cover determination model to obtain a predicted confidence for each of the N candidate covers, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover; the video cover determination model being a convolutional neural network model obtained by performing reinforcement learning on K candidate covers of a second video and the user operation data of each of the K candidate covers; the user operation data indicating the user operations received by the second video and the candidate cover corresponding to each user operation, K being an integer greater than or equal to 2;
a video cover obtaining module, configured to obtain the video cover of the first video from the N candidate covers according to the predicted confidences of the N candidate covers; and
a video pushing module, configured to push the first video to a terminal according to the video cover of the first video.
Optionally, the video cover determination model comprises at least two cover determination submodels, each corresponding to a respective user group;
and the confidence prediction module is configured to:
query the target user group to which the user of the terminal belongs;
obtain the cover determination submodel corresponding to the target user group, the submodel being a convolutional neural network model obtained by performing reinforcement learning on the K candidate covers of the second video and the target user operation data of each of the K candidate covers; the target user operation data indicating the target user operations and the candidate cover corresponding to each target user operation; the target user operations being the user operations performed on the second video by the users in the target user group; and
process the N candidate covers respectively through the cover determination submodel corresponding to the target user group to obtain the predicted confidence of each of the N candidate covers.
Optionally, the candidate cover obtaining module is configured to:
obtain the key image frames of the first video;
perform clustering on the key image frames to obtain at least two cluster centers, each cluster center containing at least one key image frame of a corresponding scene type; and
extract at least one key image frame from each of the at least two cluster centers to obtain the N candidate covers.
Optionally, when extracting at least one key image frame from each of the at least two cluster centers to obtain the N candidate covers, the candidate cover obtaining module is configured to:
remove, from the at least two cluster centers, the cluster centers whose number of key image frames is less than a quantity threshold, to obtain N cluster centers; and
extract one key image frame from each of the N cluster centers to obtain the N candidate covers.
Optionally, the video cover determination model comprises a feature extraction component and a confidence output component;
the feature extraction component is configured to extract the image feature of an input candidate cover; and
the confidence output component is configured to output the predicted confidence of the input candidate cover according to the image feature extracted by the feature extraction component.
Optionally, the feature extraction component is identical to the feature extraction part of an image classification model;
the image classification model being a convolutional neural network model trained on sample images and the classification labels of the sample images.
In another aspect, a training apparatus for a model for determining a video cover is provided, the apparatus comprising:
a candidate cover obtaining module, configured to obtain K candidate covers of a second video, K being an integer greater than or equal to 2;
a feature extraction module, configured to extract the image feature of each of the K candidate covers through a convolutional neural network model, the image feature being the output of a feature extraction component in the convolutional neural network;
an operation data obtaining module, configured to push the second video with each of the K candidate covers in turn as the video cover of the second video, and to obtain the user operation data of each of the K candidate covers, the user operation data indicating the user operations received by the second video and the candidate cover corresponding to each user operation;
a reinforcement learning module, configured to perform reinforcement learning on the network parameters of a confidence output component in the convolutional neural network model according to the image features and the user operation data of the K candidate covers, the confidence output component being configured to output a predicted confidence according to the image feature extracted by the feature extraction component, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover; and
a model obtaining module, configured to, when the output of the confidence output component converges, take the convolutional neural network model as the video cover determination model for determining video covers.
Optionally, the apparatus further comprises:
a predicted confidence obtaining module, configured to obtain, before the model obtaining module operates, the predicted confidences of the K candidate covers output by the confidence output component; and
a convergence determination module, configured to determine that the output of the confidence output component converges when the sum of the predicted confidences of the K candidate covers converges.
Optionally, the confidence output component comprises a vectorization function and an activation function, and the predicted confidence obtaining module is configured to:
obtain the vectorization result of each of the K candidate covers, the vectorization results being the outputs of the vectorization function for the K candidate covers; and
process the vectorization results of the K candidate covers through the activation function to obtain the predicted confidences of the K candidate covers.
Optionally, the reinforcement learning module is configured to:
obtain the actual confidence of each of the K candidate covers according to the user operation data of the K candidate covers;
obtain a policy function according to the actual confidences of the K candidate covers, the policy function being a function that maximizes the sum of the confidences obtained from the image features of the K candidate covers, the sum of the confidences being the sum of the predicted confidences of the K candidate covers, and the matrix format of the variable parameters in the policy function being identical to the matrix format of the network parameters of the confidence output component; and
take the variable parameters in the policy function as the network parameters of the vectorization component.
Optionally, the operation data obtaining module is configured to:
push the second video with each of the K candidate covers in turn as the video cover of the second video;
obtain the user operation records of at least one user in a designated user group on the second video, each user operation record corresponding to a respective candidate cover; and
obtain, according to the user operation records of the at least one user on the second video, the user operation data of each of the K candidate covers corresponding to the designated user group;
and taking the convolutional neural network model as the video cover determination model when the output of the confidence output component converges comprises:
when the output of the confidence output component converges, taking the convolutional neural network model as the cover determination submodel corresponding to the designated user group.
Optionally, the apparatus further comprises: a grouping module, configured to, before the operation data obtaining module obtains the user operation records of the at least one user in the designated user group on the second video, group the users according to each user's operation records on each video to obtain at least one user group, the at least one user group including the designated user group.
Optionally, the apparatus further comprises:
a probability obtaining module, configured to, when the output of the confidence output component has not converged, obtain the display probability of each of the K candidate covers in the next period of designated length according to the output of the confidence output component; and
a pushing module, configured to push the second video to each terminal with each of the K candidate covers as the video cover of the second video, according to the display probabilities of the K candidate covers in the next period of designated length;
the operation data obtaining module is further configured to obtain the new user operation data of each of the K candidate covers in the next period of designated length; and
the reinforcement learning module is further configured to perform reinforcement learning on the network parameters of the confidence output component according to the image features and the new user operation data of the K candidate covers.
In yet another aspect, a video cover display apparatus is provided, applied to a terminal, the apparatus comprising:
a first receiving module, configured to receive, at a first moment, a first video cover of a first video pushed by a server, the first video cover being any one of N candidate covers, N being an integer greater than or equal to 2;
a first display module, configured to display the video play entrance of the first video according to the first video cover;
a second receiving module, configured to receive, at a second moment, a second video cover of the first video pushed by the server; the second video cover being determined from the N candidate covers by a cover determination submodel; the cover determination submodel being a convolutional neural network model obtained by performing reinforcement learning on the N candidate covers and the target user operation data of each of the N candidate covers; the target user operation data indicating the target user operations received by the first video and the candidate cover corresponding to each target user operation; the target user operations being the user operations performed on the first video by the users in a target user group, the target user group being the user group to which the user of the terminal belongs; and
a second display module, configured to display the video play entrance of the first video according to the second video cover.
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the video pushing method, the training method for the model for determining a video cover, or the video cover display method described above.
In another aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the video pushing method, the training method for the model for determining a video cover, or the video cover display method described above.
The technical solution provided by the present application can include the following beneficial effects:
Each candidate cover of the first video is processed through the video cover determination model to obtain the predicted confidence of each candidate cover, and the video cover of the first video is selected from the candidate covers according to the predicted confidences. Since the video cover determination model is a convolutional neural network model trained by reinforcement learning on the operations that users perform on the same video when it is shown with different covers, users' cover selection behavior is comprehensively taken into account, which improves the accuracy of subsequently determining video covers with the trained model.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the present application.
Fig. 1 is a framework diagram of model training and image frame determination according to an exemplary embodiment;
Fig. 2 is a schematic flowchart of video pushing according to an exemplary embodiment;
Fig. 3 is a schematic diagram of a training process of a model for determining a video cover according to an exemplary embodiment;
Fig. 4 is a schematic flowchart of a terminal recording and uploading user operation records, involved in the embodiment shown in Fig. 3;
Fig. 5 is a flowchart of a training and video pushing method for a model for determining a video cover according to an exemplary embodiment;
Fig. 6 is a schematic flowchart of video cover display involved in the embodiment shown in Fig. 5;
Fig. 7 is a schematic diagram of the change of the video cover of the same video before and after the model training involved in the embodiment shown in Fig. 5;
Fig. 8 is an overall framework diagram of the reinforcement-learning-based automatic generation and online selection of video covers involved in the embodiment shown in Fig. 5;
Fig. 9 is a schematic flowchart of model training involved in the embodiment shown in Fig. 5;
Fig. 10 is a structural block diagram of a video pushing apparatus according to an exemplary embodiment;
Fig. 11 is a structural block diagram of a training apparatus for a model for determining a video cover according to an exemplary embodiment;
Fig. 12 is a structural block diagram of a video cover display apparatus according to an exemplary embodiment;
Fig. 13 is a structural schematic diagram of a computer device according to an exemplary embodiment;
Fig. 14 is a structural schematic diagram of a computer device according to an exemplary embodiment.
Description of embodiments
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The present application proposes an efficient and highly accurate model training and model application scheme. The scheme can train a machine learning model for determining a video's cover from the video itself, and perform video pushing based on the video cover determined by the machine learning model. For ease of understanding, several terms involved in the present application are explained below.
(1) Video cover: in the interface of a video playback application, or in a web page, the picture displayed at the play entrance of a video is the video's cover. Normally, the video cover of a video is related to its content; for example, the video cover may be an image frame from the video.
(2) Confidence of an image frame: in the present application, the confidence of an image frame is correlated with the probability that the frame is the video cover of a designated video; that is, the greater the probability that an image frame is the video cover of the designated video, the higher the confidence of that image frame.
With the continuous development of online video applications, more and more videos are uploaded to the network by users or video service providers, and correspondingly the video resources available for users to watch are increasingly rich. Whether a video's cover is well chosen is an important factor in attracting users to click and play the video, yet many uploaders may not set a cover for their uploaded videos, or may manually set an unsuitable one; this requires the video provider's server to be able to automatically set a reasonable cover for a video.
The schemes shown in the subsequent embodiments of the present application provide a new model training and application scheme for determining video covers. The model trained with this scheme can accurately determine, from a video, the image frame suitable for use as the video cover, while also improving the efficiency of model training and updating.
The schemes in the subsequent embodiments of the present application train a machine learning model. Fig. 1 shows a framework diagram of model training and video cover determination according to an exemplary embodiment. As shown in Fig. 1, in the model training stage, the model training device 110 trains the machine learning model with each user's operations on different covers of the same video; in the video cover determination stage, the cover determination device 120 determines the video cover from the candidate covers of an input video according to the trained machine learning model.
The model training device 110 and the cover determination device 120 may be computer devices with machine learning capability. For example, the computer device may be a stationary computer device such as a personal computer or a server, or a mobile computer device such as a tablet computer, an e-book reader or a portable medical device.
Optionally, the model training device 110 and the cover determination device 120 may be the same device, or they may be different devices. When they are different devices, they may be of the same type (for example, both may be servers) or of different types (for example, the model training device 110 may be a personal computer while the cover determination device 120 is a server). The embodiments of the present application do not limit the specific types of the model training device 110 and the cover determination device 120.
Fig. 2 is a schematic flowchart of video pushing according to an exemplary embodiment. The model training device (for example, a server) can perform reinforcement learning on a convolutional neural network model according to K candidate covers of a second video and the user operation data of each of the K candidate covers, obtaining the video cover determination model. The user operation data indicate the user operations received by the second video and the candidate cover corresponding to each user operation; K is an integer greater than or equal to 2.
As shown in Fig. 2, when pushing a first video to each terminal, the server can obtain N candidate covers of the first video, N being an integer greater than or equal to 2 (S21); then process the N candidate covers respectively through the trained video cover determination model to obtain the predicted confidence of each of the N candidate covers, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover (S22); afterwards, obtain the video cover of the first video from the N candidate covers according to the predicted confidences of the N candidate covers (S23); and push the first video to the terminal according to the video cover of the first video (S24).
In summary, in the embodiments of the present application, a convolutional neural network is first trained by reinforcement learning on the user operation data collected when K image frames of the same video each serve as the video cover, yielding the video cover determination model. After training, when the first video is to be pushed, the trained model processes the N candidate covers of the first video to obtain the probability that each candidate cover is the video cover, and the video cover of the first video is determined from the N candidate covers. Because the users' selection operations on video covers are comprehensively considered during model training, the accuracy of the trained model is improved, and correspondingly the accuracy of subsequently determining video covers with the trained model is also improved.
Fig. 3 is a schematic flowchart of a training process of a model for determining a video cover according to an exemplary embodiment. As shown in Fig. 3, developers first set up an initial convolutional neural network model, which comprises a feature extraction component and a confidence output component. The purpose of the flow shown in Fig. 3 includes training the network parameters of the confidence output component. Taking the case where the model training device is the server of a video provider, as shown in Fig. 3: for a second video, the server can obtain K candidate covers of the second video, K being an integer greater than or equal to 2 (S31); the server extracts the image feature of each of the K candidate covers through the convolutional neural network model, the image feature being the output of the feature extraction component in the convolutional neural network (S32); then, the server pushes the second video with each of the K candidate covers in turn as its video cover, and obtains the user operation data of each of the K candidate covers, the user operation data indicating the user operations received by the second video and the candidate cover corresponding to each user operation (S33); and the server performs reinforcement learning on the network parameters of the confidence output component in the convolutional neural network model according to the image features and the user operation data of the K candidate covers, the confidence output component outputting a predicted confidence according to the image feature extracted by the feature extraction component, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover (S34). When the output of the confidence output component converges, the convolutional neural network model is taken as the video cover determination model for determining video covers (S35); if the output of the confidence output component has not converged, the server can return to step S33 to continue video pushing and reinforcement learning until the output of the confidence output component converges.
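The S31-S35 loop can be summarized in a few lines of Python; every callable below is a placeholder standing in for a component the patent describes only at the flow level:

    def train_cover_model(covers, extract_features, push_and_collect, rl_update, converged):
        """Outer training loop of Fig. 3 (all arguments are assumed stand-ins).
        covers: the K candidate covers of the second video (S31)."""
        features = [extract_features(c) for c in covers]  # S32: CNN image features
        while True:
            op_data = push_and_collect(covers)            # S33: push, gather operations
            confidences = rl_update(features, op_data)    # S34: update confidence head
            if converged(confidences):                    # S35: stop once output converges
                return confidences                        # per-cover confidences of the model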
The first video in the embodiment shown in Fig. 2 and the second video in the embodiment shown in Fig. 3 may be different videos; that is, after the server completes reinforcement learning through pushes of the second video, the trained model can be used to determine the covers of videos other than the second video.
Alternatively, the first video and the second video may be the same video; that is, after the server completes reinforcement learning through pushes of the first video, the video cover of the first video can be determined by the trained model, and subsequent pushes of the first video use the video cover determined by the model.
The above training process requires the terminal side to be able to feed back user operation information. Referring to Fig. 4, which shows a schematic flowchart of a terminal recording and uploading user operation records according to the present application: as shown in Fig. 4, the terminal receives a push message of the second video sent by the server, the video cover carried in the push message being any one of the K candidate covers (S41); the terminal displays the video play entrance of the second video according to the video cover (S42); after receiving a trigger operation on the video play entrance, the terminal obtains a user operation record indicating the user operations performed on the second video in the current terminal (S43); and the terminal sends the user operation record to the server (S44), so that the server obtains the user operation data of the K candidate covers according to the user operation records sent by the terminals.
In summary, in the embodiments of the present application, a convolutional neural network model comprising a feature extraction component and a confidence output component serves as the initial model, and the network parameters of the confidence output component are the training target. The network parameters of the confidence output component are trained from the user operation data collected when the K candidate covers of the same video each serve as the cover, together with the outputs of the feature extraction component for the K candidate covers. On the one hand, because the features of the candidate covers are extracted by a machine learning model, developers do not need to design dedicated image evaluation indicators; on the other hand, training the model with the user operation data of the K candidate covers comprehensively considers users' cover selection operations and improves the accuracy of the trained model. This ensures that the trained model can accurately determine, from a video, the image frame suitable for the video cover, while also improving the efficiency of model training and updating.
In the scheme shown in Fig. 3, the determination of the network parameters of the confidence output component can be iterated over successive periods, and the probability with which each candidate cover is pushed as the video cover in a later parameter-determination round can be optimized according to the output of the confidence output component trained in the previous round.
Fig. 5 is a flowchart of a training and video pushing method for a model for determining a video cover according to an exemplary embodiment. The method can be used in a computer device, for example the model training device 110 and the cover determination device 120 shown in Fig. 1, to train the video cover determination model involved in the embodiments shown in Fig. 2 or Fig. 3 and to push videos according to the determined model. Taking the case where the model training device 110 and the cover determination device 120 are the server of a video provider, as shown in Fig. 5, the method may include the following steps:
Step 501: the server obtains K candidate covers of the second video, K being an integer greater than or equal to 2.
The K candidate covers may be at least two representative image frames of the second video. For example, the K candidate covers may be image frames that each represent a different scene in the second video, or image frames that each represent a different person or object in the second video.
Taking the case where the K candidate covers are image frames representing different scenes in the second video, the server may obtain the K candidate covers of the second video as follows:
S501a: obtain the key image frames of the second video.
In the embodiments of the present application, the key image frames are the image frames corresponding to the respective scenes in the second video.
In one possible example, when obtaining the key frames of the second video, the server may first perform scene segmentation on the second video to obtain several scene segments, and then extract at least one image frame from each scene segment as the key image frames of the second video.
When extracting at least one image frame from each scene segment, the server may first, for each scene segment, filter out the solid-color, blurred and duplicate frames in the segment, then rank the remaining frames by image quality (for example at least one of color saturation, sharpness and content complexity), and take at least one top-ranked frame as the key image frame(s) corresponding to the segment.
S501b: perform clustering on the key image frames to obtain at least two cluster centers, each cluster center containing at least one key image frame of a corresponding scene type.
In the embodiments of the present application, the server may perform k-means clustering on the key image frames. k-means clustering (English: k-means clustering) originates from vector quantization in signal processing and is currently popular in the field of data mining as a clustering method. Its purpose is to partition n points (which may be observations of a sample or instances) into k clusters, so that each point belongs to the cluster whose mean (cluster center) is nearest to it; this serves as the clustering criterion. In the embodiments of the present application, k-means clustering can group key image frames with similar scenes into the same cluster.
S501c: extract at least one key image frame from each of the at least two cluster centers to obtain the K candidate covers.
In one possible example, the server may extract one image frame from each cluster center obtained by the clustering; that is, the number of cluster centers may be set to K, the number of image frames used as candidate covers. By aggregating the key image frames of similar scenes into one cluster, the subsequent steps can select the cover images with the greatest differences, and computational efficiency is improved. For example, for the key frames in each cluster center, the server ranks them by attributes such as color saturation, sharpness and content complexity, selects the best picture in each cluster center, and obtains K image frames forming the candidate cover set.
Alternatively, the server may extract multiple image frames from each cluster center. For example, when the number of cluster centers is small (for example, fewer than 3), the server may extract two or three image frames from each cluster center as the K image frames.
In the embodiments of the present application, in order to further improve subsequent computational efficiency, after the clustering is completed the server may also screen the cluster centers to further reduce the value of K. For example, the server may remove, from the at least two cluster centers, the cluster centers whose number of key image frames is less than a quantity threshold, obtaining K cluster centers, and extract one key image frame from each of the K cluster centers to obtain the K image frames.
The quantity threshold may be a value preset by developers, or a value determined by the server according to the number of image frames the video contains. In the latter case, the quantity threshold may be positively correlated with the number of image frames the video contains; that is, the more image frames the video contains, the larger the quantity threshold, and conversely, the fewer image frames the video contains, the smaller the quantity threshold.
When a cluster center contains only a few key image frames, it can be considered that the corresponding scene segment is short and its image frames are unsuitable to represent the video. Therefore, when obtaining the K image frames, the server first excludes, after clustering, the cluster centers containing only a few image frames (for example, fewer than 5), and extracts one image frame from each of the remaining cluster centers as the K image frames.
Besides obtaining the K candidate covers through the above key-frame extraction and clustering, the server may obtain the K candidate covers in other ways. For example, the server may process some or all of the image frames of the second video through a pre-trained machine learning model (for example an image classification model) and select the K candidate covers according to the processing results.
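As one hedged illustration of S501a-S501c, the sketch below clusters key-frame feature vectors with scikit-learn's KMeans, rejects sparse clusters, and keeps one frame per remaining cluster; the feature representation and the closest-to-center stand-in for the quality ranking are assumptions, not specified by the patent:

    import numpy as np
    from sklearn.cluster import KMeans

    def pick_candidate_covers(frame_features: np.ndarray, k: int,
                              min_cluster_size: int = 5) -> list:
        """frame_features: (num_keyframes, dim) array, one row per key image frame.
        Returns indices of the frames to use as candidate covers."""
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(frame_features)
        candidates = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size < min_cluster_size:   # quantity-threshold rejection
                continue
            # Stand-in for ranking by saturation/sharpness/complexity:
            # take the member closest to the cluster center.
            center = frame_features[members].mean(axis=0)
            dists = np.linalg.norm(frame_features[members] - center, axis=1)
            candidates.append(int(members[np.argmin(dists)]))
        return candidates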
Step 502: the server pushes the second video with each of the K candidate covers in turn as the video cover of the second video. Correspondingly, a terminal receives the server's push message for the second video, the video cover carried in the push message being one of the K candidate covers of the second video.
In one possible implementation, when pushing the second video to a terminal at some moment, the server may randomly select one candidate cover from the K candidate covers as the video cover of the second video, and send the push message of the second video to the terminal according to that video cover.
In one possible example, the server may directly use the selected video cover as the cover image of the second video.
Alternatively, in another possible example, the server may perform predetermined processing on the video cover to obtain the cover image of the second video; for example, the server may crop or sharpen the video cover to obtain the cover image of the second video.
That is, for the same terminal, before model training is complete, when the server pushes the second video to the terminal twice, the video covers carried in the two push messages may be different candidate covers of the second video; correspondingly, for two different terminals, when the server pushes the second video to them respectively, the video covers carried in the push messages may also be different candidate covers of the second video.
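A small sketch of this per-push cover choice, assuming uniform random selection before any model output exists and probability-weighted sampling once display probabilities are available (both behaviors as described in this embodiment):

    import random

    def pick_push_cover(candidate_covers, display_probs=None):
        """Choose which of the K candidate covers accompanies one push message."""
        if display_probs is None:                 # before training produces output
            return random.choice(candidate_covers)
        # Later periods: sample in proportion to the model's display probabilities.
        return random.choices(candidate_covers, weights=display_probs, k=1)[0]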
The terminal may receive the push message of the second video sent by the server in the interface of the corresponding application or in a web page. The application may be a video playback application (including short-video applications and the like), or another application with video playback or web page display capability.
Step 503: the terminal displays the video play entrance of the second video according to the video cover.
After receiving the push message of the second video, the terminal can display the video play entrance of the second video in the application interface or web page according to the push message. For example, the video play entrance may be an image link or a picture control, where the picture in the image link or picture control is the video cover of the second video carried in the push message.
Step 504: after receiving a trigger operation on the video play entrance, the terminal obtains a user operation record, the user operation record indicating the user operations performed on the second video in the current terminal.
In the embodiments of the present application, after displaying the video play entrance of the second video, the terminal also records the user's operations on the second video, such as whether the video play entrance of the second video was clicked, the playing duration of the second video, whether the second video was liked, and whether the second video was forwarded.
Step 505: the terminal sends the user operation record to the server, and the server receives the user operation record.
After recording the user operation record of the second video, the terminal can upload it to the server periodically or immediately. Correspondingly, the server receives the user operation record.
Each user operation record corresponds to a respective candidate cover.
In one illustrative scheme, the user operation record may directly contain the identifier of the corresponding candidate cover. For example, when generating a user operation record, the terminal can obtain the identifier of the candidate cover corresponding to the user operation and add the obtained identifier to the user operation record.
In another illustrative scheme, the user operation record may not directly contain the identifier of the corresponding candidate cover; after receiving a user operation record sent by a terminal, the server can obtain the identifier of the candidate cover corresponding to the record and store it in association with the record. For example, when generating a user operation record, the terminal can add the identifier of the corresponding video push message to the record; after receiving the record, the server looks up, on the server side, the identifier of the candidate cover corresponding to that push message.
Step 506: the server obtains the user operation data of each of the K candidate covers.
The user operation data indicate the user operations received by the second video and the candidate cover corresponding to each user operation.
In the embodiments of the present application, the server may, with a period of designated length, collect the user operation records of the second video uploaded by the user terminals, so as to obtain, for each period, each user's operation data on the second video when each of the K candidate covers serves as the cover of the second video.
In one possible example, the user operation data include at least one of the following (a possible container for these per-cover statistics is sketched after this list):
the click-through rate of the second video when the corresponding candidate cover serves as its video cover;
the playing duration after each click of the second video when the corresponding candidate cover serves as its video cover;
the rate at which the second video is liked when the corresponding candidate cover serves as its video cover;
the rate at which the second video is forwarded when the corresponding candidate cover serves as its video cover.
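A possible container for these per-cover statistics, with illustrative field names that are not fixed by the patent:

    from dataclasses import dataclass

    @dataclass
    class CoverOperationData:
        """User operation data for one candidate cover over one designated-length period."""
        cover_id: int
        click_rate: float         # CTR of the second video with this cover shown
        avg_play_duration: float  # mean playing duration after each click, in seconds
        like_rate: float          # fraction of views in which the video was liked
        forward_rate: float       # fraction of views in which the video was forwarded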
In one illustrative scheme, when obtaining the user operation data of the K candidate covers, the server can obtain, for each period of designated length, the user operation data of all users on the second video when each of the K candidate covers serves as the video cover of the second video.
For example, within some period of designated length, the server receives the user operation records of 1,000 users on the second video; from these records the server can generate, for each of the K candidate covers serving as the cover of the second video, each user's operation data on the second video.
In another illustrative scheme, the server may also group the users according to each user's operations on each video, obtaining at least one user group, and obtain, for each user group, the user operation data of the K candidate covers corresponding to that group. For example, the server can obtain the user operation records of at least one user in a designated user group on the second video (each record corresponding to a respective candidate cover), and obtain, according to those records, the user operation data of the K candidate covers corresponding to the designated user group.
For example, the server groups the users in advance according to each user's operation records on each video, obtaining at least one user group. Within some period of designated length, the server receives the user operation records of 1,000 users on the second video; for a designated user group among the at least one user group, the server can take the user operation records on the second video of the at least one user (for example, 100 users) among the 1,000 who belong to the designated user group, and obtain the user operation data of the K candidate covers corresponding to that designated user group.
Step 507: the server extracts the image feature of each of the K candidate covers through the convolutional neural network model; the image feature is the output of the feature extraction component in the convolutional neural network.
In the embodiments of the present application, except for the network parameters of the last fully connected layer of the convolutional neural network model, the network parameters of the other layers may be parameters preset by developers.
Besides a CNN model, other neural network models containing at least two fully connected layers may also be used for training in the embodiments of the present application, such as a recurrent neural network (Recurrent Neural Network, RNN) or a deep neural network (Deep Neural Networks, DNN). In addition, the fully connected layers in the model may be replaced by other functions used to vectorize the image features.
In one possible implementation, the feature extraction component in the convolutional neural network model is identical to the feature extraction part of an image classification model, where the image classification model is a model trained on sample images and the classification labels of the sample images.
In the embodiments of the present application, the feature extraction part of an existing image classification model can be reused as the feature extraction component of the convolutional neural network model; the feature extraction component extracts the image feature of each of the K candidate covers, and the output of the feature extraction component serves as the feature data of the K candidate covers.
Taking a CNN model as an example, the feature vector of each of the K candidate covers is extracted with the CNN model. A CNN is a feed-forward neural network whose artificial neurons respond to surrounding units within a local receptive field, and it performs outstandingly in large-scale image processing. A CNN consists of one or more convolutional layers with fully connected layers on top (corresponding to a classical neural network), and also includes associated weights and pooling layers. This structure enables the CNN model to make good use of the two-dimensional structure of the input data. Compared with other deep learning architectures, CNN models give better results in image processing. Currently, mainstream network structures for picture classification include the VggNet and ResNet models. In the embodiments of the present application, an image classification CNN model pre-trained on a public dataset may be used as the convolutional neural network model, and its network parameters are then tuned through the subsequent reinforcement learning, which both retains the high-level semantic features of the image frames and makes the network representation better suited to the cover selection task. Here, the feature data of the K candidate covers may be the outputs of the penultimate fully connected layer of the CNN model when it processes the K candidate covers.
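As a hedged example of reusing a pre-trained classifier's feature extraction part, the following sketch takes the penultimate fully connected layer of torchvision's VGG16 as the feature h_i; the choice of VGG16 and the preprocessing are assumptions for illustration:

    import torch
    from torchvision import models, transforms

    # Reuse a pre-trained image classification CNN as the feature extraction component.
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
    # Drop the final FC layer so the forward pass ends at the penultimate FC layer.
    vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

    preprocess = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def extract_image_feature(pil_image) -> torch.Tensor:
        """Return the 4096-d feature h_i of one candidate cover image."""
        with torch.no_grad():
            return vgg(preprocess(pil_image).unsqueeze(0)).squeeze(0)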
The above step 507 may be executed at any point between step 501 and step 508, and its execution order relative to steps 502 to 506 is not limited.
Step 508: the server performs reinforcement learning on the network parameters of the confidence output component in the convolutional neural network model, according to the image features of each of the K candidate covers and the user operation data of each of the K candidate covers.
The confidence output component is used to output a predicted confidence according to the image features extracted by the feature extraction component; the predicted confidence indicates the probability that the corresponding candidate cover is the video cover.
Optionally, the server may obtain the actual confidence (also called the reward value) of each of the K candidate covers according to the user operation data of each of the K candidate covers, and then obtain a policy function according to these K actual confidences. The policy function is the function that maximizes the sum of the confidences obtained from the image features of the K candidate covers, where that sum is the sum of the K predicted confidences. The matrix format of the variable parameters in the policy function is identical to the matrix format of the network parameters of the confidence output component.
For example, after a certain period of designated length, the server obtains the user operation data of each of the K candidate covers collected within that period, and obtains the network parameters of the confidence output component corresponding to that period according to this user operation data and the feature data of the K candidate covers.
In the embodiment of the present application, taking the CNN model as an example, for the i-th candidate cover among the K candidate covers, if the display probability of this candidate cover is P_i, where i is a positive integer less than or equal to K, then:

P_i = σ(W · h_i);

where h_i is the output of the penultimate fully connected layer of the CNN for that cover, W is the network parameter of the fully connected layer after the hidden layer (i.e. the last fully connected layer), and σ is the sigmoid function (i.e. the activation function).
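A minimal sketch of this confidence output component, assuming PyTorch and a 4096-d feature h_i, might look as follows; treating only W as trainable matches the statement above that the other layers keep preset parameters.

```python
import torch

class ConfidenceHead(torch.nn.Module):
    """Computes P_i = sigmoid(W h_i) for each candidate cover feature h_i."""

    def __init__(self, feature_dim=4096):
        super().__init__()
        # W: the only trainable parameter; all earlier layers stay frozen.
        self.W = torch.nn.Linear(feature_dim, 1, bias=False)

    def forward(self, h):  # h: (K, feature_dim)
        return torch.sigmoid(self.W(h)).squeeze(-1)  # (K,) predicted confidences
```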
In addition, take the case where the user operation data includes the click-through rate of the second video when the corresponding candidate cover serves as its video cover, and the playing duration of the second video after each click on that cover. In this case, the server counts, within the above period of designated length, the click-through rate and playing duration of the second video when each of the K candidate covers serves as the video cover.
Since the click-through rate of a video reflects how attractive the cover is to users, and the playing duration reflects how well the cover semantically matches the video, the reward function can be expressed as:

R = R_click + R_duration;

where R_click is a function taking the click-through rate of the video as input, and R_duration is a function taking the playing duration after each click as input. The task objective of the present application is to find, by means of reinforcement learning, a policy function P(θ) such that the sum of the confidences of the K image frames computed from the reward function is maximized, where P(θ) is defined by the CNN. The objective function can be expressed as:

J(θ) = E_{P(θ)}[R];

Through the above reinforcement learning training process, the feature representation of the covers obtained by the policy function maximally fits the online user behavior.
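The following sketch illustrates one plausible REINFORCE-style gradient step for W under the objective above; the mean-reward baseline and the learning rate are illustrative assumptions, not values given by the application.

```python
import torch

def reinforce_step(head, features, rewards, optimizer):
    """One policy-gradient step: raise log P_i for covers with high reward.

    features: (K, d) cover features h_i from the frozen extractor.
    rewards:  (K,) actual confidences R = R_click + R_duration per cover.
    """
    probs = head(features)                        # P_i = sigmoid(W h_i)
    baseline = rewards.mean()                     # simple variance-reduction baseline
    loss = -((rewards - baseline) * torch.log(probs + 1e-8)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return probs.detach()

# Hypothetical usage with the ConfidenceHead sketched earlier:
# head = ConfidenceHead(); opt = torch.optim.SGD(head.parameters(), lr=0.01)
# probs = reinforce_step(head, h, r_click + r_duration, opt)
```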
Step 509: the server obtains the output of the confidence output component, i.e. the predicted confidence of each of the K candidate covers.
In the embodiment of the present application, the output result of the trained function serves as the predicted confidence of the corresponding candidate cover, which is in fact the probability that the corresponding image is the video cover of the second video.
In one possible example, the confidence output component includes a vectorization function and an activation function, and the process by which the server obtains the K predicted confidences output by the confidence output component may be as follows: the server obtains the vectorization result of each of the K candidate covers, where each vectorization result is the output of the vectorization function for the corresponding candidate cover; the server then processes the K vectorization results with the activation function to obtain the predicted confidence of each of the K candidate covers.
For example, in the embodiment of the present application, taking the CNN model as an example, the server may take the network parameters obtained by the above training as the network parameter W of the last fully connected layer in the CNN model, substitute it into the above formula P_i = σ(W · h_i) to obtain the display probability of each of the K candidate covers, i.e. the predicted confidence of each candidate cover, and then accumulate these K predicted confidences to obtain the sum of the K predicted confidences within the above period of designated length.
Step 510: the server judges whether the sum of the K predicted confidences converges; if so, it proceeds to step 511; otherwise, it returns to step 502.
The server may obtain the sums of the predicted confidences of the K candidate covers in at least one period of designated length before the current period, and judge whether the sum of the K predicted confidences converges by comparing the sum in the current period against those earlier sums. For example, when the difference between the sum of the predicted confidences in the current period and the sum in the previous period of designated length is less than a difference threshold, the sum of the K predicted confidences may be considered to have converged.
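A minimal sketch of this convergence check, with the difference threshold as an assumed hyperparameter:

```python
def has_converged(confidence_sums, diff_threshold=0.01):
    """confidence_sums: per-period sums of the K predicted confidences,
    oldest first. Converged when the last two periods differ by less
    than diff_threshold (an assumed value)."""
    if len(confidence_sums) < 2:
        return False
    return abs(confidence_sums[-1] - confidence_sums[-2]) < diff_threshold
```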
Step 511: the server obtains the convolutional neural network model as the video cover determining model used to determine video covers.
In one illustrative scheme, when the user operation data of the above K candidate covers are obtained from the operation records of all users, the above video cover determining model may be a model for all users.
In another illustrative scheme, when the user operation data of the above K candidate covers are obtained from the operation records of at least one user in a designated user group, then when the output result of the confidence output component converges, the server may obtain the convolutional neural network model as the cover determining submodel corresponding to the designated user group.
In the embodiment of the present application, when the output result of the confidence output component has not converged, the server may also obtain, according to the output result of the confidence output component, the display probability of each of the K candidate covers in the next period of designated length; push the second video to each terminal, with each of the K candidate covers serving as its cover image frame according to these display probabilities; obtain the new user operation data of each of the K candidate covers within the next period; and perform reinforcement learning on the network parameters of the confidence output component according to the new user operation data and the image features of the K candidate covers.
In the embodiment of the present application, when pushing the K candidate covers as video covers, the server can thus adjust the push strategy according to user operation behavior, which reduces the amount of user operation data that must be accumulated and thereby accelerates model convergence. For example, taking the CNN model as an example, after a period of designated length the server judges that the sum of the K predicted confidences has not reached convergence; at this point, the server computes the display probability of each of the K candidate covers from the training result of that period according to the above formula P_i = σ(W · h_i), and pushes covers in the next period according to these display probabilities. That is, for the i-th candidate cover, the higher its display probability, the more likely the server is to set it as the video cover of the second video when pushing the second video in the next period.
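One way to realize this probability-weighted exploration, sketched under the assumption that the display probabilities are normalized into a sampling distribution:

```python
import random

def choose_cover(cover_ids, display_probs):
    """Sample the cover to show for one push, weighting by display probability.

    display_probs are the P_i values; they are normalized here so that they
    sum to 1 over the K candidates (an assumption about how the exploration
    distribution is formed)."""
    total = sum(display_probs)
    weights = [p / total for p in display_probs]
    return random.choices(cover_ids, weights=weights, k=1)[0]
```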
Step 512: the server obtains N candidate covers in the first video, N being an integer greater than or equal to 2.
Optionally, the server may obtain each key image frame in the first video; perform clustering on the key image frames to obtain at least two cluster centers, each cluster center containing at least one key image frame of the corresponding same scene type; and extract at least one key image frame from each of the at least two cluster centers to obtain the N candidate covers.
Optionally, when extracting key image frames from the at least two cluster centers to obtain the N candidate covers, the server may reject, among the at least two cluster centers, those whose number of key image frames is less than a quantity threshold, obtaining N cluster centers, and extract one key image frame from each of the N cluster centers to obtain the above N candidate covers.
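Under the assumption that each key frame is represented by its CNN feature vector, this mining step could be sketched with k-means as follows; scikit-learn, the cluster count, and the minimum cluster size are illustrative choices, not specified by the application.

```python
import numpy as np
from sklearn.cluster import KMeans

def mine_candidate_covers(frame_features, n_clusters=10, min_cluster_size=3):
    """Cluster key frames by scene and keep one frame per large-enough cluster.

    frame_features: (num_frames, d) array of per-frame feature vectors.
    Returns indices of the chosen candidate cover frames."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(frame_features)
    candidates = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        if len(members) < min_cluster_size:
            continue  # reject clusters below the quantity threshold
        # take the member closest to the cluster center as that scene's frame
        dists = np.linalg.norm(frame_features[members] - km.cluster_centers_[c],
                               axis=1)
        candidates.append(int(members[np.argmin(dists)]))
    return candidates
```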
The way the server obtains the N candidate covers in the first video is similar to the steps for obtaining the K candidate covers in the second video, and details are not repeated here.
Step 513: the server processes each of the N candidate covers through the video cover determining model to obtain the predicted confidence of each of the N candidate covers.
In one possible implementation, the above video cover determining model may be a model that determines video covers for all user terminals.
In another possible implementation, the video cover determining model includes at least two cover determining submodels, each corresponding to a respective user group. The server may query the target user group to which the user of the terminal belongs; obtain the cover determining submodel corresponding to the target user group, this submodel being the convolutional neural network model obtained by reinforcement learning according to the K candidate covers in the second video and the target user operation data of each of the K candidate covers, where the target user operation data indicates the target user operations and the candidate covers they correspond to, and the target user operations are the user operations executed on the second video by each user in the target user group; and then process each of the N candidate covers through this submodel to obtain the predicted confidence of each of the N candidate covers.
Step 514: the server obtains the video cover of the first video from the N candidate covers according to the predicted confidence of each of the N candidate covers.
In the embodiment of the present application, the server may take, among the N candidate covers, the cover with the highest predicted confidence as the video cover of the first video.
Step 515: the server pushes the first video to the terminal according to the video cover of the first video.
After the above video cover determining model is obtained by training, the server may determine one video cover for each video according to the model, and push videos according to the determined video covers.
Optionally, before determining the video cover of the first video from the N candidate covers according to their predicted confidences, the server may also obtain the image category of each of the N candidate covers and determine the matching covers among them, a matching cover being a candidate cover whose image category matches the video description information of the first video. When determining the cover image frame of the first video from the N candidate covers according to their predicted confidences, the server may then obtain, among the matching covers, the candidate cover with the highest predicted confidence as the video cover of the first video.
In one possible implementation, when determining the cover of the first video, the server may also select the cover with reference to the matching degree between the candidate covers and the video. For example, the server may obtain the profile information of the first video, compute the matching degree between each candidate cover's image category and the video profile, and then obtain, among the candidate covers whose matching degree exceeds a matching degree threshold, the one with the highest confidence as the video cover of the first video.
For example, suppose the first video is a car review program whose profile information is "XX car reviewer test-drives car Y". The server computes the matching degree between each of the first video's 5 candidate covers and this profile information: the 2 candidate covers that do not contain a car fall below the matching degree threshold, while the other 3 candidate covers containing a car exceed it. The server then obtains, among the three car-containing covers, the one with the highest predicted confidence as the video cover of the first video.
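A sketch of this filtered selection, where match_score is a hypothetical function scoring a cover's image category against the video profile:

```python
def select_cover(candidates, profile, match_score, match_threshold=0.5):
    """candidates: list of (cover_id, image_category, predicted_confidence).
    Keep covers whose category sufficiently matches the profile, then
    return the one with the highest predicted confidence.
    match_threshold is an assumed value."""
    matching = [c for c in candidates
                if match_score(c[1], profile) >= match_threshold]
    # Fall back to all covers if none match (a fallback assumption;
    # the application does not specify this case).
    pool = matching or candidates
    return max(pool, key=lambda c: c[2])[0]
```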
In one possible example, if the video cover determining model obtained by training includes a cover determining submodel corresponding to the designated user group (and, correspondingly, other user groups have their own cover determining submodels), then when pushing the first video to the terminals of users in the designated user group, the server may determine the video cover from the N candidate covers of the first video according to that cover determining submodel, and push the first video to those terminals according to the determined video cover.
With the above scheme, for the same first video, the terminals of users belonging to the same user group display the same video cover, while the terminals of users belonging to different user groups may display different video covers. Since users are grouped according to their operation records on videos, the scheme can select, for users with different preferences, the candidate cover they are likely to prefer from the multiple candidate covers of the first video as the video cover. For example, if a user prefers to click and watch videos whose covers feature a certain idol, the server may assign this user to a specific user group according to this preference; when subsequently pushing video covers of other videos to this user according to the target model corresponding to that user group, the server is more likely to push covers featuring that idol.
The above first video and second video may be the same video or different videos. When they are the same video, the video cover of the video playing entry of the first video displayed by a terminal may change before and after the training of the target model is completed. For example, please refer to FIG. 6, which shows a schematic flow diagram of video cover display involved in the embodiment of the present application. As shown in FIG. 6, the steps by which the terminal displays the video playing entry of the above first video may be as follows:
S61: at a first moment before the training of the target model is completed, the terminal receives a first video cover of the first video pushed by the server; the first video cover is any cover (for example, a randomly selected cover) among the N candidate covers, N being an integer greater than or equal to 2.
S62: the terminal displays the video playing entry of the first video according to the first video cover.
S63: at a second moment after the training of the video cover determining model is completed, the terminal receives a second video cover of the first video pushed by the server. The second video cover is determined from the N candidate covers by a cover determining submodel, the submodel being a convolutional neural network model obtained by reinforcement learning according to the N candidate covers and the target user operation data of each of the N candidate covers; the target user operation data indicates the target user operations and the candidate covers they correspond to, the target user operations being the user operations executed on the first video by each user in the target user group, where the designated user group is the user group to which the user of the terminal belongs.
S64: the terminal displays the video playing entry of the first video according to the second video cover.
For example, please refer to FIG. 7, which is a schematic diagram of the video cover of the same video changing before and after model training, as involved in the embodiment of the present application. As shown in FIG. 7, at the first moment before model training is completed, the terminal displays the video playing entry 71 of the first video, whose video cover is video cover 1, i.e. any cover among the N candidate covers of the first video. At the second moment after model training is completed, the terminal again displays the page containing the first video; the server, according to the user group to which the terminal's user belongs, extracts a specified candidate cover from the N candidate covers of the first video through the model corresponding to that user group, takes the specified candidate cover as video cover 2, and pushes video cover 2 to the terminal. As shown in FIG. 7, the terminal displays the video playing entry 71 of the first video in the page, and the video cover of the video playing entry 71 has now changed to video cover 2.
The scheme shown in the present application proposes a method for automatically generating and selecting video covers online based on reinforcement learning. Reinforcement learning is a machine learning approach that emphasizes making choices based on the current state in order to maximize expected returns. This scheme can explore candidate covers in a video recommendation scene, compute the predicted confidence of each candidate cover according to the click behavior of current users, and decide the next exploration action according to the predicted confidences.
Please refer to FIG. 8, which shows the overall framework of a reinforcement-learning-based method for automatically generating and selecting video covers online, as involved in the embodiment of the present application. As shown in FIG. 8, the overall flow of the technical solution in this framework includes:
81) Mine the candidate video covers offline and save the image indices.
82) For each candidate cover, extract a feature vector of the image using a CNN convolutional neural network, i.e. represent each candidate cover by a one-dimensional feature vector extracted by the CNN model.
83) Explore the candidate covers of the video online, accumulating over a period of time the click-through rate and playing duration data of the different candidate covers. The exploration probabilities of the covers are P = {P_i}, where i is the serial number of the cover, and Σ_i P_i = 1.
84) Learn online, based on reinforcement learning, the sum of the predicted confidences of the candidate covers. According to the click-through rate and playing duration data of the different candidate covers and the reward function formula R = R_click + R_duration, compute the actual confidence R, and further compute the sum of the predicted confidences of the candidate covers.
85) When the confidence scores of the candidate covers converge, select the cover with the highest confidence as the final displayed cover.
Taking the case where the initial model is a CNN model as an example, please refer to FIG. 9, which shows a schematic diagram of a model training flow involved in the embodiment of the present application. As shown in FIG. 9, the model training process may be as follows (a consolidated code sketch of these steps is given after the list):
S901: read the video data of video 1;
S902: perform scene segmentation on video 1 and extract the key image frames of each scene in video 1;
S903: cluster the key image frames of each scene in video 1 to obtain K cluster centers;
S904: extract one image frame from each cluster center to obtain K candidate covers;
S905: input the K candidate covers into the initial model respectively, and take the output of the penultimate fully connected layer as the feature data of each of the K candidate covers;
S906: within one period of designated length, push video 1 to each terminal with the K candidate covers serving as its covers;
S907: generate the user operation data of each of the K candidate covers according to the operation records of the users in each terminal on video 1, the user operation data including click-through rate, playing duration, and so on;
S908: by reinforcement learning training, obtain the objective function corresponding to this period according to the feature data and user operation data of the K candidate covers; the objective function maximizes the sum of the confidences of the K candidate covers computed from the user operation data;
S909: judge whether the sum of the confidences of the K candidate covers converges;
S910: if the sum of the confidences of the K candidate covers converges, set the network parameters of the last fully connected layer in the initial model as the target parameters, the target parameters being the parameter matrix in the objective function;
S911: if the sum of the confidences of the K candidate covers does not converge, compute the display probability of each of the K candidate covers according to the target parameters;
S912: return to step S906 and, within the next period of designated length, push video 1 to each terminal with the K candidate covers serving as its covers according to their respective display probabilities.
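Putting the pieces together, the following non-authoritative sketch of the loop S906–S912 assumes the helpers sketched earlier in this document (extract_cover_features, ConfidenceHead, reinforce_step, has_converged) plus a hypothetical serve_and_collect_rewards function standing in for one period of online pushing and log collection.

```python
import torch

def train_cover_model(cover_paths, serve_and_collect_rewards,
                      max_periods=50, lr=0.01):
    """Train the confidence head over successive designated-length periods."""
    h = extract_cover_features(cover_paths)         # S905: frozen CNN features
    head = ConfidenceHead(feature_dim=h.shape[1])
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    probs = torch.full((len(cover_paths),), 1.0 / len(cover_paths))
    sums = []
    for _ in range(max_periods):
        # S906/S907: push with the current display probabilities, then collect
        # click-through-rate and playing-duration rewards for each cover.
        rewards = serve_and_collect_rewards(probs)  # (K,) tensor of R values
        probs = reinforce_step(head, h, rewards, opt)   # S908
        sums.append(float(probs.sum()))
        if has_converged(sums):                     # S909/S910
            break
    return head  # its W is the target parameter of the last FC layer
```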
In conclusion scheme shown in the embodiment of the present application, with comprising feature extraction component and confidence level output precision Convolutional neural networks model is as initial model, and using the network parameter of confidence level output precision as training objective, by same User's operation data and feature extraction component when the corresponding K candidate cover of video is respectively as video cover wait K Select cover respectively treated output as a result, determining model by the method for intensified learning training acquisition video cover, on the one hand, Due to, by extracting the feature of candidate cover with machine learning model, not needing the special design drawing of developer in the application The evaluation index of picture, on the other hand, user's operation data when in conjunction with K candidate cover respectively as video cover are strong to carry out Chemistry is practised, and has been comprehensively considered user to the selection operation of video cover, has been improved the accuracy of the model trained, thus protecting While the picture frame for being suitble to generate video cover can accurately be determined from video by demonstrate,proving the model trained, additionally it is possible to improve The efficiency of model training and update.
In addition, the scheme shown in the embodiment of the present application automatically generates a cover candidate set containing multiple image frames, which helps users quickly locate their target videos and improves the video click-through rate.

In addition, the scheme obtains, based on user operation behavior, the confidences of multiple candidate covers serving as the cover; the confidence reflects both the attractiveness of a cover to users and its matching degree with the video subject, and it also reflects the partial ordering among the multiple candidate covers of the same video.

In addition, through an end-to-end reinforcement learning process, the scheme avoids the early-stage feature design and extraction work, and is conducive to obtaining covers that better match user preferences.

In addition, while exploring the display of different cover images, the scheme adjusts the exploration strategy according to users' real-time click behavior, reducing the amount of user click-through data that must be accumulated and accelerating model convergence.

In addition, in the embodiment of the present application, when pushing the K candidate covers as video covers, the server can adjust the push strategy according to user operation behavior, reducing the accumulation demand on user operation data and thereby accelerating model convergence.
This scheme is based on reinforcement learning and automatically selects, in a video recommendation system, the cover most attractive to users according to their click and play behavior. Its beneficial effects include: in a video recommendation scene, the candidate set of displayed video covers can be expanded, and the cover best suited for display can be selected automatically without manual data labeling or feature engineering, thereby improving the click-through rate and playing duration of videos.
In each of the above embodiments, the model training device is described as the server of the video provider only by way of example. In other example schemes, the model training device may also be a device other than the server, for example a management device connected to the server, an independent personal computer device, or a cloud computing center, etc. The present application does not limit the specific form of the model training device.
Through the schemes shown in the above embodiments of the present application, the training and application of the model can be applied to artificial intelligence (Artificial Intelligence, AI) that automatically determines video covers for users, so as to push a suitable video cover to each user, or to push to different users the video covers they are likely to prefer.
FIG. 10 is a structural block diagram of a video push device according to an exemplary embodiment. The video push device can be used in a computer device to execute all or part of the steps executed by the server in the embodiments shown in FIG. 2 or FIG. 5. The video push device may include:
a candidate cover obtaining module 1001, configured to obtain N candidate covers in a first video, N being an integer greater than or equal to 2;

a confidence prediction module 1002, configured to process each of the N candidate covers through a video cover determining model to obtain the predicted confidence of each of the N candidate covers, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover; the video cover determining model is a convolutional neural network model obtained by reinforcement learning according to K candidate covers in a second video and the user operation data of each of the K candidate covers; the user operation data indicates the user operations received by the second video and the candidate covers corresponding to those user operations, K being an integer greater than or equal to 2;

a video cover obtaining module 1003, configured to obtain the video cover of the first video from the N candidate covers according to the predicted confidence of each of the N candidate covers;

a video push module 1004, configured to push the first video to a terminal according to the video cover of the first video.
Optionally, the video cover determining model includes at least two cover determining submodels, each corresponding to a respective user group;

the confidence prediction module 1002 is configured to:

query the target user group to which the user of the terminal belongs;

obtain the cover determining submodel corresponding to the target user group, the submodel being the convolutional neural network model obtained by reinforcement learning according to the K candidate covers in the second video and the target user operation data of each of the K candidate covers; the target user operation data indicates the target user operations and the candidate covers they correspond to, the target user operations being the user operations executed on the second video by each user in the target user group;

process each of the N candidate covers through the cover determining submodel corresponding to the target user group to obtain the predicted confidence of each of the N candidate covers.
Optionally, the candidate cover obtaining module 1001 is configured to:

obtain each key image frame in the first video;

perform clustering on the key image frames to obtain at least two cluster centers, each cluster center containing at least one key image frame of the corresponding same scene type;

extract at least one key image frame from each of the at least two cluster centers to obtain the N candidate covers.
Optionally, when extracting key image frames from the at least two cluster centers to obtain the N candidate covers, the candidate cover obtaining module 1001 is configured to:

reject, among the at least two cluster centers, the cluster centers whose number of key image frames is less than a quantity threshold, obtaining N cluster centers;

extract one key image frame from each of the N cluster centers to obtain the N candidate covers.
Optionally, the video cover determining model includes a feature extraction component and a confidence output component;

the feature extraction component is configured to extract the image features of an input candidate cover;

the confidence output component is configured to output the predicted confidence of the input candidate cover according to the image features extracted by the feature extraction component.

Optionally, the feature extraction component is identical to the feature extraction part of an image classification model, where the image classification model is a convolutional neural network model obtained by training with sample images and the classification labels of those sample images.
In conclusion in the embodiment of the present application, first passing through the corresponding K picture frame of same video in advance respectively as video User's operation data when cover, the training convolutional neural networks in the way of intensified learning obtain video cover and determine model, After the completion of model training, when pushing to the first video, determine model to the first video by trained video cover N number of candidate cover handled, obtain the probability that N number of candidate cover is individually video cover, and from N number of candidate cover really The video cover for making the first video grasps the selection of video cover due to during model training, having comprehensively considered user Make, improves the accuracy of the model trained, determine video cover subsequently through the model trained correspondingly, also improving Accuracy.
FIG. 11 is a structural block diagram of a training device for a model used to determine video covers, according to an exemplary embodiment. The device can be used in a computer device to execute all or part of the steps executed by the server in the embodiments shown in FIG. 3 or FIG. 5. The device may include:
a candidate cover obtaining module 1101, configured to obtain K candidate covers in a second video, K being an integer greater than or equal to 2;

a feature extraction module 1102, configured to extract the image features of each of the K candidate covers through a convolutional neural network model, the image features being the output of the feature extraction component in the convolutional neural network;

an operation data obtaining module 1103, configured to push the second video with each of the K candidate covers serving as its video cover, and obtain the user operation data of each of the K candidate covers; the user operation data indicates the user operations received by the second video and the candidate covers corresponding to those user operations;

a reinforcement learning module 1104, configured to perform reinforcement learning on the network parameters of the confidence output component in the convolutional neural network model according to the image features and user operation data of each of the K candidate covers; the confidence output component is configured to output a predicted confidence according to the image features extracted by the feature extraction component, the predicted confidence indicating the probability that the corresponding candidate cover is the video cover;

a model obtaining module 1105, configured to obtain the convolutional neural network model as the video cover determining model used to determine video covers when the output result of the confidence output component converges.
Optionally, the device further includes:

a predicted confidence obtaining module, configured to obtain, before the model obtaining module 1105 operates, the predicted confidence of each of the K candidate covers output by the confidence output component;

a convergence determining module, configured to determine that the output result of the confidence output component converges when the sum of the K predicted confidences converges.
Optionally, the confidence output component includes a vectorization function and an activation function, and the predicted confidence obtaining module is configured to:

obtain the vectorization result of each of the K candidate covers, each vectorization result being the output of the vectorization function for the corresponding candidate cover;

process the K vectorization results with the activation function to obtain the predicted confidence of each of the K candidate covers.
Optionally, the reinforcement learning module 1104 is configured to:

obtain the actual confidence of each of the K candidate covers according to the user operation data of each of the K candidate covers;

obtain a policy function according to the K actual confidences, the policy function being the function that maximizes the sum of the confidences obtained from the image features of the K candidate covers, that sum being the sum of the K predicted confidences; the matrix format of the variable parameters in the policy function is identical to the matrix format of the network parameters of the confidence output component;

obtain the variable parameters in the policy function as the network parameters of the vectorization component.
Optionally, the operation data obtaining module 1103 is configured to:

push the second video with each of the K candidate covers serving as its video cover;

obtain the operation records on the second video of at least one user in a designated user group, the operation records corresponding to respective candidate covers;

obtain, according to the operation records of the at least one user on the second video, the user operation data of each of the K candidate covers corresponding to the designated user group.

In this case, obtaining the convolutional neural network model as the video cover determining model when the output result of the confidence output component converges includes: obtaining, when the output result of the confidence output component converges, the convolutional neural network model as the cover determining submodel corresponding to the designated user group.
Optionally, the device further includes: a grouping module, configured to group the users according to each user's operation records on each video before the operation data obtaining module 1103 obtains the operation records of the at least one user in the designated user group on the second video, obtaining at least one user group, the at least one user group including the designated user group.
Optionally, the device further includes:

a probability obtaining module, configured to obtain, when the output result of the confidence output component has not converged, the display probability of each of the K candidate covers in the next period of designated length according to the output result of the confidence output component;

a push module, configured to push the second video to each terminal, with each of the K candidate covers serving as its video cover, according to the display probabilities of the K candidate covers in the next period of designated length;

the operation data obtaining module 1103 is further configured to obtain the new user operation data of each of the K candidate covers within the next period of designated length;

the reinforcement learning module 1104 is further configured to perform reinforcement learning on the network parameters of the confidence output component according to the image features and the new user operation data of each of the K candidate covers.
In conclusion scheme shown in the embodiment of the present application, with comprising feature extraction component and confidence level output precision Convolutional neural networks model is as initial model, and using the network parameter of confidence level output precision as training objective, by same User's operation data and feature extraction component when the corresponding K candidate cover of video is respectively as video cover wait K Select cover respectively treated output as a result, determining model by the method for intensified learning training acquisition video cover, on the one hand, Due to, by extracting the feature of candidate cover with machine learning model, not needing the special design drawing of developer in the application The evaluation index of picture, on the other hand, user's operation data when in conjunction with K candidate cover respectively as video cover are strong to carry out Chemistry is practised, and has been comprehensively considered user to the selection operation of video cover, has been improved the accuracy of the model trained, thus protecting While the picture frame for being suitble to generate video cover can accurately be determined from video by demonstrate,proving the model trained, additionally it is possible to improve The efficiency of model training and update.
In addition, scheme shown in the embodiment of the present application automatically generates the surface plot Candidate Set comprising multiple images frame, it is convenient for User's quickly positioning target video promotes video click rate.
In addition, scheme shown in the embodiment of the present application obtains multiple candidate covers as cover based on user's operation behavior Confidence level, the confidence level can reflect cover to the attraction degree of user and with the matching degree of video subject, and it is anti- It reflects in same video as the partial ordering relation between multiple candidate covers of cover.
In addition, scheme shown in the embodiment of the present application, by intensified learning learning process end to end, spy's early period is avoided Sign design and with extract work, be conducive to the cover for more being met user preference.
In addition, scheme shown in the embodiment of the present application, real-time according to user while souning out different surface plots and showing Click behavior adjusts Probe Strategy, reduces the accumulation demand to user's clicking rate data, accelerates model convergence rate.
In addition, in the embodiment of the present application, server can be pushed away using K candidate cover as video cover When sending, behavior adjustment push strategy is added with reducing the accumulation demand to user's operation data from reaching depending on the user's operation The effect of fast model convergence rate.
FIG. 12 is a structural block diagram of a video cover display device according to an exemplary embodiment. The video cover display device can be used in a computer device to execute all or part of the steps executed by the terminal in the embodiments shown in FIG. 2, FIG. 3, or FIG. 5. The video cover display device may include:
a first receiving module 1201, configured to receive, at a first moment, a first video cover of a first video pushed by a server, the first video cover being any cover among N candidate covers, N being an integer greater than or equal to 2;

a first display module 1202, configured to display the video playing entry of the first video according to the first video cover;

a second receiving module 1203, configured to receive, at a second moment, a second video cover of the first video pushed by the server; the second video cover is determined from the N candidate covers by a cover determining submodel, the submodel being a convolutional neural network model obtained by reinforcement learning according to the N candidate covers and the target user operation data of each of the N candidate covers; the target user operation data indicates the target user operations received by the first video and the candidate covers corresponding to those operations; the target user operations are the user operations executed on the first video by each user in a target user group, the target user group being the user group to which the user of the terminal belongs;

a second display module 1204, configured to display the video playing entry of the first video according to the second video cover.
FIG. 13 is a schematic structural diagram of a computer device according to an exemplary embodiment. The computer device 1300 includes a central processing unit (CPU) 1301, a system memory 1304 including a random access memory (RAM) 1302 and a read-only memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 to the central processing unit 1301. The computer device 1300 further includes a basic input/output system (I/O system) 1306 that helps transmit information between the components in the computer, and a mass storage device 1307 for storing an operating system 1313, application programs 1314, and other program modules 1315.
The basic input/output 1306 includes display 1308 for showing information and inputs for user The input equipment 1309 of such as mouse, keyboard etc of information.Wherein the display 1308 and input equipment 1309 all pass through The input and output controller 1310 for being connected to system bus 1305 is connected to central processing unit 1301.The basic input/defeated System 1306 can also include input and output controller 1310 to touch for receiving and handling from keyboard, mouse or electronics out Control the input of multiple other equipment such as pen.Similarly, input and output controller 1310 also provide output to display screen, printer or Other kinds of output equipment.
The mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer-readable medium provide non-volatile storage for the computer device 1300. That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.

Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media are not limited to the above. The above system memory 1304 and mass storage device 1307 may be collectively referred to as the memory.
The computer device 1300 may be connected to the Internet or other network devices through a network interface unit 1311 connected to the system bus 1305.

The memory further includes one or more programs stored in the memory, and the central processing unit 1301 executes the one or more programs to implement all or part of the steps executed by the server in the methods shown in FIG. 2, FIG. 3, or FIG. 5.
FIG. 14 shows a structural block diagram of a terminal 1400 provided by an exemplary embodiment of the present application. The terminal 1400 may be a smartphone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop, or a desktop computer. The terminal 1400 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal 1400 includes a processor 1401 and a memory 1402.
The processor 1401 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1401 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 1402 may include one or more computer-readable storage media, which may be non-transitory. The memory 1402 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1402 stores at least one instruction, which is executed by the processor 1401 to implement all or part of the steps executed by the terminal in the method embodiments shown in FIG. 2, FIG. 3, or FIG. 5.
In some embodiments, the terminal 1400 optionally further includes a peripheral device interface 1403 and at least one peripheral device. The processor 1401, the memory 1402, and the peripheral device interface 1403 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1403 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1404, a touch display screen 1405, a camera 1406, an audio circuit 1407, a positioning component 1408, and a power supply 1409.
The peripheral device interface 1403 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1401 and the memory 1402. In some embodiments, the processor 1401, the memory 1402, and the peripheral device interface 1403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1401, the memory 1402, and the peripheral device interface 1403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1404 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1404 communicates with communication networks and other communication devices through electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1404 can communicate with other terminals through at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, the generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1404 may also include circuits related to NFC (Near Field Communication), which is not limited in the present application.
The display screen 1405 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1405 is a touch display screen, it also has the ability to collect touch signals on or above its surface; the touch signals may be input to the processor 1401 as control signals for processing. At this time, the display screen 1405 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1405 arranged on the front panel of the terminal 1400; in other embodiments, there may be at least two display screens 1405 arranged on different surfaces of the terminal 1400 or in a folding design; in still other embodiments, the display screen 1405 may be a flexible display screen arranged on a curved or folded surface of the terminal 1400. The display screen 1405 may even be set in a non-rectangular irregular shape, i.e. a shaped screen. The display screen 1405 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
The camera assembly 1406 is used to capture images or videos. Optionally, the camera assembly 1406 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize the background blurring function by fusing the main camera and the depth-of-field camera, or panoramic shooting and VR (Virtual Reality) shooting or other fused shooting functions by fusing the main camera and the wide-angle camera. In some embodiments, the camera assembly 1406 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuit 1407 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment and convert them into electrical signals that are input to the processor 1401 for processing, or input to the radio frequency circuit 1404 for voice communication. For stereo collection or noise reduction, there may be multiple microphones arranged at different parts of the terminal 1400; the microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert electrical signals from the processor 1401 or the radio frequency circuit 1404 into sound waves. The speaker may be a traditional diaphragm speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1407 may also include a headphone jack.
The positioning component 1408 is used to locate the current geographic position of the terminal 1400, so as to implement navigation or LBS (Location Based Service). The positioning component 1408 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system.
The power supply 1409 is used to supply power to the various components in the terminal 1400. The power supply 1409 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 1409 includes a rechargeable battery, the rechargeable battery may be a wired charging battery or a wireless charging battery: a wired charging battery is charged through a wired line, and a wireless charging battery is charged through a wireless coil. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, terminal 1400 further includes one or more sensors 1410. The one or more sensors 1410 include, but are not limited to: an acceleration sensor 1411, a gyro sensor 1412, a pressure sensor 1413, a fingerprint sensor 1414, an optical sensor 1415, and a proximity sensor 1416.
Acceleration sensor 1411 can detect the magnitude of acceleration on the three axes of the coordinate system established with terminal 1400. For example, acceleration sensor 1411 may be used to detect the components of gravitational acceleration on the three axes. Processor 1401 may, according to the gravitational acceleration signal acquired by acceleration sensor 1411, control touch display screen 1405 to display the user interface in a landscape view or a portrait view. Acceleration sensor 1411 may also be used to acquire motion data of a game or of the user.
Gyro sensor 1412 can detect the body orientation and rotation angle of terminal 1400, and may cooperate with acceleration sensor 1411 to acquire the user's 3D actions on terminal 1400. Based on the data acquired by gyro sensor 1412, processor 1401 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
Pressure sensor 1413 may be arranged on a side frame of terminal 1400 and/or an underlying layer of touch display screen 1405. When pressure sensor 1413 is arranged on the side frame of terminal 1400, it can detect the user's grip signal on terminal 1400, and processor 1401 performs left/right-hand recognition or shortcut operations according to the grip signal acquired by pressure sensor 1413. When pressure sensor 1413 is arranged on the underlying layer of touch display screen 1405, processor 1401 controls the operable controls on the UI according to the user's pressure operation on touch display screen 1405. The operable controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
Fingerprint sensor 1414 is used to acquire the user's fingerprint. The user's identity is identified by processor 1401 according to the fingerprint acquired by fingerprint sensor 1414, or is identified by fingerprint sensor 1414 itself from the acquired fingerprint. When the user's identity is identified as a trusted identity, processor 1401 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. Fingerprint sensor 1414 may be arranged on the front, back, or side of terminal 1400. When a physical button or a manufacturer logo is provided on terminal 1400, fingerprint sensor 1414 may be integrated with the physical button or the manufacturer logo.
Optical sensor 1415 is used to acquire the ambient light intensity. In one embodiment, processor 1401 may control the display brightness of touch display screen 1405 according to the ambient light intensity acquired by optical sensor 1415. Specifically, when the ambient light intensity is high, the display brightness of touch display screen 1405 is turned up; when the ambient light intensity is low, the display brightness of touch display screen 1405 is turned down. In another embodiment, processor 1401 may also dynamically adjust the shooting parameters of camera assembly 1406 according to the ambient light intensity acquired by optical sensor 1415.
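As a toy illustration of this brightness control (the function name, levels, and lux threshold below are invented for the example, not taken from the patent), the adjustment reduces to a monotone mapping from measured ambient light to a display brightness level:

```python
def display_brightness(ambient_lux, min_level=0.2, max_level=1.0, full_sun_lux=10000.0):
    """Map ambient light intensity to a display brightness level in [min_level, max_level]."""
    ratio = min(max(ambient_lux / full_sun_lux, 0.0), 1.0)  # clamp to [0, 1]
    return min_level + ratio * (max_level - min_level)      # brighter light, brighter screen
```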
Proximity sensor 1416, also referred to as a distance sensor, is generally arranged on the front panel of terminal 1400. Proximity sensor 1416 is used to acquire the distance between the user and the front of terminal 1400. In one embodiment, when proximity sensor 1416 detects that the distance between the user and the front of terminal 1400 is gradually decreasing, processor 1401 controls touch display screen 1405 to switch from the screen-on state to the screen-off state; when proximity sensor 1416 detects that the distance between the user and the front of terminal 1400 is gradually increasing, processor 1401 controls touch display screen 1405 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Figure 14 does not constitute a limitation on terminal 1400, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, for example a memory including a computer program (instructions) that can be executed by the processor of a computer equipment to complete all or part of the steps of the methods shown in the embodiments of the present application. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the present application. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and embodiments are to be considered exemplary only, and the true scope and spirit of the present application are pointed out by the following claims.
It should be understood that the present application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (15)

1. A video pushing method, characterized in that the method comprises:
obtaining N candidate covers of a first video, N being an integer greater than or equal to 2;
processing the N candidate covers respectively by a video cover determination model to obtain a prediction confidence of each of the N candidate covers, the prediction confidence being used to indicate the probability that the corresponding candidate cover is the video cover; the video cover determination model being a convolutional neural network model obtained by performing reinforcement learning according to K candidate covers of a second video and the user operation data of each of the K candidate covers; the user operation data being used to indicate the user operations received by the second video and the candidate covers corresponding to the user operations, K being an integer greater than or equal to 2;
obtaining the video cover of the first video from the N candidate covers according to the prediction confidences of the N candidate covers; and
pushing the first video to a terminal according to the video cover of the first video.
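As a minimal sketch of the selection step of claim 1 (PyTorch is assumed, and `cover_model` and `candidate_frames` are illustrative names standing in for the trained video cover determination model and its inputs, not the patented implementation):

```python
import torch

def select_video_cover(cover_model, candidate_frames):
    """Score N candidate covers and return the one with the highest prediction confidence.

    cover_model:      trained CNN mapping image tensors to confidence scores
    candidate_frames: tensor of shape (N, C, H, W) with N >= 2 candidate covers
    """
    cover_model.eval()
    with torch.no_grad():
        confidences = cover_model(candidate_frames).flatten()  # one score per candidate
    best = int(torch.argmax(confidences))
    return candidate_frames[best], float(confidences[best])
```

The selected frame would then accompany the push message sent to the terminal.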
2. The method according to claim 1, characterized in that the video cover determination model includes at least two cover determination submodels, each of the at least two cover determination submodels corresponding to a respective user group;
the processing the N candidate covers respectively by the video cover determination model to obtain the prediction confidence of each of the N candidate covers comprising:
querying the target user group to which the user of the terminal belongs;
obtaining the cover determination submodel corresponding to the target user group, the cover determination submodel corresponding to the target user group being a convolutional neural network model obtained by performing reinforcement learning according to the K candidate covers of the second video and the target user operation data of each of the K candidate covers; the target user operation data being used to indicate target user operations and the candidate covers corresponding to the target user operations; the target user operations being the user operations performed on the second video by the users in the target user group; and
processing the N candidate covers respectively by the cover determination submodel corresponding to the target user group to obtain the prediction confidence of each of the N candidate covers.
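A rough reading of the per-group dispatch in claim 2, with hypothetical dictionaries standing in for whatever storage the server actually uses (all names are assumptions for illustration):

```python
def predict_confidences_for_user(user_id, candidate_frames, group_of_user, submodels):
    """Route a request through the cover determination submodel of the user's group.

    group_of_user: dict mapping user_id -> group_id (the target user group)
    submodels:     dict mapping group_id -> trained cover determination submodel
    """
    group_id = group_of_user[user_id]   # query the group the terminal's user belongs to
    submodel = submodels[group_id]      # the group's cover determination submodel
    return submodel(candidate_frames)   # prediction confidence per candidate cover
```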
3. The method according to claim 1, characterized in that the obtaining N candidate covers of the first video comprises:
obtaining each key image frame in the first video;
performing clustering processing on the key image frames to obtain at least two cluster centres, each cluster centre containing at least one key image frame of the corresponding scene type; and
extracting at least one key image frame from each of the at least two cluster centres to obtain the N candidate covers.
4. The method according to claim 3, characterized in that the extracting at least one key image frame from each of the at least two cluster centres to obtain the N candidate covers comprises:
removing, from the at least two cluster centres, the cluster centres whose number of key image frames is less than a quantity threshold, to obtain N cluster centres; and
extracting one key image frame from each of the N cluster centres to obtain the N candidate covers.
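One plausible realization of the clustering in claims 3 and 4 is k-means over key-frame feature vectors with sparse clusters rejected; the sketch below uses scikit-learn and assumes the key frames have already been embedded as feature vectors (the embedding itself is not specified here):

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_covers_by_clustering(frame_features, n_clusters, min_cluster_size):
    """Cluster key frames by scene and keep one representative per large cluster.

    frame_features:   array of shape (num_key_frames, feature_dim)
    n_clusters:       number of scene clusters to form (at least two)
    min_cluster_size: clusters with fewer key frames than this are removed (claim 4)
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(frame_features)
    candidates = []
    for c in range(n_clusters):
        members = np.where(kmeans.labels_ == c)[0]
        if len(members) < min_cluster_size:
            continue  # discard sparsely populated scene clusters
        # take the member closest to the cluster centre as that scene's candidate cover
        dists = np.linalg.norm(frame_features[members] - kmeans.cluster_centers_[c], axis=1)
        candidates.append(int(members[np.argmin(dists)]))
    return candidates  # indices of the N candidate cover frames
```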
5. The method according to any one of claims 1 to 4, characterized in that the video cover determination model includes a feature extraction component and a confidence output component;
the feature extraction component being used to extract the image features of an input candidate cover; and
the confidence output component being used to output the prediction confidence of the input candidate cover according to the image features extracted by the feature extraction component.
6. A training method for a model for determining a video cover, characterized in that the method comprises:
obtaining K candidate covers of a second video, K being an integer greater than or equal to 2;
extracting the image features of each of the K candidate covers by a convolutional neural network model, the image features being the output of a feature extraction component in the convolutional neural network model;
pushing the second video with each of the K candidate covers in turn serving as the video cover of the second video, and obtaining the user operation data of each of the K candidate covers, the user operation data being used to indicate the user operations received by the second video and the candidate covers corresponding to the user operations;
performing reinforcement learning on the network parameters of a confidence output component in the convolutional neural network model according to the image features of each of the K candidate covers and the user operation data of each of the K candidate covers, the confidence output component being used to output prediction confidences according to the image features extracted by the feature extraction component, the prediction confidence being used to indicate the probability that the corresponding candidate cover is the video cover; and
when the output result of the confidence output component converges, obtaining the convolutional neural network model as the video cover determination model for determining a video cover.
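A simplified sketch of the training loop of claim 6, under two stated assumptions: the empirical click-through rate of each cover serves as its reward, and the reinforcement-learning step is approximated by a REINFORCE-style update of the confidence output component while the feature extraction component stays frozen. All names are illustrative, not the patent's prescribed procedure:

```python
import torch

def train_confidence_head(feature_extractor, confidence_head, covers,
                          clicks, impressions, lr=1e-3, steps=1000, tol=1e-6):
    """Fit the confidence output component from user operation data (hedged sketch).

    covers:      tensor (K, C, H, W), the K candidate covers of the second video
    clicks:      float tensor (K,), clicks each cover received while serving as the cover
    impressions: float tensor (K,), times each cover was shown
    """
    with torch.no_grad():
        features = feature_extractor(covers)          # image features, extractor frozen
    reward = clicks / impressions.clamp(min=1)        # empirical click-through rate
    optimizer = torch.optim.Adam(confidence_head.parameters(), lr=lr)
    previous_loss = None
    for _ in range(steps):
        confidences = confidence_head(features).flatten()
        log_probs = torch.log_softmax(confidences, dim=0)  # policy over the K covers
        loss = -(reward * log_probs).sum()                 # REINFORCE-style objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if previous_loss is not None and abs(previous_loss - loss.item()) < tol:
            break                                          # output result has converged
        previous_loss = loss.item()
    return confidence_head
```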
7. The method according to claim 6, characterized in that the performing reinforcement learning on the network parameters of the confidence output component in the convolutional neural network model according to the image features of each of the K candidate covers and the user operation data of each of the K candidate covers comprises:
obtaining the actual confidence of each of the K candidate covers according to the user operation data of each of the K candidate covers;
obtaining a policy function according to the actual confidences of the K candidate covers, the policy function being a function that maximizes the sum of the confidences obtained from the image features of the K candidate covers, the sum of the confidences being the sum of the prediction confidences of the K candidate covers, the matrix format of the variable parameters in the policy function being identical to the matrix format of the network parameters of the confidence output component; and
obtaining the variable parameters in the policy function as the network parameters of the confidence output component.
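Read as an equation (a hedged reconstruction, with $\theta$ denoting the variable parameters and $x_k$ the image feature of the $k$-th candidate cover), the policy function of claim 7 amounts to

$$\theta^{*} = \arg\max_{\theta} \sum_{k=1}^{K} f_{\theta}(x_k),$$

where $f_{\theta}(x_k)$ is the prediction confidence assigned to the $k$-th cover, the fit of $f_{\theta}$ is anchored to the actual confidences derived from the user operation data, and $\theta^{*}$, which shares the matrix format of the confidence output component's network parameters, is installed as those parameters.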
8. The method according to claim 6, characterized in that:
the pushing the second video with each of the K candidate covers serving as the video cover of the second video and obtaining the user operation data of each of the K candidate covers comprises:
pushing the second video with each of the K candidate covers in turn serving as the video cover of the second video;
obtaining the user operation records of at least one user in a designated user group on the second video, each user operation record corresponding to a respective candidate cover; and
obtaining, according to the user operation records of the at least one user on the second video, the user operation data of each of the K candidate covers corresponding to the designated user group;
and the obtaining, when the output result of the confidence output component converges, the convolutional neural network model as the video cover determination model for determining a video cover comprises:
when the output result of the confidence output component converges, obtaining the convolutional neural network model as the cover determination submodel corresponding to the designated user group.
9. The method according to claim 8, characterized in that before the obtaining the user operation records of at least one user in the designated user group on the second video, the method further comprises:
grouping the users according to the user operation records of each user on each video to obtain at least one user group, the at least one user group including the designated user group.
10. The method according to claim 6, characterized in that the method further comprises:
when the output result of the confidence output component has not converged, obtaining, according to the output result of the confidence output component, the display probability of each of the K candidate covers in the next designated-length period;
pushing the second video to the terminals according to the display probabilities of the K candidate covers in the next designated-length period, with each of the K candidate covers serving as the video cover of the second video;
obtaining the new user operation data of each of the K candidate covers in the next designated-length period; and
performing reinforcement learning on the network parameters of the confidence output component according to the image features of each of the K candidate covers and the new user operation data of each of the K candidate covers.
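For the non-converged branch of claim 10, a softmax over the current outputs is one natural way (an assumption, not the patent's prescribed formula) to turn the confidence output component's outputs into display probabilities for the next designated-length period:

```python
import torch

def display_probabilities(confidence_head, features, temperature=1.0):
    """Map current (non-converged) confidences to per-cover display probabilities."""
    with torch.no_grad():
        confidences = confidence_head(features).flatten()    # (K,) raw outputs
    return torch.softmax(confidences / temperature, dim=0)   # probabilities summing to 1
```

Each candidate cover is then shown with its probability during the next period, the new user operation data replace the old, and the update of claim 6 is repeated until the output converges.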
11. A video cover display method, characterized in that it is used in a terminal and comprises:
receiving, at a first moment, a first video cover of a first video pushed by a server, the first video cover being any one of N candidate covers, N being an integer greater than or equal to 2;
displaying the video play entrance of the first video according to the first video cover;
receiving, at a second moment, a second video cover of the first video pushed by the server; the second video cover being determined from the N candidate covers by a cover determination submodel; the cover determination submodel being a convolutional neural network model obtained by performing reinforcement learning according to the N candidate covers and the target user operation data of each of the N candidate covers; the target user operation data being used to indicate the target user operations received by the first video and the candidate covers corresponding to the target user operations; the target user operations being the user operations performed on the first video by the users in a target user group, the target user group being the user group to which the user of the terminal belongs; and
displaying the video play entrance of the first video according to the second video cover.
12. A video push device, characterized in that the device comprises:
a candidate cover obtaining module, configured to obtain N candidate covers of a first video, N being an integer greater than or equal to 2;
a confidence prediction module, configured to process the N candidate covers respectively by a video cover determination model to obtain a prediction confidence of each of the N candidate covers, the prediction confidence being used to indicate the probability that the corresponding candidate cover is the video cover; the video cover determination model being a convolutional neural network model obtained by performing reinforcement learning according to K candidate covers of a second video and the user operation data of each of the K candidate covers; the user operation data being used to indicate the user operations received by the second video and the candidate covers corresponding to the user operations, K being an integer greater than or equal to 2;
a video cover obtaining module, configured to obtain the video cover of the first video from the N candidate covers according to the prediction confidences of the N candidate covers; and
a video push module, configured to push the first video to a terminal according to the video cover of the first video.
13. A video cover display device, characterized in that it is used in a terminal and comprises:
a first receiving module, configured to receive, at a first moment, a first video cover of a first video pushed by a server, the first video cover being any one of N candidate covers, N being an integer greater than or equal to 2;
a first display module, configured to display the video play entrance of the first video according to the first video cover;
a second receiving module, configured to receive, at a second moment, a second video cover of the first video pushed by the server; the second video cover being determined from the N candidate covers by a cover determination submodel; the cover determination submodel being a convolutional neural network model obtained by performing reinforcement learning according to the N candidate covers and the target user operation data of each of the N candidate covers; the target user operation data being used to indicate the target user operations received by the first video and the candidate covers corresponding to the target user operations; the target user operations being the user operations performed on the first video by the users in a target user group, the target user group being the user group to which the user of the terminal belongs; and
a second display module, configured to display the video play entrance of the first video according to the second video cover.
14. A computer equipment, characterized in that the computer equipment includes a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to implement the method according to any one of claims 1 to 11.
15. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to implement the method according to any one of claims 1 to 11.
CN201910430442.2A 2019-05-22 2019-05-22 Video pushing method, device, computer equipment and storage medium Active CN110263213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910430442.2A CN110263213B (en) 2019-05-22 2019-05-22 Video pushing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110263213A true CN110263213A (en) 2019-09-20
CN110263213B CN110263213B (en) 2023-07-18

Family

ID=67915144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910430442.2A Active CN110263213B (en) 2019-05-22 2019-05-22 Video pushing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110263213B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503693A (en) * 2016-11-28 2017-03-15 北京字节跳动科技有限公司 The offer method and device of video front cover
CN109729426A (en) * 2017-10-27 2019-05-07 优酷网络技术(北京)有限公司 A kind of generation method and device of video cover image
CN107832725A (en) * 2017-11-17 2018-03-23 北京奇虎科技有限公司 Video front cover extracting method and device based on evaluation index
CN107958030A (en) * 2017-11-17 2018-04-24 北京奇虎科技有限公司 Video front cover recommended models optimization method and device
CN108010038A (en) * 2017-12-19 2018-05-08 北京奇虎科技有限公司 Live dress ornament based on adaptive threshold fuzziness is dressed up method and device

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110572711A (en) * 2019-09-27 2019-12-13 北京达佳互联信息技术有限公司 Video cover generation method and device, computer equipment and storage medium
CN111401161A (en) * 2020-03-04 2020-07-10 青岛海信网络科技股份有限公司 Intelligent building management and control system for realizing behavior recognition based on intelligent video analysis algorithm
CN111368204A (en) * 2020-03-09 2020-07-03 北京字节跳动网络技术有限公司 Content pushing method and device, electronic equipment and computer readable medium
CN111601160A (en) * 2020-05-29 2020-08-28 北京百度网讯科技有限公司 Method and device for editing video
CN111984821A (en) * 2020-06-22 2020-11-24 汉海信息技术(上海)有限公司 Method and device for determining dynamic cover of video, storage medium and electronic equipment
CN111639630A (en) * 2020-06-23 2020-09-08 北京字节跳动网络技术有限公司 Operation correcting method and device
CN112689187A (en) * 2020-12-17 2021-04-20 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112860941A (en) * 2021-02-04 2021-05-28 百果园技术(新加坡)有限公司 Cover recommendation method, device, equipment and medium
CN114926705A (en) * 2022-05-12 2022-08-19 网易(杭州)网络有限公司 Cover design model training method, medium, device and computing equipment
CN114926705B (en) * 2022-05-12 2024-05-28 网易(杭州)网络有限公司 Cover design model training method, medium, device and computing equipment
CN116402249A (en) * 2023-03-06 2023-07-07 贝壳找房(北京)科技有限公司 Recommendation system overflow effect evaluation method, recommendation system overflow effect evaluation device, storage medium and program product
CN116402249B (en) * 2023-03-06 2024-02-23 贝壳找房(北京)科技有限公司 Recommendation system overflow effect evaluation method, recommendation system overflow effect evaluation equipment and storage medium
CN116610830A (en) * 2023-04-24 2023-08-18 深圳云视智景科技有限公司 Image generation method, device, equipment and computer storage medium

Also Published As

Publication number Publication date
CN110263213B (en) 2023-07-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant