CN106503693B - Method and device for providing a video cover - Google Patents


Info

Publication number
CN106503693B
CN106503693B (application CN201611059438.2A)
Authority
CN
China
Prior art keywords
picture
machine learning
data set
learning model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611059438.2A
Other languages
Chinese (zh)
Other versions
CN106503693A (en)
Inventor
赵彦宾
姜东�
洪定坤
夏绪宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Douyin Information Service Co Ltd
Original Assignee
Beijing ByteDance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Technology Co Ltd
Priority to CN201611059438.2A
Publication of CN106503693A
Application granted
Publication of CN106503693B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G06F16/739 Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a method and device for providing a video cover. The method includes: receiving a video file uploaded by a user, determining scene-change key frames according to the changes between adjacent frames of the video file, and capturing the pictures corresponding to those key frames; scoring and ranking the captured pictures with a machine learning model pre-trained for picture classification; and providing a preset number of the highest-scoring pictures, in ranked order, to the user as candidate video covers, so that the user selects the video cover from among the candidate pictures. In this way, it is guaranteed that no important scene in the video file is omitted, while the redundancy among the candidate cover pictures is reduced, improving the quality of the candidates and making it easier for the user to pick a suitable video cover.

Description

Method and device for providing a video cover
Technical field
This application relates to the field of computer technology, and in particular to a method and device for providing a video cover.
Background
When we watch videos on a video website, each video is shown with a video cover on the relevant web page. The quality of the cover picture is an important factor in attracting users to click on a video; for the short videos that are currently so popular, the quality of the cover picture matters even more.
In existing video-cover selection schemes, pictures are usually captured from the video at fixed time points (for example, the video is divided evenly by duration into several sub-videos, and the starting time point of each sub-video is used as a fixed capture point) and offered to the user as candidate cover pictures. Covers obtained this way, however, are often blurred or out of focus, or the picture is too plain and contains no salient person or object.
With the rapid development of deep machine learning and the great progress it has brought to image and speech recognition, YouTube proposed an automatic video-thumbnail generation scheme based on deep machine learning to address the problems of the above cover-selection approach. A deep neural network (DNN, Deep Neural Network) can be used: the pictures uploaded by users serve as the "high quality" training set, pictures captured at random from video files serve as the "low quality" training set, and a DNN-based machine learning model is trained in advance on these two sets to obtain a trained DNN model. During thumbnail generation, pictures are first captured at random from the video file (for example, one frame per second), the pre-trained DNN model scores the captured pictures, and the best picture is then chosen as the video cover from the highest-scoring pictures (possibly several). In a manual evaluation in which assessors compared the covers generated by the DNN model with the covers produced by the fixed-time-point capture scheme, 65% of the assessors judged the DNN-generated covers to be better.
However, this scheme still has the following disadvantages:
First, it uses the pictures uploaded by users directly as the "high quality" training set and the pictures captured from videos at fixed time points as the "low quality" training set, which introduces a large amount of "dirty data": the user-uploaded pictures may contain many low-quality images, and the pictures captured at fixed time points may well contain images of decent quality. A training set containing such dirty data directly prevents the trained machine learning model from reaching good classification performance.
Second, when the video file is long, this capture method makes the captured pictures highly redundant, so the cover candidates finally offered to the user are likely to be quite repetitive.
Summary of the invention
This application provides a method and device for providing a video cover, which not only guarantee that no important scene in the video file is omitted, but also reduce the redundancy among the candidate cover pictures, improving the quality of the candidates and making it easier for the user to choose a suitable video cover.
This application provides the following solutions:
A method for providing a video cover, comprising:
receiving a video file uploaded by a user, determining scene-change key frames according to the changes between adjacent frames of the video file, and capturing the pictures corresponding to the scene-change key frames;
scoring and ranking the captured pictures with a machine learning model pre-trained for picture classification;
providing a preset number of the highest-scoring pictures, in ranked order, to the user as candidate video covers, so that the user selects the video cover from among the candidate pictures.
Optionally, the method further comprises:
receiving the user's selection of any picture among the candidate pictures;
determining the picture selected by the user as the video cover.
Optionally, determining scene-change key frames according to the changes between adjacent frames of the video file and capturing the pictures corresponding to the scene-change key frames comprises:
judging whether the content change between two adjacent frames of the video file exceeds a preset change threshold;
determining a frame whose change exceeds the preset change threshold as a scene-change key frame;
capturing the picture corresponding to each scene-change key frame, and forming the captured pictures into a scene-change key-frame picture set.
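The thresholding steps above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are reduced to flat lists of grayscale pixel values, and the content-change measure is a mean absolute difference, which the patent does not prescribe.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel absolute difference between two grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def scene_change_key_frames(frames, threshold):
    """Return indices of frames whose change from the previous frame
    exceeds the preset change threshold."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]

# Toy 4-pixel "frames": a scene cut happens between frame 1 and frame 2.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],        # small change: same scene
    [200, 190, 210, 205],   # large change: scene cut
    [198, 192, 208, 204],
]
print(scene_change_key_frames(frames, threshold=30))  # [2]
```

The captured pictures at the returned indices would then be collected into the scene-change key-frame picture set.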
Optionally, the training of the machine learning model for picture classification comprises:
determining the picture data for training the machine learning model;
performing iterative training with the picture data on a convolutional neural network (CNN) machine learning model, and adjusting the weights of the convolutional neural network during the iterative training, so as to obtain, on the basis of the CNN machine learning model, a CNN machine learning model for picture classification;
evaluating the CNN machine learning model for picture classification;
if the evaluation passes, ending the training and taking the CNN machine learning model for picture classification as the trained CNN machine learning model for picture classification.
Optionally, the method further comprises:
if the evaluation does not pass, adjusting the algorithm parameters used in the CNN machine learning model for picture classification, continuing the iterative training of the picture data on the parameter-adjusted CNN machine learning model while adjusting the weights of the convolutional neural network during the iterative training, until the obtained CNN machine learning model for picture classification passes the evaluation.
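The train / evaluate / adjust-parameters loop described above can be sketched as control flow. The model here is a stand-in: its `fit`, `evaluate` and `adjust_params` methods are hypothetical placeholders for illustration, not a real CNN or any library API.

```python
def train_until_evaluation_passes(model, train_data, val_data,
                                  target_accuracy=0.9, max_rounds=5):
    """Sketch of the loop: train, evaluate, adjust parameters, repeat."""
    for _ in range(max_rounds):
        model.fit(train_data)                       # iterative training, weight updates
        if model.evaluate(val_data) >= target_accuracy:
            return model                            # evaluation passed: training ends
        model.adjust_params()                       # evaluation failed: tune and retry
    raise RuntimeError("model did not pass evaluation")

class DummyModel:
    """Stand-in whose accuracy improves each round, for illustration only."""
    def __init__(self):
        self.accuracy = 0.5
    def fit(self, data):
        self.accuracy += 0.25
    def evaluate(self, data):
        return self.accuracy
    def adjust_params(self):
        pass

m = train_until_evaluation_passes(DummyModel(), train_data=[], val_data=[])
print(m.accuracy)  # 1.0
```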
Optionally, determining the picture data for training the machine learning model comprises:
obtaining a basic picture data set;
obtaining the color feature parameter values of the pictures in the basic picture data set;
removing, according to the color feature parameter values, the pictures that do not meet a precondition from the basic picture data set, so as to obtain the picture data for training the machine learning model.
Optionally, the basic picture data set comprises: a first data set containing user-uploaded pictures, and a second data set containing pictures captured at random at preset time intervals;
the color feature parameter values comprise hue values, saturation values and luminance values;
removing, according to the color feature parameter values, the pictures that do not meet the precondition from the basic picture data set, so as to obtain the picture data for training the machine learning model, comprises:
computing, according to preset color feature weights, a weighted sum of the color feature parameter values of each picture, so as to obtain the color feature score of each picture;
removing from the first data set the pictures whose color feature score is below a first preset score, and from the second data set the pictures whose color feature score is above a second preset score, so as to obtain a first-type data set and a second-type data set respectively, to serve as the picture data for training the machine learning model.
Optionally, the basic picture data set comprises: a first data set containing user-uploaded pictures, and a second data set containing pictures captured at random at preset time intervals;
the color feature parameter values comprise hue values, saturation values and RGB values;
removing, according to the color feature parameter values, the pictures that do not meet the precondition from the basic picture data set, so as to obtain the picture data for training the machine learning model, comprises:
removing from the first data set the pictures whose hue value is below a first preset hue threshold, and from the second data set the pictures whose hue value is above a second preset hue threshold;
removing from the first data set the pictures whose saturation value is below a first preset saturation threshold, and from the second data set the pictures whose saturation value is above a second preset saturation threshold;
removing the black-and-white pictures from the first data set according to the RGB values;
taking the pictures remaining in the first data set and the second data set as a first-type data set and a second-type data set respectively, to serve as the picture data for training the machine learning model.
Optionally, after the pictures whose color feature score is below the first preset score have been removed from the first data set and the pictures whose color feature score is above the second preset score have been removed from the second data set, the method further comprises:
judging the similarity between the pictures remaining in the first data set and in the second data set respectively, and, according to the judging result, keeping only one picture from each group of pictures whose similarity reaches a preset similarity threshold, so that the pictures remaining in the first data set and the second data set serve respectively as the first-type data set and the second-type data set.
A device for providing a video cover, comprising:
a capture unit, configured to receive a video file uploaded by a user, determine scene-change key frames according to the changes between adjacent frames of the video file, and capture the pictures corresponding to the scene-change key frames;
a scoring unit, configured to score and rank the captured pictures with a machine learning model pre-trained for picture classification;
a candidate picture providing unit, configured to provide a preset number of the highest-scoring pictures, in ranked order, to the user as candidate video covers, so that the user selects the video cover from among the candidate pictures.
Optionally, the device further comprises:
an instruction receiving unit, configured to receive the user's selection of any picture among the candidate pictures;
a video cover determination unit, configured to determine the picture selected by the user as the video cover.
Optionally, the capture unit is specifically configured to:
judge whether the content change between two adjacent frames of the video file exceeds a preset change threshold;
determine a frame whose change exceeds the preset change threshold as a scene-change key frame;
capture the picture corresponding to each scene-change key frame, and form the captured pictures into a scene-change key-frame picture set.
According to the specific embodiments provided by this application, the application discloses the following technical effects:
With the embodiments of this application, after a video file uploaded by a user is received, scene-change frames can be determined according to the changes between adjacent frames of the video file, and the pictures corresponding to those frames can be captured; the captured pictures can then be scored and ranked by a machine learning model pre-trained for picture classification, and a preset number of the highest-scoring pictures can be provided, in ranked order, to the user as candidate video covers, so that the user selects the video cover from among them. In this way, it is guaranteed that no important scene in the video file is omitted, while the redundancy among the candidate cover pictures is reduced, improving the quality of the candidates and making it easier for the user to pick a suitable video cover.
Of course, a product implementing this application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
In order to explain more clearly the technical solutions in the embodiments of this application or in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; for a person of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a flow chart of the method provided by an embodiment of this application;
Fig. 2 is a flow chart of the training of the machine learning model for picture classification in the method provided by an embodiment of this application;
Fig. 3-1 to Fig. 3-3 are schematic diagrams of experimental data in the method provided by an embodiment of this application;
Fig. 4 is a schematic diagram of the device provided by an embodiment of this application.
Detailed description of the embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application fall within the scope of protection of this application.
Referring to Fig. 1, an embodiment of this application first provides a method for providing a video cover, which may include the following steps:
S101: receiving a video file uploaded by a user, determining scene-change key frames according to the changes between adjacent frames of the video file, and capturing the pictures corresponding to the scene-change key frames.
Under normal circumstances, a video website can not only offer the video files built into its servers for users to watch, but can also offer for viewing any video file uploaded by a user. In this embodiment, after the video file uploaded by a user is received, the scene changes in the video file (which can also be understood as shot cuts) can first be determined. For example, the content changes between adjacent frames of the video file can be obtained, it can be judged whether the content change between two adjacent frames exceeds a preset change threshold, and a frame exceeding the preset change threshold can be determined as a scene-change key frame. The pictures corresponding to all the determined scene-change key frames can then be captured, and the captured pictures can further be formed into a scene-change key-frame picture set for use in the following steps. This guarantees that the scenes at the scene changes in the video file (which can also be considered the important scenes) are not omitted, while reducing the redundancy of the captured pictures.
In practical applications, the scene changes in the video file can also be judged from the bit-rate changes in the video file, and pictures can then be captured from the video file according to the scene changes, yielding the screenshot corresponding to each scene change. With this capture method, all important scenes in the video file can be preserved as far as possible, while the redundancy of the captured pictures is reduced.
In addition, the scene changes in the video file can also be determined in other ways. For example, the similarity between pictures in the video file can be judged from the grayscale-histogram features of the pictures, from scale-invariant feature transform (SIFT, Scale-Invariant Feature Transform) features, and so on: pictures can first be captured at a preset frequency (for example, one frame every 2 seconds), the similarity between the captured pictures can then be judged with existing picture-similarity techniques, and, according to the judging result, only one picture is kept from each group of highly similar pictures (for example, pictures whose similarity reaches a preset similarity value). This also achieves the purpose of determining the scene changes in the video file.
S102: scoring and ranking the captured pictures with a machine learning model pre-trained for picture classification.
Referring to Fig. 2, in this embodiment the training of the machine learning model for picture classification may include the following steps:
Step 1: determining the picture data for training the machine learning model.
In this embodiment, a deep machine learning model can be used (supervised learning differs from unsupervised learning in the learning model established under the respective learning framework); for example, a deep-learning convolutional neural network (CNN, Convolutional Neural Network), a deep supervised-learning model, can be used, and of course other suitable deep machine learning models can also be used according to actual needs.
Under normal circumstances, the data for training a machine learning model can be divided into three parts: a training data set (training data), a test data set (testing data) and a validation data set (validation data), whose proportions may be set to 80%, 10% and 10%. For a supervised machine learning model, obtaining the training data is one of the most important links, and high-quality data is the key to successful training.
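The 80%/10%/10% partition mentioned above can be sketched as follows; the split function and its default ratios are illustrative, not prescribed by the patent.

```python
def split_dataset(items, train=0.8, test=0.1):
    """Split items into training / test / validation parts
    (whatever remains after the train and test parts goes to validation)."""
    n = len(items)
    n_train = int(n * train)
    n_test = int(n * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

pictures = list(range(100))          # stand-ins for 100 pictures
train_set, test_set, val_set = split_dataset(pictures)
print(len(train_set), len(test_set), len(val_set))  # 80 10 10
```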
On this basis, in specific implementations, in order to determine the picture data for training the machine learning model, a basic picture data set can first be obtained, where the basic picture data set may comprise: a first data set containing user-uploaded pictures, and a second data set containing pictures captured at random at preset time intervals.
On existing video websites there are two main channels by which video covers are produced: either the user who uploads the video also uploads a picture as the video cover, or the system captures pictures at random at preset time intervals as described above, selects several of them and offers them to the user, who chooses one as the video cover. On the one hand, the pictures uploaded by users are usually carefully chosen pictures of relatively good quality, although some poor-looking pictures among them cannot be ruled out; we can take such pictures as the first data set (which can also be understood as the relatively high-quality data set). On the other hand, since the system captures pictures at random, the quality of the pictures offered to the user for selection varies greatly, although some pictures of decent quality cannot be ruled out either; we can take such pictures as the second data set (which can also be understood as the relatively low-quality data set). In this embodiment, we can determine the first data set and the second data set as the basic picture data for training the machine learning model.
After the above basic picture data is obtained, the color feature parameter values of the pictures in the basic picture data set (which may include, for example, hue values, saturation values, luminance values, RGB values, etc.) can further be obtained, and then, according to these color feature parameter values, the pictures that do not meet the precondition can be removed from the basic picture data set, so as to obtain the picture data for training the machine learning model.
The selection of a video cover is highly subjective work without an objective criterion. The quality of a picture is often strongly related to human subjective factors, and different people have different views and preferences; for example, rich colors, an eye-catching person or object, and the clarity, contrast and saturation of the picture are all factors that affect the quality of a picture.
Therefore, in one implementation, we can first obtain the color feature parameter values of the pictures in the basic picture data set, which may include HSV (Hue, Saturation, Value/luminance) values and the like; the color feature score of each picture can then be calculated from the obtained hue, saturation and luminance values, reflecting, for example, the color saturation, luminance and contrast of the picture. Of course, according to actual needs, HSL (Hue, Saturation, Lightness) values and the like can also be obtained instead of the above HSV values for the subsequent steps.
We can set color feature weights for the above color feature parameter values in advance, according to past experience, for example: a color saturation weight of 0.7, a luminance weight of 1, a hue weight of 0.8, and so on. Then the color feature parameter values of each picture can be weighted and summed according to the preset color feature weights, so as to obtain the color feature score of each picture; that is, each picture corresponds to one color feature score.
Next, according to the color feature score of each picture, the pictures in the first data set whose color feature score is below the first preset score (pictures with a lower score and poorer quality) can be removed, so as to obtain the first-type data set (for example, the high-quality data); and the pictures in the second data set whose color feature score is above the second preset score (pictures with a higher score and decent quality) can be removed, so as to obtain the second-type data set (for example, the low-quality data). The first-type data set and the second-type data set can then serve as the picture data for training the machine learning model.
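The weighted scoring and two-sided filtering can be sketched as follows, using the example weights from the text (saturation 0.7, luminance 1, hue 0.8). Pictures are represented as dictionaries of normalized feature values, and the cutoff scores are arbitrary illustrative choices.

```python
# Example weights taken from the text (hue 0.8, saturation 0.7, luminance 1).
WEIGHTS = {"hue": 0.8, "saturation": 0.7, "luminance": 1.0}

def color_feature_score(picture):
    """Weighted sum of a picture's color feature parameter values."""
    return sum(WEIGHTS[k] * picture[k] for k in WEIGHTS)

def clean_training_sets(first_set, second_set, low_cutoff, high_cutoff):
    """Drop low-scoring pictures from the user-uploaded (first) set and
    high-scoring pictures from the randomly captured (second) set."""
    high_quality = [p for p in first_set if color_feature_score(p) >= low_cutoff]
    low_quality = [p for p in second_set if color_feature_score(p) <= high_cutoff]
    return high_quality, low_quality

first = [{"hue": 0.9, "saturation": 0.8, "luminance": 0.9},    # score 2.18
         {"hue": 0.1, "saturation": 0.1, "luminance": 0.2}]    # score 0.35 -> dropped
second = [{"hue": 0.2, "saturation": 0.3, "luminance": 0.3},   # score 0.67
          {"hue": 0.9, "saturation": 0.9, "luminance": 0.9}]   # score 2.25 -> dropped
hq, lq = clean_training_sets(first, second, low_cutoff=1.0, high_cutoff=1.0)
print(len(hq), len(lq))  # 1 1
```

The surviving `hq` pictures play the role of the first-type (high-quality) data set and `lq` the second-type (low-quality) data set.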
In another implementation, we can first obtain the color feature parameter values of the pictures in the basic picture data set, which may include hue values, saturation values and RGB (Red, Green, Blue) values, and then remove the pictures that do not meet the precondition according to the obtained hue values, saturation values and RGB values respectively.
In a specific implementation, the pictures in the first data set whose hue value is below the first preset hue threshold and the pictures in the second data set whose hue value is above the second preset hue threshold can be removed; that is, the pictures with relatively poor hue in the first data set and the pictures with relatively good hue in the second data set are removed. This reduces the number of pictures in the first and second data sets, which in turn reduces the computation of machine learning model training, shortens the computation time and increases the computation speed, while also improving the quality of the pictures in the two data sets.
Then, the pictures in the first data set whose saturation value is below the first preset saturation threshold and the pictures in the second data set whose saturation value is above the second preset saturation threshold can also be removed; that is, the pictures with relatively poor color saturation in the first data set and the pictures with relatively good color saturation in the second data set are removed, with the same benefits: fewer pictures in the two data sets, less training computation, shorter computation time, higher speed and better picture quality.
In addition, in order to further improve the picture quality in the first data set, the black-and-white pictures (which can also be considered pure grayscale pictures) in the first data set are not what we want to keep; that is to say, a black-and-white picture is not a picture we want to offer to the user as a video cover. Therefore, the black-and-white pictures in the first data set are removed according to the RGB values; that is, the pictures in the first data set that contain no chrominance information (for example, pictures in which the three RGB components are all 0 or all 255, etc.) are removed. This reduces the number of pictures in the first data set, which in turn reduces the computation of machine learning model training, shortens the computation time and increases the computation speed, while also improving the quality of the pictures in the first data set.
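The RGB-based black-and-white check can be sketched as follows. As an assumption going slightly beyond the all-0/all-255 example in the text, the sketch treats a picture as carrying no chrominance information whenever every pixel has equal R, G and B components (the general condition for a pure grayscale picture, of which all-0 and all-255 are special cases).

```python
def is_grayscale(pixels):
    """A picture carries no chrominance information when every pixel's
    R, G and B components are equal (pure black-and-white/gray)."""
    return all(r == g == b for r, g, b in pixels)

color_pic = [(120, 30, 200), (0, 0, 0)]
gray_pic = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]

first_data_set = [color_pic, gray_pic]
first_data_set = [p for p in first_data_set if not is_grayscale(p)]
print(len(first_data_set))  # 1
```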
Then, the picture remained in the first data set and the second data set is identified as first kind data Collect (i.e. quality data collection) and Second Type data set (i.e. low quality data collection), as machine learning model training Image data.
With this, " dirty data " fallen in training set in the prior art " can be cleaned " and (do not meet the figure of prerequisite namely Piece), the pretty good picture of quality in the picture and the picture that intercepts at random of system of the poor quality in picture uploaded including user, To solve the problems, such as to cause the machine learning model trained that ideal sort effect is not achieved due to there is " dirty data ".
In practical applications, in order to further reduce the amount of computation for machine learning model training, the size of each picture may also be adjusted to a preset size before the weighted sum of each picture's color feature parameter values is calculated according to the preset color feature weights.
Since the pictures captured by the system may be relatively large, a resize operation may be performed on the pictures before the weighted sum is calculated, so as to unify the aspect ratio of the pictures and meet the requirements of the machine learning model. For example, if a picture's original size is 1000*2000, the resize operation can adjust its size to 100*200. This operation only changes the size of the picture without deforming or distorting it; in this way, the amount of computation for machine learning model training can be effectively reduced and the computation speed improved.
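The aspect-ratio-preserving scaling in the 1000*2000 → 100*200 example can be computed as below. This is a minimal sketch that only derives the target dimensions; the actual pixel resampling (e.g. an OpenCV or Pillow resize call) is omitted:

```python
def resize_keep_ratio(width, height, max_side):
    """Scale (width, height) so the longer side equals max_side while
    preserving the aspect ratio, so the picture is not distorted."""
    scale = max_side / max(width, height)
    return round(width * scale), round(height * scale)

# The patent's example: a 1000*2000 picture shrunk to 100*200.
print(resize_keep_ratio(1000, 2000, 200))  # → (100, 200)
```

Because both dimensions are multiplied by the same factor, only the size changes, never the shape.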
In practical applications, since there may be some highly similar pictures among those randomly captured by the system, we may also keep only one of any group of overly similar pictures, so as to improve the quality of the pictures in the training data set and reduce the number of pictures in the data set.
In this embodiment, after the pictures in the first data set whose color score is below the first preset score and the pictures in the second data set whose color score is above the second preset score have been removed, the similarity between the remaining pictures in the first and second data sets may be judged. For example, the similarity between pictures can be judged by their grayscale histogram features. Specifically, the pixel data of each picture can first be obtained and a histogram generated for each picture; the histogram data of each picture is then normalized, and the Bhattacharyya coefficient algorithm is applied to the histogram data to obtain a similarity value for each pair of pictures. The value can range over [0, 1], where 0 indicates completely different and 1 indicates extremely similar (or identical). Similarity judgments can then be made according to the computed similarity values.
Then, according to the judgment result, one picture may be chosen and retained from each group of pictures whose similarity reaches a preset similarity threshold (for example, a similarity value of at least 0.8); that is to say, only one picture in a group of highly similar pictures is retained (all the others are removed), so that the pictures remaining in the first and second data sets serve as the first-type data set and the second-type data set. In this way, the number of pictures in the first-type and second-type data sets can be further reduced while ensuring that the picture features in both data sets remain comprehensively covered; the quality of the training data set can be further improved and the number of pictures reduced, which in turn reduces the amount of computation for machine learning model training and improves the computation speed.
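The histogram-based similarity described above can be sketched as follows. Note that this returns the Bhattacharyya coefficient (1 = identical histograms), matching the patent's convention that 1 indicates extreme similarity; the bin count of 32 is an illustrative assumption:

```python
import numpy as np

def histogram_similarity(img_a, img_b, bins=32):
    """Similarity of two grayscale pictures via the Bhattacharyya
    coefficient of their normalized intensity histograms: 0 means
    completely different distributions, 1 means identical ones."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()                      # normalize each histogram
    hb = hb / hb.sum()
    return float(np.sum(np.sqrt(ha * hb)))  # Bhattacharyya coefficient

a = np.random.default_rng(0).integers(0, 256, (64, 64))
flat = np.zeros((64, 64), dtype=int)
print(histogram_similarity(a, a))     # ≈ 1.0: identical pictures
print(histogram_similarity(a, flat))  # well below the 0.8 threshold
```

With such a function, whenever a pair's similarity reaches the preset threshold (e.g. 0.8), only one of the two pictures would be retained.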
Step 2: perform iterative training with the picture data in a pre-trained convolutional neural network (CNN) machine learning model, adjusting the weights of the convolutional neural network during iterative training, so as to obtain a CNN machine learning model for picture classification on the basis of the pre-trained CNN machine learning model.
Training a machine learning model on a large data set generally takes a long time. Therefore, we can apply the idea of transfer learning: the convolutional neural network (CNN) defined by Inception-v3 can be used for transfer learning. Inception-v3 was trained on the ImageNet Large Visual Recognition Challenge 2012 data set, a standard task in the field of computer vision in which an entire image collection is classified into 1000 categories; the top-5 error rate of Inception-v3 is 3.46%.
In a specific implementation, on the basis of the trained CNN machine learning model defined by Inception-v3, continuous iterative training and adjustment of the neural network weights can yield a CNN machine learning model for picture classification that meets the requirements, increasing the scalability and flexibility of the model.
Step 3: evaluate the CNN machine learning model.
In this embodiment, the model can first be evaluated with the above-mentioned 10% validation data set. However, this evaluation method may not reveal whether the CNN machine learning model is overfitting: it is possible for accuracy on the validation data set to be very high while the effect in practical applications is unsatisfactory, ultimately affecting the accuracy of the CNN machine learning model's picture classification.
Therefore, manual evaluation can also be carried out. For example, a video file can be randomly selected and several pictures (e.g., 100) captured at random from it; these 100 pictures are scored and ranked by the CNN machine learning model. Then, several high-scoring pictures (e.g., the top 8 in the ranking) are compared with several low-scoring pictures (e.g., the bottom 8 in the ranking) — that is, the pictures the model scored highest are compared against those it scored lowest — and the CNN machine learning model is assessed from the comparison result.
On the basis of the above manual evaluation, a second manual evaluation can also be carried out. For example, a video file can be chosen arbitrarily and several pictures (e.g., 8 pictures) captured at a preset time interval (e.g., once every 2 seconds); these 8 interval-captured pictures are compared with the 8 highest-scoring pictures chosen during the first manual evaluation, and the machine learning model is assessed again from the comparison result.
In this way — first evaluating with the validation data set and then with two rounds of manual evaluation — overfitting of the machine learning model can be avoided, a more effective assessment of the CNN machine learning model can be achieved, and a better evaluation result obtained, thereby guaranteeing the accuracy of the CNN machine learning model's picture classification.
Step 4: if the evaluation passes — for example, the accuracy obtained on the validation data set reaches a first preset percentage (e.g., the first preset percentage is 85%), and the proportion of cases in which manual evaluation finds that a high-scoring picture produced by the CNN machine learning model is more suitable as a video cover reaches a second preset percentage (e.g., the second preset percentage is 90%) — the evaluation is considered passed, training ends, and the CNN machine learning model is used as the trained CNN machine learning model for picture classification.
Step 5: if the evaluation does not pass — for example, the accuracy obtained on the validation data set does not reach the first preset percentage, and the proportion of cases in which manual evaluation finds that a high-scoring picture produced by the CNN machine learning model is more suitable as a video cover does not reach the second preset percentage — the evaluation is considered failed.
In that case, the parameters of the algorithm used by the CNN machine learning model can be adjusted. Specifically, the adjustment can be made according to the degree of convergence and the accuracy of the training process. For example, Google's TensorBoard can be used to see intuitively whether the neural network converges. TensorBoard is the graphical visualization tool of TensorFlow; it can display the static graph formed by the tensors and flows in TensorFlow, as well as dynamic graphs of quantities analyzed during training, such as accuracy and bias.
The adjustment of the above algorithm parameters mainly concerns parameters such as the learning rate, the batch size, and the number of iterations (steps). For example, during parameter tuning, if the learning rate is too large, the convolutional neural network may fail to converge and remain in an oscillating state, in which case the learning rate needs to be reduced; if the learning rate is too small, convergence is slower and more iterations are needed for the convolutional neural network to reach a local extremum, in which case a larger number of iterations can be set or the learning rate increased. In addition, the batch size also affects convergence, so convergence can also be tuned by adjusting the batch size. That is, the details of learning can be inspected through TensorBoard, unreasonable parameter settings of the algorithm used in the machine learning model can be identified and adjusted accordingly, and through this parameter adjustment process the machine learning model can finally converge and its training accuracy improve.
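The learning-rate behavior described here can be illustrated on a toy one-dimensional problem — not the patent's network, just a sketch of why an oversized learning rate oscillates and diverges while a tiny one converges slowly:

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent (gradient 2*x).
    Returns the final |x|, which approaches 0 when training converges."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return abs(x)

# Too large (lr > 1 here): the iterate flips sign each step and diverges.
print(gradient_descent(1.1))
# Too small: converges, but after 50 steps is still far from 0.
print(gradient_descent(0.001))
# Moderate: converges quickly.
print(gradient_descent(0.4))
```

The same trade-off is what one watches for in the TensorBoard curves: oscillating loss suggests lowering the learning rate, while a slowly descending loss suggests raising it or allowing more iterations.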
After the parameters are adjusted, iterative training with the picture data continues in the CNN machine learning model with the adjusted algorithm parameters, and the weights of the convolutional neural network are adjusted during iterative training, until the resulting CNN machine learning model for picture classification passes evaluation.
S103: according to the ranking, a preset number of high-scoring pictures are provided to the user as candidate pictures for the video cover, so that the user can select the video cover from the candidate pictures.
The ranking can be in ascending order (score from low to high) or descending order (score from high to low). In this embodiment, descending order can be used, and a preset number of high-scoring pictures (e.g., the first 8 in the ranking) can be chosen from the front of the ranking and provided to the user as candidate pictures for the video cover, so that the user can choose one of these 8 pictures as the video cover.
In a specific implementation, when the user performs a click operation on any picture among the above candidate pictures (namely, one of the above 8 pictures), this is treated as receiving the user's selection instruction for that picture, and according to the selection instruction the picture selected by the user can be determined as the video cover of the video file.
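The descending ranking and top-N selection of S103 can be sketched as follows (the filenames and scores are hypothetical):

```python
def top_candidates(scored_pictures, n=8):
    """Sort (picture_id, score) pairs by score in descending order and
    return the n highest-scoring pictures as video cover candidates."""
    ranked = sorted(scored_pictures, key=lambda p: p[1], reverse=True)
    return [pic for pic, _ in ranked[:n]]

scores = [("f1.jpg", 0.31), ("f2.jpg", 0.92), ("f3.jpg", 0.75),
          ("f4.jpg", 0.54)]
print(top_candidates(scores, n=2))  # → ['f2.jpg', 'f3.jpg']
```

The returned list would be shown to the user, whose click on one candidate is then taken as the selection instruction.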
The inventors of the present application carried out a large number of experiments during research and development. Following the above method of iteratively training the machine learning model, six versions of the CNN machine learning model for picture scoring were obtained: the accuracy evaluated on the validation data set reached 89.9%, and the proportion of cases in which manual evaluation found that a high-scoring picture produced by the CNN machine learning model was more suitable as a video cover reached 93.3%. The pictures scored and provided by the CNN machine learning model have features such as high clarity, good contrast, rich and bright colors, and significant objects (people, objects, etc.), making the method of higher quality and efficiency than traditional video cover selection methods.
Referring to Fig. 3-1 to Fig. 3-3, partial comparison diagrams from the inventors' experiments (colors not shown): in Fig. 3-1 to 3-3, the 8 pictures on top are the 8 highest-scoring pictures, and the pictures below are the 8 lowest-scoring pictures from the same video.
Through this embodiment of the present application, after a video file uploaded by a user is received, scene change key frames can be determined according to the changes in the content of adjacent frames in the video file and the pictures corresponding to the scene change frames captured; the captured pictures can then be scored and ranked by a machine learning model trained in advance for picture classification; and according to the ranking, a preset number of high-scoring pictures are provided to the user as candidate pictures for the video cover, so that the user can select the video cover from the candidate pictures. In this way, it can be guaranteed that no important scene in the video file is omitted, while the repetition among the provided video cover candidate pictures is reduced and the quality of the candidate pictures improved, making it more convenient for the user to choose a more suitable video cover.
Corresponding to the method for providing a video cover in the foregoing embodiment, an embodiment of the present application further provides an apparatus for providing a video cover. Referring to Fig. 4, the apparatus may include:
a screenshot unit 41, configured to receive a video file uploaded by a user, determine scene change key frames according to the changes in the content of adjacent frames in the video file, and capture the pictures corresponding to the scene change key frames.
In a specific implementation, the screenshot unit 41 can be specifically configured to:
judge whether the content change between two adjacent frames in the video file exceeds a preset change threshold;
determine the frames that exceed the preset change threshold as scene change key frames;
capture the pictures corresponding to the scene change key frames, and form the captured pictures into a scene change key frame picture set.
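The threshold-based key-frame logic of the screenshot unit can be sketched as follows; the mean-absolute-difference metric and the threshold of 30 are illustrative assumptions, since the patent does not specify how frame-content change is measured:

```python
import numpy as np

def scene_change_frames(frames, threshold=30.0):
    """Return indices of frames whose mean absolute pixel difference from
    the previous frame exceeds `threshold` — a simple stand-in for the
    patent's 'content change beyond a preset change threshold'."""
    keys = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1].astype(float))
        if diff.mean() > threshold:
            keys.append(i)
    return keys

# Three static frames followed by an abrupt cut to a bright scene:
# only the cut is flagged as a scene change key frame.
dark = np.zeros((8, 8), dtype=np.uint8)
bright = np.full((8, 8), 200, dtype=np.uint8)
print(scene_change_frames([dark, dark, dark, bright]))  # → [3]
```

The pictures at the returned indices would then form the scene change key frame picture set.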
a scoring unit 42, configured to score and rank the captured pictures by means of a machine learning model trained in advance for picture classification.
a candidate picture providing unit 43, configured to provide, according to the ranking, a preset number of high-scoring pictures to the user as candidate pictures for the video cover, so that the user can select the video cover from the candidate pictures.
In addition, the apparatus may further include:
an instruction receiving unit, configured to receive the user's selection instruction for any picture among the candidate pictures;
a video cover determination unit, configured to determine the picture selected by the user as the video cover of the video file.
In this embodiment, the training process of the machine learning model for picture classification used in the scoring unit 42 may include the following steps:
Step 1: determine the picture data for machine learning model training.
In a specific implementation, a basic picture data set can first be obtained; the basic picture data set includes a first data set containing pictures uploaded by users and a second data set containing pictures captured at random at a preset time interval.
Then, the color feature parameter values of the pictures in the basic picture data set can be obtained — for example, the color feature parameter values include hue value, saturation value, luminance value, RGB value, etc. — and, according to the color feature parameter values, the pictures that do not meet the preset conditions are removed from the basic picture data set, to obtain the picture data for machine learning model training.
In one implementation, after the color feature parameter values of the pictures in the basic picture data set are obtained — the color feature parameter values may include HSV (Hue, Saturation, Value/luminance) values — a weighted sum of each picture's color feature parameter values can be computed according to preset color feature weights, yielding a color feature value corresponding to each picture. Then, the pictures in the first data set whose color feature value is below a first preset score and the pictures in the second data set whose color feature value is above a second preset score are removed, yielding a first-type data set and a second-type data set respectively, to serve as the picture data for machine learning model training.
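The weighted-sum scoring and two-threshold split described in this implementation can be sketched as follows; the weights, thresholds, and picture records are all illustrative assumptions:

```python
def color_score(hsv, weights=(0.2, 0.5, 0.3)):
    """Weighted sum of a picture's (hue, saturation, value) parameters.
    The patent leaves the preset color feature weights unspecified."""
    return sum(w * v for w, v in zip(weights, hsv))

def split_datasets(uploaded, random_caps, low_thresh, high_thresh):
    """Keep high-scoring user uploads (first-type / high-quality set) and
    low-scoring random captures (second-type / low-quality set)."""
    first_type = [p for p in uploaded if color_score(p["hsv"]) >= low_thresh]
    second_type = [p for p in random_caps if color_score(p["hsv"]) <= high_thresh]
    return first_type, second_type

uploads = [{"id": "u1", "hsv": (0.5, 0.9, 0.8)},   # vivid → kept
           {"id": "u2", "hsv": (0.5, 0.1, 0.2)}]   # dull → removed
caps = [{"id": "c1", "hsv": (0.5, 0.9, 0.8)},      # vivid → removed
        {"id": "c2", "hsv": (0.5, 0.1, 0.2)}]      # dull → kept
ft, st = split_datasets(uploads, caps, low_thresh=0.5, high_thresh=0.3)
print([p["id"] for p in ft], [p["id"] for p in st])  # → ['u1'] ['c2']
```

The surviving pictures then become the first-type (high-quality) and second-type (low-quality) training sets.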
In another implementation, for example, after the color feature parameter values of the pictures in the basic picture data set are obtained — the color feature parameter values may include the Hue value, the Saturation value, and the RGB (Red, Green, Blue) value — the pictures in the first data set whose hue value is below a first preset hue threshold and the pictures in the second data set whose hue value is above a second preset hue threshold are removed; next, the pictures in the first data set whose saturation value is below a first preset saturation threshold and the pictures in the second data set whose saturation value is above a second preset saturation threshold are removed.
Then, the black-and-white pictures in the first data set can also be removed according to the RGB values — that is, the pictures in the first data set that contain no chrominance information (for example, pictures in which the three RGB component values are all 0, or all 255, etc.) are removed — further improving the quality of the pictures in the data, reducing the amount of computation for model training, shortening the computation time, and improving the computation speed.
Finally, the pictures remaining in the first data set and the second data set are labeled as a first-type data set and a second-type data set, to serve as the picture data for machine learning model training.
In addition, in order to further reduce the amount of computation for model training and improve the computation speed, before the weighted sum of each picture's color feature parameter values is computed according to the preset color feature weights, the size of each picture can also be adjusted to a preset size, i.e., each picture is adjusted to the size the model requires.
Since there may be some highly similar pictures in the first and second data sets, in order to improve the quality of the data in the data sets, reduce the number of pictures, reduce the amount of computation for model training, and thus improve the computation speed, after the pictures in the first data set whose color score is below the first preset score and the pictures in the second data set whose color score is above the second preset score have been removed, the similarity between the pictures remaining in the first and second data sets can also be judged, and according to the judgment result one picture chosen and retained from each group of pictures whose similarity reaches a preset similarity threshold, so that the pictures remaining in the first and second data sets serve as the first-type data set and the second-type data set. In this way, a data set with low repetition and better quality can be obtained.
Step 2: perform iterative training with the picture data in a pre-trained convolutional neural network (CNN) machine learning model, adjusting the weights of the convolutional neural network during iterative training, so as to obtain a CNN machine learning model for picture classification on the basis of the pre-trained CNN machine learning model.
The convolutional neural network can be the convolutional neural network defined by Inception-v3.
Step 3: evaluate the CNN machine learning model.
Step 4: if the evaluation passes, training ends, and the CNN machine learning model for picture classification is used as the trained CNN machine learning model for picture classification;
Step 5: if the evaluation does not pass, the parameters of the algorithm used in the CNN machine learning model for picture classification are adjusted, so that iterative training with the picture data continues in the parameter-adjusted CNN machine learning model for picture classification, and the weights of the convolutional neural network are adjusted during iterative training, until the resulting CNN machine learning model for picture classification passes evaluation.
Through this embodiment of the present application, after a video file uploaded by a user is received, scene change key frames can be determined according to the changes in the content of adjacent frames in the video file and the pictures corresponding to the scene change frames captured; the captured pictures can then be scored and ranked by a machine learning model trained in advance for picture classification; and according to the ranking, a preset number of high-scoring pictures are provided to the user as candidate pictures for the video cover, so that the user can select the video cover from the candidate pictures. In this way, it can be guaranteed that no important scene in the video file is omitted, while the repetition among the provided video cover candidate pictures is reduced and the quality of the candidate pictures improved, making it more convenient for the user to choose a more suitable video cover.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application in essence — or the part that contributes to the prior art — can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the system or apparatus embodiments, since they are substantially similar to the method embodiments, the description is fairly simple, and for relevant details reference may be made to the description of the method embodiments. The system and apparatus embodiments described above are merely schematic: units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units — they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The method and apparatus for providing a video cover provided in the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation of the present application.

Claims (8)

1. A method for providing a video cover, characterized by comprising:
receiving a video file uploaded by a user, determining scene change key frames according to the changes in the content of adjacent frames in the video file, and capturing the pictures corresponding to the scene change key frames;
scoring and ranking the captured pictures by means of a machine learning model trained in advance for picture classification;
providing, according to the ranking, a preset number of high-scoring pictures to the user as candidate pictures for the video cover, so that the user selects the video cover from the candidate pictures;
wherein determining scene change key frames according to the changes in the content of adjacent frames in the video file and capturing the pictures corresponding to the scene change key frames comprises:
judging whether the content change between two adjacent frames in the video file exceeds a preset change threshold;
determining the frames that exceed the preset change threshold as scene change key frames;
capturing the pictures corresponding to the scene change key frames, and forming the captured pictures into a scene change key frame picture set;
wherein the training of the machine learning model for picture classification comprises:
determining picture data for machine learning model training;
wherein determining the picture data for machine learning model training comprises:
obtaining a basic picture data set;
obtaining color feature parameter values of pictures in the basic picture data set;
removing, according to the color feature parameter values, pictures that do not meet preset conditions from the basic picture data set, to obtain the picture data for machine learning model training;
wherein the basic picture data set comprises: a first data set containing pictures uploaded by users and a second data set containing pictures captured at random at a preset time interval;
the color feature parameter values comprise a hue value, a saturation value, and a luminance value;
removing, according to the color feature parameter values, pictures that do not meet preset conditions from the basic picture data set to obtain the picture data for machine learning model training comprises:
computing, according to preset color feature weights, a weighted sum of each picture's color feature parameter values to obtain a color feature value corresponding to each picture;
removing the pictures in the first data set whose color feature value is below a first preset score and the pictures in the second data set whose color feature value is above a second preset score, to obtain a first-type data set and a second-type data set respectively as the picture data for machine learning model training.
2. The method according to claim 1, characterized by further comprising:
receiving the user's selection instruction for any picture among the candidate pictures;
determining the picture selected by the user as the video cover.
3. The method according to claim 1, characterized in that the training of the machine learning model for picture classification comprises:
determining picture data for machine learning model training;
performing iterative training with the picture data in a convolutional neural network (CNN) machine learning model, and adjusting the weights of the convolutional neural network during iterative training, to obtain a CNN machine learning model for picture classification on the basis of the CNN machine learning model;
evaluating the CNN machine learning model for picture classification;
if the evaluation passes, ending training and using the CNN machine learning model for picture classification as the trained CNN machine learning model for picture classification.
4. The method according to claim 3, characterized by further comprising:
if the evaluation does not pass, adjusting the parameters of the algorithm used in the CNN machine learning model for picture classification, so that iterative training with the picture data continues in the parameter-adjusted CNN machine learning model for picture classification, and the weights of the convolutional neural network are adjusted during iterative training, until the resulting CNN machine learning model for picture classification passes evaluation.
5. The method according to claim 3, characterized in that the basic picture data set comprises: a first data set containing pictures uploaded by users and a second data set containing pictures captured at random at a preset time interval;
the color feature parameter values comprise a hue value, a saturation value, and an RGB value;
removing, according to the color feature parameter values, pictures that do not meet preset conditions from the basic picture data set to obtain the picture data for machine learning model training comprises:
removing the pictures in the first data set whose hue value is below a first preset hue threshold and the pictures in the second data set whose hue value is above a second preset hue threshold;
removing the pictures in the first data set whose saturation value is below a first preset saturation threshold and the pictures in the second data set whose saturation value is above a second preset saturation threshold;
removing the black-and-white pictures in the first data set according to the RGB values;
labeling the pictures remaining in the first data set and the second data set as a first-type data set and a second-type data set, to serve as the picture data for machine learning model training.
6. The method according to claim 1, characterized by, after removing the pictures in the first data set whose color feature value is below the first preset score and the pictures in the second data set whose color feature value is above the second preset score, further comprising:
judging the similarity between the pictures remaining in the first data set and the second data set respectively, and choosing and retaining, according to the judgment result, one picture from each group of pictures whose similarity reaches a preset similarity threshold, so that the pictures remaining in the first data set and the second data set serve respectively as the first-type data set and the second-type data set.
7. a kind of offer device of video cover characterized by comprising
Screenshot unit, for receiving the video file of user's upload, and according to the situation of change of content frame adjacent in video file It determines scene change key frame and the corresponding picture of the scene change key frame is intercepted;The screenshot unit, specifically For: judge that whether adjacent two content frames variation is beyond preset change threshold in video file;It will exceed preset change threshold Frame be determined as scene change key frame;The picture that the corresponding picture of scene change key frame is intercepted, and will be truncated to Form scene change key frame picture set;
Marking unit, for being that the picture being truncated to carries out by the preparatory trained machine learning model for picture classification It gives a mark and sorts;
Candidate picture provides unit, for according to sequence using the high preset width number picture of score as the candidate picture of video cover It is supplied to user, so that user carries out the selection of video cover from the candidate picture;
wherein training of the machine learning model for picture classification comprises:
determining picture data for training the machine learning model;
the determining of the picture data for training the machine learning model comprising:
obtaining a basic picture data set;
obtaining color feature parameter values of the pictures in the basic picture data set;
removing, according to the color feature parameter values, pictures that do not meet a precondition from the basic picture data set, to obtain the picture data for training the machine learning model;
the basic picture data set comprising: a first data set containing pictures uploaded by users, and a second data set containing pictures captured at random at preset time intervals;
the color feature parameter values comprising a hue value, a saturation value and a brightness value;
the removing, according to the color feature parameter values, of pictures that do not meet the precondition from the basic picture data set, to obtain the picture data for training the machine learning model, comprising:
performing, according to preset color feature weights, a weighted-sum calculation on the color feature parameter values of each picture, to obtain a color feature score for each picture;
removing, from the first data set, pictures whose color feature score is lower than a first preset score, and, from the second data set, pictures whose color feature score is higher than a second preset score, to obtain a first-type data set and a second-type data set, respectively, as the picture data for training the machine learning model.
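The weighted-sum scoring and threshold filtering in the training-data step above can be sketched as below. The weights `(0.2, 0.4, 0.4)` and the two threshold values are illustrative stand-ins for the claim's preset parameters, which the patent leaves unspecified.

```python
def color_score(hue, saturation, brightness, weights=(0.2, 0.4, 0.4)):
    """Weighted sum of the three color feature values.
    The weights are assumed for illustration; the claim only
    requires some preset color feature weights."""
    w_h, w_s, w_b = weights
    return w_h * hue + w_s * saturation + w_b * brightness

def filter_training_sets(first_set, second_set, low_thr, high_thr):
    """Per the claim: drop first-set pictures scoring below the first
    preset score, and second-set pictures scoring above the second
    preset score. Pictures are (hue, saturation, brightness) tuples."""
    first_type = [p for p in first_set if color_score(*p) >= low_thr]
    second_type = [p for p in second_set if color_score(*p) <= high_thr]
    return first_type, second_type
```

The surviving first-type set keeps visually stronger user-uploaded pictures, while the second-type set keeps the duller randomly captured frames, giving the classifier positive and negative examples.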
8. The device according to claim 7, characterized by further comprising:
an instruction receiving unit, configured to receive a selection instruction from the user for any picture among the candidate pictures;
a video cover determination unit, configured to determine the picture selected by the user as the video cover.
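The scene-change detection performed by the screenshot unit of claim 7 can be sketched as follows, assuming each frame has been reduced to a small feature vector (e.g. a coarse color histogram) and using L1 distance as the inter-frame change measure; the patent fixes neither choice, only the preset change threshold.

```python
def scene_change_keyframes(frames, change_threshold):
    """Return indices of frames whose content change relative to the
    preceding frame exceeds the preset change threshold.
    `frames` is a list of per-frame feature vectors (assumed form);
    L1 distance stands in for the unspecified change measure."""
    keyframes = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i]))
        if diff > change_threshold:
            keyframes.append(i)
    return keyframes
```

The pictures captured at these indices would form the scene-change key frame picture set that the scoring unit then ranks.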
CN201611059438.2A 2016-11-28 2016-11-28 The providing method and device of video cover Active CN106503693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611059438.2A CN106503693B (en) 2016-11-28 2016-11-28 The providing method and device of video cover

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611059438.2A CN106503693B (en) 2016-11-28 2016-11-28 The providing method and device of video cover

Publications (2)

Publication Number Publication Date
CN106503693A CN106503693A (en) 2017-03-15
CN106503693B true CN106503693B (en) 2019-03-15

Family

ID=58327496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611059438.2A Active CN106503693B (en) 2016-11-28 2016-11-28 The providing method and device of video cover

Country Status (1)

Country Link
CN (1) CN106503693B (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993215A (en) * 2017-03-31 2017-07-28 联想(北京)有限公司 A kind of information processing method and processing routine device
CN107147939A (en) * 2017-05-05 2017-09-08 百度在线网络技术(北京)有限公司 Method and apparatus for adjusting net cast front cover
CN107221346B (en) * 2017-05-25 2019-09-03 亮风台(上海)信息科技有限公司 It is a kind of for determine AR video identification picture method and apparatus
CN107392252A (en) * 2017-07-26 2017-11-24 上海城诗信息科技有限公司 Computer deep learning characteristics of image and the method for quantifying perceptibility
CN107506736A (en) * 2017-08-29 2017-12-22 北京大生在线科技有限公司 Online education video fineness picture intercept method based on deep learning
CN107707967A (en) * 2017-09-30 2018-02-16 咪咕视讯科技有限公司 The determination method, apparatus and computer-readable recording medium of a kind of video file front cover
CN110069651B (en) * 2017-10-23 2023-04-07 腾讯科技(北京)有限公司 Picture screening method and device and storage medium
CN107918656A (en) * 2017-11-17 2018-04-17 北京奇虎科技有限公司 Video front cover extracting method and device based on video title
CN107958030B (en) * 2017-11-17 2021-08-24 北京奇虎科技有限公司 Video cover recommendation model optimization method and device
CN107832725A (en) * 2017-11-17 2018-03-23 北京奇虎科技有限公司 Video front cover extracting method and device based on evaluation index
CN107832724A (en) * 2017-11-17 2018-03-23 北京奇虎科技有限公司 The method and device of personage's key frame is extracted from video file
CN108377417B (en) * 2018-01-17 2019-11-26 百度在线网络技术(北京)有限公司 Video reviewing method, device, computer equipment and storage medium
CN108665769B (en) * 2018-05-11 2021-04-06 深圳市鹰硕技术有限公司 Network teaching method and device based on convolutional neural network
CN108600781B (en) * 2018-05-21 2022-08-30 腾讯科技(深圳)有限公司 Video cover generation method and server
CN108650524B (en) * 2018-05-23 2022-08-16 腾讯科技(深圳)有限公司 Video cover generation method and device, computer equipment and storage medium
CN108985176B (en) * 2018-06-20 2022-02-25 阿里巴巴(中国)有限公司 Image generation method and device
CN108833942A (en) * 2018-06-28 2018-11-16 北京达佳互联信息技术有限公司 Video cover choosing method, device, computer equipment and storage medium
CN109002812A (en) * 2018-08-08 2018-12-14 北京未来媒体科技股份有限公司 A kind of method and device of intelligent recognition video cover
CN108965922B (en) * 2018-08-22 2021-05-25 广州酷狗计算机科技有限公司 Video cover generation method and device and storage medium
CN110881131B (en) * 2018-09-06 2021-07-23 武汉斗鱼网络科技有限公司 Classification method of live review videos and related device thereof
CN109145138A (en) * 2018-09-10 2019-01-04 北京点网聚科技有限公司 A kind of cover choosing method, device, electronic equipment and storage medium
CN109257645B (en) * 2018-09-11 2021-11-02 阿里巴巴(中国)有限公司 Video cover generation method and device
CN109165301B (en) * 2018-09-13 2021-04-20 北京字节跳动网络技术有限公司 Video cover selection method, device and computer readable storage medium
CN109271542A (en) * 2018-09-28 2019-01-25 百度在线网络技术(北京)有限公司 Cover determines method, apparatus, equipment and readable storage medium storing program for executing
CN111491202B (en) * 2019-01-29 2021-06-15 广州市百果园信息技术有限公司 Video publishing method, device, equipment and storage medium
CN110008364B (en) * 2019-03-25 2023-05-02 联想(北京)有限公司 Image processing method, device and system
CN109996091A (en) * 2019-03-28 2019-07-09 苏州八叉树智能科技有限公司 Generate method, apparatus, electronic equipment and the computer readable storage medium of video cover
CN110134651B (en) * 2019-05-09 2021-10-26 北京金山安全软件有限公司 Information file processing method and device, electronic equipment and storage medium
CN110263213B (en) * 2019-05-22 2023-07-18 腾讯科技(深圳)有限公司 Video pushing method, device, computer equipment and storage medium
CN110392306B (en) * 2019-07-29 2021-11-05 腾讯科技(深圳)有限公司 Data processing method and equipment
CN110677734B (en) * 2019-09-30 2023-03-10 北京达佳互联信息技术有限公司 Video synthesis method and device, electronic equipment and storage medium
CN110909205B (en) * 2019-11-22 2023-04-07 北京金山云网络技术有限公司 Video cover determination method and device, electronic equipment and readable storage medium
CN110991373A (en) * 2019-12-09 2020-04-10 北京字节跳动网络技术有限公司 Image processing method, image processing apparatus, electronic device, and medium
CN110995999A (en) * 2019-12-12 2020-04-10 北京小米智能科技有限公司 Dynamic photo shooting method and device
CN111090778B (en) * 2019-12-26 2023-06-27 北京百度网讯科技有限公司 Picture generation method, device, equipment and storage medium
CN111182295B (en) * 2020-01-06 2023-08-25 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and readable storage medium
CN111523400B (en) * 2020-03-31 2023-10-13 易视腾科技股份有限公司 Video representative frame extraction method and device
CN111651633A (en) * 2020-04-29 2020-09-11 上海推乐信息技术服务有限公司 Video cover selection method and system
CN111581435B (en) * 2020-05-25 2023-12-01 北京达佳互联信息技术有限公司 Video cover image generation method and device, electronic equipment and storage medium
CN111901679A (en) * 2020-08-10 2020-11-06 广州繁星互娱信息科技有限公司 Method and device for determining cover image, computer equipment and readable storage medium
CN112004108B (en) * 2020-08-26 2022-11-01 深圳创维-Rgb电子有限公司 Live video recording processing method and device, intelligent terminal and storage medium
CN113286171A (en) * 2020-12-11 2021-08-20 苏州律点信息科技有限公司 Video cover determination method and device and cloud server
CN113656642B (en) * 2021-08-20 2024-05-28 北京百度网讯科技有限公司 Cover image generation method, device, apparatus, storage medium and program product
CN115802146B (en) * 2021-09-07 2024-04-02 荣耀终端有限公司 Method for capturing images in video and electronic equipment
CN114007133B (en) * 2021-10-25 2024-02-23 杭州当虹科技股份有限公司 Video playing cover automatic generation method and device based on video playing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244024A (en) * 2014-09-26 2014-12-24 北京金山安全软件有限公司 Video cover generation method and device and terminal
CN105323634A (en) * 2014-06-27 2016-02-10 Tcl集团股份有限公司 Method and system for generating thumbnail of video
CN106101868A (en) * 2016-07-18 2016-11-09 乐视控股(北京)有限公司 Reduced graph generating method and generating means

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105323634A (en) * 2014-06-27 2016-02-10 Tcl集团股份有限公司 Method and system for generating thumbnail of video
CN104244024A (en) * 2014-09-26 2014-12-24 北京金山安全软件有限公司 Video cover generation method and device and terminal
CN106101868A (en) * 2016-07-18 2016-11-09 乐视控股(北京)有限公司 Reduced graph generating method and generating means

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Improving YouTube video thumbnails with deep neural nets; Weilong Yang et al.; Google AI Blog, https://ai.googleblog.com/2015/10/improving-youtube-video-thumbnails-with.html; 8 October 2015; pp. 1-4
Web video thumbnail recommendation with content-aware analysis and query-sensitive matching; Weigang Zhang et al.; Multimedia Tools and Applications; November 2014; vol. 73, no. 1; pp. 547-571

Also Published As

Publication number Publication date
CN106503693A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN106503693B (en) Method and device for providing a video cover
Ying et al. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality
EP2833324B1 (en) Image quality analysis for searches
CN106709453A (en) Sports video key posture extraction method based on deep learning
CN103853724B (en) multimedia data classification method and device
CN106650795B (en) Hotel room type image sorting method
CN110390673B (en) Cigarette automatic detection method based on deep learning in monitoring scene
CN107122713B (en) Analog property detection method based on deep learning
CN108364278A (en) Rock core crack extraction method and system
CN111047543A (en) Image enhancement method, device and storage medium
CN108647696B (en) Picture color value determining method and device, electronic equipment and storage medium
CN104751406A (en) Method and device used for blurring image
CN105374051B (en) Anti-camera-shake moving object detection method for video on intelligent mobile terminals
CN109741315B (en) Non-reference image quality objective evaluation method based on deep reinforcement learning
JP7336033B2 (en) Data augmentation-based matter analysis model learning device and method
CN110874835A (en) Crop leaf disease resistance identification method and system, electronic equipment and storage medium
CN109949298A (en) Image segmentation quality evaluation method based on clustering learning
JP7362924B2 (en) Data augmentation-based spatial analysis model learning device and method
CN115294162A (en) Target identification method, device, equipment and storage medium
CN112115824B (en) Fruit and vegetable detection method, fruit and vegetable detection device, electronic equipment and computer readable medium
CN111950565B (en) Abstract picture image direction identification method based on feature fusion and naive Bayes
CN112419177B (en) Single image motion blur removing-oriented perception quality blind evaluation method
CN112734733B (en) Non-reference image quality monitoring method based on channel recombination and feature fusion
Sahay et al. Automatic colorization of videos
Huang et al. Hierarchical Learning-Guided human motion quality assessment in big data environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 222, Floor 2, Building 1, Yard A23, North Third Ring West Road, Haidian District, Beijing 100098

Patentee after: Beijing Douyin Information Service Co.,Ltd.

Address before: 100098 Building 1, AVIC Plaza, No. 43, North Third Ring West Road, Haidian District, Beijing

Patentee before: BEIJING BYTEDANCE TECHNOLOGY Co.,Ltd.
