CN110166827A - Determination method, apparatus, storage medium and the electronic device of video clip - Google Patents
Determination method, apparatus, storage medium and the electronic device of video clip
- Publication number
- CN110166827A (application CN201811427035.8A)
- Authority
- CN
- China
- Prior art keywords
- video
- feature
- video clip
- target
- clip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
The invention discloses a method, an apparatus, a storage medium and an electronic device for determining a video clip. The method includes: obtaining a plurality of video clips from a video resource; obtaining a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip; and determining a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature. The invention solves the technical problem in the related art that the efficiency of determining video clips meeting a condition from a video resource is low.
Description
Technical field
The present invention relates to the field of computers, and in particular to a method, an apparatus, a storage medium and an electronic device for determining a video clip.
Background art
Clipping out the excellent parts of a video resource, or the parts that viewers may be more interested in, and providing them to users can attract more users to follow the content. At present, a video resource is generally clipped manually: a staff member selects the parts of the video that he or she considers excellent and edits them into a new video. This manual clipping is slow and wastes time and energy, and the clipping standard is also difficult to control, since the video is clipped entirely according to the staff member's own judgment and cannot accurately capture the needs of users, resulting in very low video clipping efficiency.
No effective solution has yet been proposed for the above problem.
Summary of the invention
Embodiments of the present invention provide a method, an apparatus, a storage medium and an electronic device for determining a video clip, so as to at least solve the technical problem in the related art that the efficiency of determining video clips meeting a condition from a video resource is low.
According to one aspect of the embodiments of the present invention, a method for determining a video clip is provided, including: obtaining a plurality of video clips from a video resource; obtaining a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip; and determining a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature.
According to another aspect of the embodiments of the present invention, an apparatus for determining a video clip is further provided, including: a first obtaining module, configured to obtain a plurality of video clips from a video resource; a second obtaining module, configured to obtain a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip; and a determining module, configured to determine a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature.
Optionally, the first determining unit includes: a first input subunit, configured to input the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is a model obtained by training a first classification model with first-feature samples labeled with image categories, and the first category parameter is used to indicate the image category to which each video clip belongs; a second input subunit, configured to input the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is a model obtained by training a second classification model with second-feature samples labeled with motion categories, and the second category parameter is used to indicate the motion category to which each video clip belongs; and a first determining subunit, configured to determine the weighted sum of the first category parameter and the second category parameter corresponding to each video clip as the video category parameter of the video clip, wherein the video category parameter is used to indicate the video category to which each video clip belongs.
Optionally, the first determining unit includes: a fusion subunit, configured to perform feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of the video clip; and a third input subunit, configured to input the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain the video category parameter corresponding to the video clip, wherein the spatio-temporal classification model is a model obtained by training a third classification model with spatio-temporal feature samples labeled with video categories, and the video category parameter is used to indicate the video category to which each video clip belongs.
Optionally, the second determining unit includes: a first obtaining subunit, configured to obtain, from the plurality of video clips, the video clips whose video category parameters fall into a target threshold range; and a second determining subunit, configured to determine the video clips whose video category parameters fall into the target threshold range as the target video clips.
Optionally, the second obtaining module includes: a first input unit, configured to input each video clip into a first feature extraction model to obtain the first feature of the video clip, wherein the first feature extraction model is a model obtained by training an initial first feature model with first-feature samples; and a second input unit, configured to input each video clip into an optical flow feature extraction model to obtain an optical flow feature of the video clip, wherein the optical flow feature extraction model is a model obtained by training an initial optical flow feature model with optical flow feature samples, and the second feature includes the optical flow feature.
Optionally, the apparatus further includes: a third obtaining module, configured to obtain the average of the first-convolution-layer parameters corresponding to the three RGB channels of an initial image convolution model, and to determine the average as the initialization parameter of the first convolution layer of the initial optical flow feature model; an adjusting module, configured to adjust the numerical range of the optical flow maps of initial optical flow feature samples to the numerical range of the input parameters of the initial image convolution model, so as to obtain the optical flow feature samples; and a training module, configured to train the initial optical flow feature model with the optical flow feature samples to obtain the optical flow feature extraction model.
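As an illustration of this cross-modality initialization, the following is a minimal sketch rather than the patent's implementation: the first-layer kernel of an RGB convolution model is averaged over its three input channels, and the average is replicated for each optical-flow input channel. The filter count, kernel size and the 10 flow channels (five stacked frames of horizontal and vertical flow) are assumptions made for the example.

```python
import numpy as np

def init_flow_conv_from_rgb(rgb_kernel, flow_in_channels):
    """Average the RGB first-layer kernel over its three input channels,
    then replicate the average for each optical-flow input channel."""
    mean_kernel = rgb_kernel.mean(axis=1, keepdims=True)     # (out, 1, kh, kw)
    return np.repeat(mean_kernel, flow_in_channels, axis=1)  # (out, flow_in, kh, kw)

# Hypothetical 64-filter 7x7 RGB layer re-used for 10 flow channels
# (five stacked frames of horizontal and vertical flow).
rgb_w = np.random.randn(64, 3, 7, 7)
flow_w = init_flow_conv_from_rgb(rgb_w, flow_in_channels=10)
print(flow_w.shape)  # (64, 10, 7, 7)
```

Every flow channel of the initialized kernel is identical to the RGB channel average, so the flow network starts from filters that already respond to the spatial patterns learned on images.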
Optionally, the plurality of video clips are video clips in the video resource that include a target object, wherein the first obtaining module includes: a first obtaining unit, configured to obtain a picture template corresponding to the target object, wherein the picture template is used to indicate attribute information of the target object; a second obtaining unit, configured to obtain, from the video frames of the video resource, target video frames whose similarity with the picture template is higher than a target similarity; and a third obtaining unit, configured to obtain the plurality of video clips from the target video frames, wherein each video clip in the plurality of video clips includes one target video frame or a plurality of consecutive target video frames.
Optionally, the second obtaining unit includes: a second obtaining subunit, configured to obtain first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object; and a third obtaining subunit, configured to obtain, from the first video frames, second video frames whose similarity with the picture template is higher than the target similarity, as the target video frames.
Optionally, the second obtaining unit includes: a division subunit, configured to divide each video frame of the video resource into a foreground picture and a background picture; a third determining subunit, configured to respectively determine a first distance between the foreground picture of each video frame and a foreground template and a second distance between the background picture of each video frame and a background template, wherein the picture template includes the foreground template and the background template; a fourth determining subunit, configured to determine the weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight; and a fifth determining subunit, configured to determine, among the video frames of the video resource, the video frames whose weighted sum is lower than a target value as the target video frames.
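The weighted foreground/background template distance described above can be sketched as follows. This is a hypothetical illustration: Euclidean distance and the 0.7/0.3 weights are assumptions (the patent only requires the first weight to exceed the second), and the array shapes are invented for the example.

```python
import numpy as np

def frame_template_score(foreground, background, fg_template, bg_template,
                         w_fg=0.7, w_bg=0.3):
    """Weighted sum of the foreground distance (first distance, larger
    weight) and the background distance (second distance, smaller weight)."""
    d_fg = np.linalg.norm(foreground - fg_template)
    d_bg = np.linalg.norm(background - bg_template)
    return w_fg * d_fg + w_bg * d_bg

def select_target_frames(frames, fg_template, bg_template, target_value):
    """Keep the indices of frames whose weighted distance is below the target value."""
    return [i for i, (fg, bg) in enumerate(frames)
            if frame_template_score(fg, bg, fg_template, bg_template) < target_value]

fg_t = np.zeros(4)
bg_t = np.zeros(4)
frames = [(np.zeros(4), np.zeros(4)),           # matches both templates -> score 0
          (np.ones(4) * 10, np.ones(4) * 10)]   # far from both templates
print(select_target_frames(frames, fg_t, bg_t, target_value=1.0))  # [0]
```

Weighting the foreground more heavily reflects the idea that the target object itself (the foreground) matters more for frame selection than the scene behind it.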
Optionally, the target video clip is a plurality of target video clips, wherein the apparatus further includes: a splicing module, configured to splice the plurality of target video clips into a target video resource in chronological order; and a sending module, configured to send the target video resource to a client for playing the target video resource.
According to another aspect of the embodiments of the present invention, a storage medium is further provided, wherein a computer program is stored in the storage medium, and the computer program is configured to execute the method described in any one of the above when run.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to execute the method described in any one of the above by means of the computer program.
In the embodiments of the present invention, a plurality of video clips are obtained from a video resource; the first feature and the second feature of each video clip in the plurality of video clips are obtained, wherein the first feature is used to indicate the image feature of an object included in each video clip and the second feature is used to indicate the motion feature of the object; and a target video clip, in which the included object has a target image feature and a target motion feature, is determined from the plurality of video clips according to the first feature and the second feature. By extracting, for the video clips in a video resource, a first feature indicating image features and a second feature indicating motion features, and determining from the plurality of video clips, according to these two features, the video clips meeting the target image feature and the target motion feature, automatic acquisition of video clips is achieved while the features of the target video clips in both the image dimension and the motion dimension are fully considered. This makes it possible to determine the target video clips meeting the condition more accurately, thereby achieving the technical effect of improving the efficiency of determining video clips meeting a condition from a video resource, and solving the technical problem in the related art that this efficiency is low.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention, and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic diagram of an optional method for determining a video clip according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an application environment of an optional method for determining a video clip according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an optional method for determining a video clip according to an optional embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional TSN network model according to an optional embodiment of the present invention;
Fig. 5 is a schematic diagram of an optional Inception model structure according to an optional embodiment of the present invention;
Fig. 6 is a schematic diagram of an optional convolutional network model structure according to an optional embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional automatic clipping method for video clips according to an optional embodiment of the present invention;
Fig. 8 is a schematic diagram of an optional method for obtaining a plurality of video clips according to an optional embodiment of the present invention;
Fig. 9 is a schematic diagram of another optional method for obtaining a plurality of video clips according to an optional embodiment of the present invention;
Fig. 10 is a schematic diagram of an optional apparatus for determining a video clip according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of an application scenario of an optional method for determining a video clip according to an embodiment of the present invention; and
Fig. 12 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of embodiments
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", etc. in the description, the claims and the above drawings of this specification are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to one aspect of the embodiments of the present invention, a method for determining a video clip is provided. As shown in Fig. 1, the method includes:
S102: obtaining a plurality of video clips from a video resource;
S104: obtaining a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate the image feature of an object included in each video clip, and the second feature is used to indicate the motion feature of the object included in each video clip;
S106: determining a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature.
Optionally, in this embodiment, the above method for determining a video clip may be applied in a hardware environment composed of a server 202 and a client 204 as shown in Fig. 2. As shown in Fig. 2, the server 202 obtains a plurality of video clips from a video resource; obtains the first feature and the second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate the image feature of an object included in each video clip, and the second feature is used to indicate the motion feature of the object included in each video clip; and determines a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which the included object has a target image feature and a target motion feature.
Optionally, in this embodiment, the server 202 may send the obtained one or more target video clips to the client 204, and the client 204 displays the one or more received target video clips. Alternatively, when the target video clip is a plurality of target video clips, the server 202 may splice the plurality of target video clips into one target video resource and then send the target video resource to the client 204, and the client 204 displays the received target video resource on a screen.
Optionally, in this embodiment, the above method for determining a video clip may be, but is not limited to being, applied in video clipping scenarios. The above client may be, but is not limited to, various types of applications, for example, online education applications, instant messaging applications, community space applications, game applications, shopping applications, browser applications, financial applications, multimedia applications, live streaming applications, etc. Specifically, the method may be, but is not limited to being, applied in a scenario of video clipping in the above game applications, or in a scenario of video clipping in the above multimedia applications, so as to improve the efficiency of determining video clips meeting a condition from a video resource. The above is only an example, and no limitation is imposed on this in this embodiment.
Optionally, in this embodiment, the above method for determining a video clip may be, but is not limited to being, executed by a server, or may be executed by a client, or may be executed through interaction between a server and a client.
Optionally, in this embodiment, the video resource may be, but is not limited to, a video resource in a video website or a video player, such as a movie or TV drama, animation, or variety-show video file. It may also be a live video stream of a game live broadcast, a sports show, a live streaming application, etc., such as a live sports event, a live competitive game, or a live TV program. Alternatively, it may also be a video obtained during the use of a client, such as a game video.
Optionally, in this embodiment, a video clip may be, but is not limited to, a segment of the above video resource that meets a certain condition, for example, a performance segment of a certain actor or singer determined from a movie, TV drama, animation or variety-show video file. It may also be a highlight in a live video stream, such as a video clip of a highlight in a live sports event, or a collection of goal highlights of a certain star player. It may also be a wonderful or useful segment generated during the use of a client, such as a kill segment in a game or a teaching segment on the use of office software.
Optionally, in this embodiment, the first feature is used to indicate the image feature of an object included in each video clip. For example, the image feature may include, but is not limited to, a color feature, a texture feature, a shape feature, a spatial-relationship feature between objects, etc.
Optionally, in this embodiment, the second feature is used to indicate the motion feature of an object included in each video clip. For example, the motion feature may include, but is not limited to, an optical flow feature, a feature indicating the motion amplitude of an object, etc.
Optionally, in this embodiment, the target image feature and the target motion feature may be, but are not limited to, the conditions that the target video clips need to meet, determined according to the demand for target video clips. The target image feature and the target motion feature may be, but are not limited to being, obtained through configuration, or may be automatically generated by an intelligent network through the analysis of historical data. For example, taking a basketball star's goal highlights as an example, the star's historical goal segments can be input into an intelligent algorithm, which automatically identifies that the target image feature may include the facial features of the star, the features of the basket, the features of the basketball, etc., and that the target motion feature may include that the star's motion amplitude exceeds a target value, that the basketball enters the basket, etc.
Optionally, in this embodiment, a video clip whose first feature is the target image feature may be determined as a video clip having the target image feature, and a video clip whose second feature is the target motion feature as a video clip having the target motion feature. Alternatively, a video clip whose first feature reaches a certain similarity with the target image feature may be determined as a video clip having the target image feature, and a video clip whose second feature reaches a certain similarity with the target motion feature as a video clip having the target motion feature. It should be noted that the way of determining whether an object included in a video clip has the target image feature and the target motion feature is not limited thereto.
In an optional embodiment, taking the determination of highlight clips of eliminating enemies in a gun-battle game as an example, as shown in Fig. 3, a plurality of video clips (clip 1, clip 2, clip 3) are obtained from a game video; this may be done by dividing the game video into a plurality of video clips, or by extracting from the game video elimination-shot clips that may be highlights. The first feature and the second feature of each video clip in the plurality of video clips are obtained, for example: clip 1 corresponds to feature 1a and feature 2a, clip 2 to feature 1b and feature 2b, and clip 3 to feature 1c and feature 2c. According to the above first and second features, it is determined that the similarities of feature 1a, feature 1b and feature 1c with target image feature 1 all meet the condition, so all the video clips have the target image feature; however, the similarity of feature 2a with target motion feature 2 does not meet the condition, while the similarities of feature 2b and feature 2c with target motion feature 2 do. It is therefore determined that the target video clips among the plurality of video clips are clip 2 and clip 3.
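The selection in this example can be sketched as follows; this is a minimal illustration, not the patent's implementation. Cosine similarity and the threshold value are assumptions, and the feature vectors are invented so that clip 1 fails the motion-similarity condition while clips 2 and 3 pass both.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_clips(clips, target_image_feat, target_motion_feat,
                        sim_threshold=0.8):
    """Keep clips whose first (image) AND second (motion) features are
    both similar enough to the target features."""
    targets = []
    for name, image_feat, motion_feat in clips:
        if (cos_sim(image_feat, target_image_feat) >= sim_threshold and
                cos_sim(motion_feat, target_motion_feat) >= sim_threshold):
            targets.append(name)
    return targets

# Mirroring the example: all three clips match the target image feature,
# but clip 1's motion feature does not match the target motion feature.
t_img = np.array([1.0, 0.0])
t_mot = np.array([0.0, 1.0])
clips = [("clip 1", np.array([0.9, 0.1]), np.array([1.0, 0.1])),
         ("clip 2", np.array([1.0, 0.2]), np.array([0.1, 1.0])),
         ("clip 3", np.array([0.8, 0.0]), np.array([0.0, 0.9]))]
print(select_target_clips(clips, t_img, t_mot))  # ['clip 2', 'clip 3']
```

Requiring both conditions to hold is what distinguishes this scheme from image-only selection: clip 1 looks right but does not move right, and is therefore excluded.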
It can be seen that through the above steps, by extracting, for the video clips in a video resource, a first feature indicating image features and a second feature indicating motion features, and determining from the plurality of video clips, according to the first feature and the second feature, the video clips meeting the target image feature and the target motion feature, automatic acquisition of video clips is achieved while the features of the target video clips in both the image dimension and the motion dimension are fully considered. This makes it possible to determine the target video clips meeting the condition more accurately, thereby achieving the technical effect of improving the efficiency of determining video clips meeting a condition from a video resource, and solving the technical problem in the related art that this efficiency is low.
As an optional solution, determining the target video clip from the plurality of video clips according to the first feature and the second feature includes:
S1: determining the video category of each video clip in the plurality of video clips according to the first feature and the second feature;
S2: determining the video clips in the plurality of video clips whose video category is a target category as the target video clips.
Optionally, in this embodiment, in the process of determining the target video clips, the plurality of video clips may be classified according to the first feature and the second feature, and the video clips belonging to the target category are then determined as the target video clips.
Optionally, in this embodiment, the plurality of video clips may be classified, but not limited to, in a machine learning manner, using a classification network model to determine the category of each video clip. For example, a classification network divides the video clips into two classes, highlight clips and ordinary clips, and the video clips belonging to the highlight class are determined as the target video clips.
As an optional solution, determining the video category of each video clip in the plurality of video clips according to the first feature and the second feature includes:
S1: inputting the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is a model obtained by training a first classification model with first-feature samples labeled with image categories, and the first category parameter is used to indicate the image category to which each video clip belongs;
S2: inputting the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is a model obtained by training a second classification model with second-feature samples labeled with motion categories, and the second category parameter is used to indicate the motion category to which each video clip belongs;
S3: determining the weighted sum of the first category parameter and the second category parameter corresponding to each video clip as the video category parameter of the video clip, wherein the video category parameter is used to indicate the video category to which each video clip belongs.
Optionally, in this embodiment, an image feature model may be, but is not limited to being, trained to classify video clips according to image features, and a motion feature model trained to classify video clips according to motion features; the two classifications are then integrated to determine the final category of each video clip.
Optionally, in this embodiment, the first category parameter and the second category parameter may be, but are not limited to, classification probabilities. For example, the video clips are divided into two classes, highlight and ordinary; the classification parameter output by the model indicates the probability that a video clip belongs to the highlight class, and the video clips whose probability is higher than a predetermined value can be determined as highlight video clips.
Optionally, in this embodiment, the first category parameter and the second category parameter may be, but are not limited to, classification labels. For example, 0 indicates an ordinary video and 1 indicates a highlight video.
Optionally, in this embodiment, in the process of determining the weighted sum of the first category parameter and the second category parameter, the weight values may be allocated according to the importance of the image feature and the motion feature for the video category. For example, if the motion feature has a greater influence on the video category, a larger weight value of 0.7 can be allocated to the motion feature and a smaller weight value of 0.3 to the image feature.
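A minimal sketch of this weighted combination follows; the 0.7/0.3 weights are taken from the example above, while the per-classifier scores are invented for illustration.

```python
def video_category_parameter(image_score, motion_score,
                             w_image=0.3, w_motion=0.7):
    """Weighted sum of the image-classifier and motion-classifier scores,
    with motion weighted more heavily as in the example above."""
    return w_image * image_score + w_motion * motion_score

# Probability of the "highlight" class from each classifier for one clip.
p = video_category_parameter(image_score=0.6, motion_score=0.9)
print(round(p, 2))  # 0.81
```

The resulting video category parameter can then be compared against the target threshold range to decide whether the clip is a target video clip.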
In an optional embodiment, the plurality of video clips are input into a TSN. The TSN, i.e. the Temporal Segment Network, is a new way of end-to-end learning for video content analysis that combines image features and optical flow features. The TSN extracts spatio-temporal features through an image convolutional network and an optical flow convolutional network and scores the clips separately, and outputs a highlight score for each video clip after the scores are fused. Finally, the video clips with higher scores are output as the target video clips.
As shown in Figure 4, the TSN is intended to predict the video label from the visual information of the entire video and consists of an image convolutional network and an optical-flow convolutional network. Instead of processing a single frame or a stack of frames, the network operates on a sequence of short snippets sparsely sampled from the whole video. Each snippet in the sequence produces a preliminary prediction of the video label, and the final prediction is generated by fusing the snippet predictions. During learning, the model parameters are updated iteratively to optimize the prediction loss. Formally, a given video is divided into K segments {S1, S2, ..., SK} of equal duration, and the network then predicts the video label as:
TSN(T1, T2, ..., TK) = H(G(F(T1, W), F(T2, W), ..., F(TK, W)))
where (T1, T2, ..., TK) is the sequence of snippets: for the image convolutional network, Tk is one frame randomly sampled from Sk, and for the optical-flow convolutional network, Tk is five consecutive optical-flow images randomly sampled from Sk. F(Tk, W) denotes the class scores produced on input Tk by the convolutional network with parameters W; the function G aggregates the class scores obtained from the different snippets, averaging being used in the experiments; and the function H computes the posterior probability of each class, normalizing the result with softmax. Combined with the cross-entropy loss, the loss of the TSN can be expressed as:

L(y, G) = -Σi=1..C yi (Gi - log Σj=1..C exp(Gj))

where C is the number of classes, yi is the ground-truth label of the i-th class, and Gi is the i-th component of the aggregated score G. The model parameters can be optimized using stochastic gradient descent.
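The prediction pipeline above, with G as averaging, H as softmax, and the cross-entropy loss, can be sketched in plain Python. The function names are hypothetical, and the per-snippet scores are assumed to be plain lists of class scores:

```python
import math

def segment_consensus(snippet_scores):
    """G: average the per-snippet class scores (K snippets x C classes)."""
    k = len(snippet_scores)
    c = len(snippet_scores[0])
    return [sum(s[j] for s in snippet_scores) / k for j in range(c)]

def softmax(scores):
    """H: normalize consensus scores into class posterior probabilities."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(posteriors, true_class):
    """Cross-entropy loss for a one-hot ground-truth label."""
    return -math.log(posteriors[true_class])

# Two snippets, two classes: their scores disagree, so the consensus is flat.
g = segment_consensus([[1.0, 3.0], [3.0, 1.0]])   # -> [2.0, 2.0]
p = softmax(g)                                     # -> [0.5, 0.5]
loss = cross_entropy(p, 0)                         # -> ln 2
```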
To extract mid-level features with stronger expressive power, the convolutional networks use the Inception model, whose parallel structure can extract multi-scale abstract features and thereby improve the expressiveness of the features. Two Inception structures are shown in Figure 5, and the overall architecture of the image convolutional network and the optical-flow convolutional network is shown in Figure 6. The optical-flow network is similar to the image network, except that the number of input channels of the first convolutional layer is changed from 3 to 5.
After the trained model is obtained, 25 image frames at equal temporal spacing, together with the corresponding optical-flow maps, can be extracted from each test video clip and fed into the image network and the optical-flow network respectively to obtain class scores. Finally, different weights are assigned to the scores of the two classes, and after the weighted sum the class with the highest score is selected as the final label.
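The test-time procedure above, sampling 25 evenly spaced frames and then taking a weighted sum of the two streams' scores, can be sketched as follows. The sampling formula and the stream weights are illustrative assumptions, not values stated in the embodiment:

```python
def sample_indices(n_frames, n_samples=25):
    """Frame indices at equal temporal spacing across the clip
    (one index per segment, taken at the segment midpoint)."""
    step = n_frames / n_samples
    return [int(step * (i + 0.5)) for i in range(n_samples)]

def fused_label(rgb_scores, flow_scores, w_rgb=1.0, w_flow=1.5):
    """Weighted sum of the per-class scores from the two streams;
    the index of the highest fused score is the final label."""
    fused = [w_rgb * r + w_flow * f for r, f in zip(rgb_scores, flow_scores)]
    return fused.index(max(fused))

idx = sample_indices(100)                 # 25 indices spread over 100 frames
label = fused_label([2.0, 1.0], [0.0, 2.0])  # flow stream tips the vote to class 1
```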
As an optional scheme, determining the video category of each of the multiple video clips according to the first feature and the second feature includes:
S1, performing feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of each video clip;
S2, inputting the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain a video classification parameter corresponding to each video clip, wherein the spatio-temporal classification model is obtained by training a third classification model with spatio-temporal feature samples labeled with video categories, and the video classification parameter indicates the video category to which each video clip belongs.
Optionally, in this embodiment, the first feature and the second feature may be fused, and the fused feature may then be used to classify the video clip. The spatio-temporal classification model is a model obtained by training the third classification model with spatio-temporal feature samples labeled with video categories.
Optionally, in this embodiment, the target video clips may be determined from the obtained video classification parameters in, but not limited to, the following manner: the video clips whose video classification parameter falls within a target threshold range are obtained from the multiple video clips and are determined to be the target video clips.
Optionally, in this embodiment, 3D convolution and LSTM structures may also be used to exploit spatio-temporal correlation when determining the video category of a clip.
In an optional embodiment, taking the clipping of game-video highlights as an example, as shown in Figure 7, for an input game video, candidate highlight clips are first generated; features are then extracted by the image convolutional network and the optical-flow convolutional network, and after feature fusion the highlight score of each video clip is output. The highest-scoring clips are output as the target video clips.
As an optional scheme, obtaining the first feature and the second feature of each of the multiple video clips includes:
S1, inputting each video clip into a first-feature extraction model to obtain the first feature of each video clip, wherein the first-feature extraction model is obtained by training an initial first-feature model with first-feature samples;
S2, inputting each video clip into an optical-flow feature extraction model to obtain the optical-flow feature of each video clip, wherein the optical-flow feature extraction model is obtained by training an initial optical-flow feature model with optical-flow feature samples, and the second feature includes the optical-flow feature.
Optionally, in this embodiment, the first-feature extraction model for extracting the first feature and the optical-flow feature extraction model for extracting the second feature may be trained separately.
Optionally, in this embodiment, the optical-flow feature extraction model may be trained in, but not limited to, the following manner: obtain the average of the first-convolutional-layer parameters corresponding to the RGB channels of an initial image convolution model and determine the average to be the initialization parameter of the first convolutional layer of the initial optical-flow feature model; adjust the numerical range of the optical-flow maps of the initial optical-flow feature samples to the numerical range of the input of the initial image convolution model to obtain the optical-flow feature samples; and train the initial optical-flow feature model with the optical-flow feature samples to obtain the optical-flow feature extraction model. This approach effectively reduces overfitting during model training.
Optionally, in this embodiment, the numerical range of the input of the initial image convolution model may be, but is not limited to, 0 to 255.
Optionally, in this embodiment, because each of the models above has a very large number of parameters, overfitting occurs easily when the training samples are few or imperfect; three measures can therefore be taken to prevent overfitting. Measure 1: initialize the parameters with a model trained on a large-scale dataset (ImageNet) and fine-tune from there. For the optical-flow network, first adjust the numerical range of the optical-flow maps to between 0 and 255 so that it matches that of images; then use the average of the first-convolutional-layer parameters over the RGB channels as the initialization parameter of the first convolutional layer of the optical-flow network, keeping the other layers unchanged. Measure 2: use batch normalization; constraining the variance and mean of each layer's output accelerates convergence and improves the robustness of the model. Measure 3: increase the number of training samples; in the experiments, the training samples can be augmented by random cropping, flipping, and similar techniques.
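Measure 1 above, cross-modality initialization of the optical-flow network's first convolutional layer plus rescaling of the flow maps to the image range, can be sketched with NumPy. The weight layout (out_channels, in_channels, kH, kW) follows the common convolution convention and is an assumption here, as are the function names:

```python
import numpy as np

def init_flow_conv1(rgb_conv1, n_flow_channels=5):
    """Average the first-layer weights over the RGB channel axis and
    replicate the average across the flow input channels."""
    mean = rgb_conv1.mean(axis=1, keepdims=True)      # (out, 1, kH, kW)
    return np.repeat(mean, n_flow_channels, axis=1)   # (out, 5, kH, kW)

def rescale_flow(flow, lo=0.0, hi=255.0):
    """Linearly map optical-flow values into the image input range [0, 255]."""
    f_min, f_max = float(flow.min()), float(flow.max())
    return (flow - f_min) / (f_max - f_min) * (hi - lo) + lo

# Toy first layer: 2 output channels, 3 RGB input channels, 1x1 kernels.
w_rgb = np.arange(6.0).reshape(2, 3, 1, 1)
w_flow = init_flow_conv1(w_rgb)          # shape (2, 5, 1, 1)
scaled = rescale_flow(np.array([-2.0, 0.0, 2.0]))  # mapped onto [0, 255]
```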
As an optional scheme, the multiple video clips are video clips in the video resource that contain a target object, and obtaining the multiple video clips from the video resource includes:
S1, obtaining a screen template corresponding to the target object, wherein the screen template indicates attribute information of the target object;
S2, obtaining, from the video frames of the video resource, target video frames whose similarity to the screen template exceeds a target similarity;
S3, obtaining the multiple video clips from the target video frames, wherein each of the multiple video clips contains one target video frame or multiple consecutive target video frames.
Optionally, in this embodiment, in order to determine the target video clips more efficiently, the video frames in the video resource can be pre-screened, and the target video frames that satisfy a certain condition (for example, containing the target object) are selected from them to form the multiple video clips.
Optionally, in this embodiment, the target object may include, but is not limited to, a target scene, a target person, a target item, target text, a target image, and so on.
Optionally, in this embodiment, a video frame in the video resource can be matched against the screen template corresponding to the target object; if the matching condition is met, for example the similarity exceeds the target similarity, the frame is determined to be a target video frame.
As an optional scheme, obtaining from the video frames of the video resource the target video frames whose similarity to the screen template exceeds the target similarity includes:
S1, obtaining first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object;
S2, obtaining from the first video frames, as the target video frames, second video frames whose similarity to the screen template exceeds the target similarity.
Optionally, in this embodiment, to further improve processing efficiency, the first attribute of the target object can be used to match the video frames; the frames that meet the matching condition are taken as the first video frames and are then matched again against the screen template to obtain the target video frames.
Optionally, in this embodiment, the first attribute may include, but is not limited to, a color attribute, a texture attribute, a shape attribute, and so on.
In an optional embodiment, taking the highlight clipping of a gun-battle game as an example, as shown in Figure 8: since the highlights of a gun-battle game are largely the clips in which an enemy is killed, given a game video, candidate highlight clips are first generated by kill detection. Taking the survival mode of a gun-battle game as an example, under this scheme, whenever an opponent is killed, a red "eliminated" caption appears in the middle region of the image. As shown in Figure 9, the proportion of red in a rectangular region in the middle of the image can be computed first; if the proportion exceeds a threshold, the image is provisionally considered a possible kill frame. A sliding-window template-matching method can then be used to detect the "eliminated" caption: the Euclidean distance between the image block and the template is computed, and if it is below a threshold, a kill frame is detected. The red-proportion check exists to reduce computational complexity, because template matching is more expensive than computing the red proportion, and a red proportion above the threshold is a necessary condition for a kill.
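The two-stage kill detection above, a cheap red-ratio gate followed by template matching, can be sketched as follows. The red-dominance rule, the thresholds, and the function names are illustrative assumptions; a real detector would slide the template over candidate positions:

```python
import numpy as np

def red_ratio(frame_rgb, box):
    """Fraction of red-dominant pixels inside the central rectangle.
    box = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    roi = frame_rgb[y0:y1, x0:x1].astype(float)
    red = ((roi[..., 0] > 150)
           & (roi[..., 0] - roi[..., 1] > 50)
           & (roi[..., 0] - roi[..., 2] > 50))
    return float(red.mean())

def template_distance(block, template):
    """Euclidean distance between an image block and the caption template."""
    diff = block.astype(float) - template.astype(float)
    return float(np.sqrt((diff ** 2).sum()))

def is_kill_frame(frame_rgb, box, block, template, ratio_th=0.3, dist_th=50.0):
    """Run the cheap red-ratio gate first; match the template only if it passes."""
    if red_ratio(frame_rgb, box) < ratio_th:
        return False
    return template_distance(block, template) < dist_th
```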
As an optional scheme, obtaining from the video frames of the video resource the target video frames whose similarity to the screen template exceeds the target similarity includes:
S1, dividing each video frame of the video resource into a foreground picture and a background picture;
S2, determining, for each video frame, a first distance between its foreground picture and a foreground template and a second distance between its background picture and a background template, wherein the screen template includes the foreground template and the background template;
S3, determining the weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight;
S4, determining the video frames whose weighted sum is below a target value to be the target video frames.
Optionally, in this embodiment, to improve the screening accuracy of the target video frames, each video frame is divided into a foreground picture and a background picture, which are matched against the foreground template and the background template respectively. A larger first weight is then assigned to the matching result of the foreground picture, which has a relatively large influence on the target video frame, and a smaller second weight to the matching result of the background picture, whose influence is relatively small. The weighted sum of the matching results is taken as the similarity between the video frame and the screen template, and the target video frames are determined from this similarity.
Optionally, in this embodiment, the matching results above may be expressed as, but are not limited to, distances between the video frame and the screen template, such as the Euclidean distance or the Mahalanobis distance; a smaller distance indicates a higher similarity.
In the embodiment above, a good kill-detection result can be obtained in this way; however, since the game background sometimes changes violently, template matching can be disturbed if the template contains part of the background. Therefore, a foreground mask (equivalent to the foreground template above) can first be generated from the color threshold of the "eliminated" caption, and when computing the Euclidean distance between features, the foreground is given a larger weight and the background a smaller one. Kill frames in the game can be detected effectively by this method.
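The foreground-weighted Euclidean distance above can be sketched as follows. The particular weights are illustrative, and the frame and template features are assumed to be arrays of equal shape:

```python
import numpy as np

def masked_distance(frame_feat, template_feat, fg_mask, w_fg=0.9, w_bg=0.1):
    """Euclidean distance between features, weighting foreground pixels
    (where fg_mask is True) more heavily than background pixels, so that
    background clutter disturbs the match less."""
    w = np.where(fg_mask, w_fg, w_bg)
    diff = (frame_feat.astype(float) - template_feat.astype(float)) ** 2
    return float(np.sqrt((w * diff).sum()))

# Every pixel differs by 1; only the top row is foreground.
frame = np.ones((2, 2))
template = np.zeros((2, 2))
mask = np.array([[True, True], [False, False]])
d = masked_distance(frame, template, mask, w_fg=1.0, w_bg=0.0)  # sqrt(2)
```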
As an optional scheme, the target video clip is multiple target video clips, and after the target video clips are determined from the multiple video clips according to the first feature and the second feature, the method further includes:
S1, splicing the multiple target video clips into a target video resource in chronological order;
S2, sending the target video resource to a client used to play the target video resource.
Optionally, in this embodiment, the multiple target video clips screened out above may be provided to the user directly, or may be spliced into one or more target video resources that are then provided to the user.
Optionally, in this embodiment, the video clips may be spliced in chronological order, or in some other splicing order, for example by highlight score; the splicing order may be, but is not limited to, from the highest score to the lowest or from the lowest to the highest.
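The splicing orders above can be sketched as a small helper. The clip representation (dicts with `start` and `score` keys) is an assumption for illustration only:

```python
def splicing_order(clips, by="time"):
    """Order clips for splicing: chronologically by start time,
    or by highlight score from high to low."""
    if by == "time":
        return sorted(clips, key=lambda c: c["start"])
    return sorted(clips, key=lambda c: c["score"], reverse=True)

clips = [{"start": 30, "score": 0.9}, {"start": 10, "score": 0.4}]
chronological = splicing_order(clips)               # start 10, then 30
by_score = splicing_order(clips, by="score")        # score 0.9, then 0.4
```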
It should be noted that, for the sake of simple description, each of the method embodiments above is stated as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
Through the description of the embodiments above, those skilled in the art can clearly understand that the method according to the embodiments above can be implemented by software together with the necessary general-purpose hardware platform; it can of course also be implemented by hardware, but in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in each embodiment of the present invention.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for determining a video clip, for implementing the method for determining a video clip above. As shown in Figure 10, the apparatus includes:
a first obtaining module 102, configured to obtain multiple video clips from a video resource;
a second obtaining module 104, configured to obtain a first feature and a second feature of each of the multiple video clips, wherein the first feature indicates an image feature of an object included in each video clip, and the second feature indicates a motion feature of the object included in each video clip;
a determining module 106, configured to determine a target video clip among the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip among the multiple video clips in which the included object has a target image feature and a target motion feature.
Optionally, in this embodiment, the apparatus for determining a video clip above can be applied in a hardware environment formed by the server 202 and the client 204 shown in Figure 2. As shown in Figure 2, the server 202 obtains multiple video clips from a video resource and obtains the first feature and the second feature of each of the multiple video clips, wherein the first feature indicates the image feature of the object included in each video clip, and the second feature indicates the motion feature of the object included in each video clip; according to the first feature and the second feature, a target video clip is determined among the multiple video clips, the target video clip being a video clip among the multiple video clips in which the included object has the target image feature and the target motion feature.
Optionally, in this embodiment, the server 202 can send one or more of the obtained target video clips to the client 204, and the client 204 displays the received target video clips. When the target video clip is multiple target video clips, the server 202 can also splice the multiple target video clips into one target video resource and send the target video resource to the client 204, which displays the received target video resource on the screen.
Optionally, in this embodiment, the apparatus for determining a video clip above can be applied, but is not limited, to video-clipping scenarios. The client above may be, but is not limited to, any of various types of applications, for example an online-education application, an instant-messaging application, a community-space application, a game application, a shopping application, a browser application, a financial application, a multimedia application, or a live-streaming application. Specifically, it can be applied, but is not limited, to video-clipping scenarios within the game application or the multimedia application above, in order to improve the efficiency of determining, from a video resource, the video clips that meet a condition. The above is merely an example, and no limitation is imposed on it in this embodiment.
Optionally, in this embodiment, the determination of a video clip above can be executed by, but is not limited to, the server, or by the client, or through interaction between the server and the client.
Optionally, in this embodiment, the video resource may include, but is not limited to: video resources on video websites or in video players, such as movie and television drama, animation, and variety-show video files; live video streams such as game live streams, sports broadcasts, and streams in live-streaming applications, for example live sports events, live e-sports matches, and live TV programs; or videos obtained during the use of a client, such as game videos.
Optionally, in this embodiment, the video clips may be, but are not limited to, segments of the video resource above that meet a certain condition, for example: the performance segments of a certain actor or singer in a movie, animation, or variety-show video file; highlight segments in a live video stream, such as the highlight clips of a live sports event or the goal collection of a certain star player; or highlights and useful content from the use of a client, such as kill clips in a game or tutorial segments on the use of office software.
Optionally, in this embodiment, the first feature indicates the image feature of the object included in each video clip; the image features may include, but are not limited to, color features, texture features, shape features, features of the spatial relationships between objects, and so on.
Optionally, in this embodiment, the second feature indicates the motion feature of the object included in each video clip; the motion features may include, but are not limited to, optical-flow features, features indicating the motion amplitude of the object, and so on.
Optionally, in this embodiment, the target image feature and the target motion feature may be, but are not limited to, the conditions that the target video clip must meet, determined according to the demand for the target video clip. The target image feature and the target motion feature can be obtained through configuration, or generated automatically by an intelligent network through analysis of historical data. For example, for a goal collection of a basketball star, the star's historical goal clips can be input into an intelligent algorithm, which automatically identifies the target image features, which may include the star's facial features and the features of the basket and the basketball, and the target motion features, which may include the star's motion amplitude exceeding a target value and the basketball entering the basket.
Optionally, in this embodiment, a video clip whose first feature is the target image feature may be determined to be a video clip having the target image feature, and a video clip whose second feature is the target motion feature to be a video clip having the target motion feature. Alternatively, a video clip whose first feature reaches a certain similarity to the target image feature may be determined to be a video clip having the target image feature, and a video clip whose second feature reaches a certain similarity to the target motion feature to be a video clip having the target motion feature. It should be noted that the way of determining whether the object included in a video clip has the target image feature and the target motion feature is not limited to these.
It can be seen that, with the apparatus above, the first feature indicating image features and the second feature indicating motion features are extracted from the video clips in the video resource, and the video clips that meet the target image feature and the target motion feature are determined from the multiple video clips according to the first feature and the second feature. While the video clips are acquired automatically, the features of the target video clip in both the image dimension and the motion dimension are fully considered, so that the target video clips meeting the condition can be determined more accurately. This achieves the technical effect of improving the efficiency of determining, from a video resource, the video clips that meet a condition, and thereby solves the technical problem in the related art that determining such video clips from a video resource is inefficient.
Optionally, in this embodiment, the determining module includes: a first determination unit, configured to determine the video category of each of the multiple video clips according to the first feature and the second feature; and a second determination unit, configured to determine the video clips among the multiple video clips whose video category is the target category to be the target video clips.
Optionally, in this embodiment, the first determination unit includes: a first input subunit, configured to input the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is obtained by training a first classification model with first-feature samples labeled with image categories, and the first category parameter indicates the image category to which each video clip belongs; a second input subunit, configured to input the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is obtained by training a second classification model with second-feature samples labeled with motion categories, and the second category parameter indicates the motion category to which each video clip belongs; and a first determining subunit, configured to determine the weighted sum of the first category parameter and the second category parameter of each video clip to be the video classification parameter of that video clip, wherein the video classification parameter indicates the video category to which each video clip belongs.
Optionally, in this embodiment, the first determination unit includes: a fusion subunit, configured to perform feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of each video clip; and a third input subunit, configured to input the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain the video classification parameter corresponding to each video clip, wherein the spatio-temporal classification model is obtained by training a third classification model with spatio-temporal feature samples labeled with video categories, and the video classification parameter indicates the video category to which each video clip belongs.
Optionally, in this embodiment, the second determination unit includes: a first obtaining subunit, configured to obtain from the multiple video clips the video clips whose video classification parameter falls within a target threshold range; and a second determining subunit, configured to determine the video clips whose video classification parameter falls within the target threshold range to be the target video clips.
Optionally, in this embodiment, the second obtaining module includes: a first input unit, configured to input each video clip into a first-feature extraction model to obtain the first feature of each video clip, wherein the first-feature extraction model is obtained by training an initial first-feature model with first-feature samples; and a second input unit, configured to input each video clip into an optical-flow feature extraction model to obtain the optical-flow feature of each video clip, wherein the optical-flow feature extraction model is obtained by training an initial optical-flow feature model with optical-flow feature samples, and the second feature includes the optical-flow feature.
Optionally, in this embodiment, the apparatus above further includes: a third obtaining module, configured to obtain the average of the first-convolutional-layer parameters corresponding to the RGB channels of the initial image convolution model and determine the average to be the initialization parameter of the first convolutional layer of the initial optical-flow feature model; an adjusting module, configured to adjust the numerical range of the optical-flow maps of the initial optical-flow feature samples to the numerical range of the input of the initial image convolution model to obtain optical-flow feature samples; and a training module, configured to train the initial optical-flow feature model with the optical-flow feature samples to obtain the optical-flow feature extraction model.
Optionally, in this embodiment, the multiple video clips are video clips in the video resource that contain a target object, and the first obtaining module includes: a first obtaining unit, configured to obtain the screen template corresponding to the target object, wherein the screen template indicates attribute information of the target object; a second obtaining unit, configured to obtain from the video frames of the video resource the target video frames whose similarity to the screen template exceeds a target similarity; and a third obtaining unit, configured to obtain the multiple video clips from the target video frames, wherein each of the multiple video clips contains one target video frame or multiple consecutive target video frames.
Optionally, in this embodiment, the second obtaining unit includes: a second obtaining subunit, configured to obtain first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object; and a third obtaining subunit, configured to obtain from the first video frames, as the target video frames, second video frames whose similarity to the screen template exceeds the target similarity.
Optionally, in this embodiment, the second obtaining unit includes: a dividing subunit, configured to divide each video frame of the video resource into a foreground picture and a background picture; a third determining subunit, configured to determine, for each video frame, a first distance between its foreground picture and a foreground template and a second distance between its background picture and a background template, wherein the screen template includes the foreground template and the background template; a fourth determining subunit, configured to determine the weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight; and a fifth determining subunit, configured to determine the video frames whose weighted sum is below a target value to be the target video frames.
Optionally, in this embodiment, the target video clip is multiple target video clips, and the apparatus further includes: a splicing module, configured to splice the multiple target video clips into a target video resource in chronological order; and a sending module, configured to send the target video resource to a client used to play the target video resource.
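The splicing module's ordering behavior can be sketched as follows; clips are reduced here to (start_second, end_second, label) tuples rather than real video data, and real splicing would concatenate decoded streams.

```python
# Minimal sketch of the splicing module: selected target clips are ordered
# by start time and concatenated into one target video resource.

def splice_clips(clips):
    """Splice target video clips into one resource in chronological order.

    clips: list of (start_second, end_second, label) tuples.
    Returns the labels in playback order, standing in for the spliced video.
    """
    ordered = sorted(clips, key=lambda clip: clip[0])  # sort by start time
    return [label for _start, _end, label in ordered]

highlights = [(120, 135, "second_kill"), (40, 55, "first_kill")]
print(splice_clips(highlights))  # ['first_kill', 'second_kill']
```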
The application environment of this embodiment of the present invention may be, but is not limited to, the application environment in the foregoing embodiments, and details are not repeated in this embodiment. This embodiment of the present invention provides an optional specific application example of the method for determining a video clip described above.
As an optional implementation, the foregoing method for determining a video clip may be, but is not limited to being, applied to the scenario of clipping highlights from shooter-game videos shown in Figure 11. When highlights of a shooter-game video are clipped automatically, the AI may generate candidate highlights according to kill detection; then, for each candidate highlight, a TSN (Temporal Segment Network) may produce a highlight score for each clip from image features and optical flow features, combined with the label data of offline samples. In this way, automatic clipping of shooter-game video highlights is finally achieved using a machine learning method.
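The two-stream scoring step can be sketched as follows. The per-stream scores are stubbed out as fixed numbers, and the fusion weights and the 0.5 threshold are illustrative assumptions, not values from the patent; in practice the scores would come from trained image and optical-flow networks.

```python
# Hedged sketch of two-stream highlight scoring: an image-stream score and
# an optical-flow-stream score are fused into one highlight score, and the
# best-scoring candidate clips are kept.

def fuse_scores(image_score, flow_score, w_image=0.5, w_flow=0.5):
    """Combine spatial (image) and temporal (optical flow) stream scores."""
    return w_image * image_score + w_flow * flow_score

def pick_highlights(candidates, threshold=0.5):
    """candidates: list of (clip_id, image_score, flow_score) tuples."""
    return [clip_id
            for clip_id, img, flow in candidates
            if fuse_scores(img, flow) > threshold]

candidates = [
    ("clip_kill", 0.9, 0.8),  # strong in both streams -> highlight
    ("clip_walk", 0.2, 0.1),  # map exploration -> discarded
]
print(pick_highlights(candidates))  # ['clip_kill']
```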
In a shooter-game video, most of the time is spent exploring the map, and the highlights make up only a small fraction. Taking the survival mode of shooter games as an example, there are many shooter-game videos on the Internet, among which highlight compilations have attracted the attention of large numbers of viewers. Selecting these highlights manually takes considerable time and effort. In this scenario, the foregoing highlight evaluation method is proposed to solve the problem of automatic highlight clipping. With this method, a highlight score is output for each clip according to image and optical flow features, thereby realizing automatic clipping of shooter-game video clips.
The foregoing implementation of shooter-game highlight clipping allows game video editing to be automated, and the output highlights conform to human perception, improving the efficiency of game clip editing.
According to another aspect of the embodiments of the present invention, an electronic device for implementing the foregoing method for determining a video clip is further provided. As shown in Figure 12, the electronic device includes: one or more processors 1202 (only one is shown in the figure), a memory 1204, a sensor 1206, an encoder 1208, and a transmission device 1210. A computer program is stored in the memory, and the processor is configured to perform the steps in any one of the foregoing method embodiments by means of the computer program. Optionally, in this embodiment, the electronic device may be located in at least one of multiple network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to perform the following steps by means of the computer program:
S1: acquire multiple video clips from a video resource;
S2: acquire a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
S3: determine a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
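Steps S1 to S3 can be sketched end to end as follows. The feature extractors are illustrative stand-ins (a real system would use trained image and optical-flow models), and the clip length and thresholds are assumed values.

```python
# Illustrative sketch of steps S1-S3. A clip is a target clip when both its
# image feature and its motion feature reach the target values.

def get_clips(video_resource, clip_len=2):
    """S1: split a video resource (a list of frames) into fixed-length clips."""
    return [video_resource[i:i + clip_len]
            for i in range(0, len(video_resource), clip_len)]

def image_feature(clip):
    """S2 stand-in: fraction of frames containing the object of interest."""
    return sum(1 for frame in clip if frame.get("object")) / len(clip)

def motion_feature(clip):
    """S2 stand-in: average per-frame motion magnitude."""
    return sum(frame.get("motion", 0.0) for frame in clip) / len(clip)

def target_clips(clips, image_min=0.5, motion_min=0.5):
    """S3: keep clips whose object has both the target image and motion feature."""
    return [i for i, clip in enumerate(clips)
            if image_feature(clip) >= image_min
            and motion_feature(clip) >= motion_min]

frames = [
    {"object": True, "motion": 0.9}, {"object": True, "motion": 0.8},   # action
    {"object": False, "motion": 0.1}, {"object": False, "motion": 0.0},  # idle
]
clips = get_clips(frames)
print(target_clips(clips))  # [0]
```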
Optionally, a person skilled in the art may understand that the structure shown in Figure 12 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. Figure 12 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (such as a network interface or a display apparatus) than shown in Figure 12, or have a configuration different from that shown in Figure 12.
The memory 1204 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining a video clip in the embodiments of the present invention. The processor 1202 runs the software programs and modules stored in the memory 1204, thereby executing various functional applications and data processing, that is, implementing the foregoing method for determining a video clip. The memory 1204 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 1204 may further include memories remotely located relative to the processor 1202, and these remote memories may be connected to the terminal via a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 1210 is configured to receive or send data via a network. Specific examples of the network may include a wired network and a wireless network. In one example, the transmission device 1210 includes a network interface controller (NIC), which may be connected to other network devices and a router via a cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1210 is a radio frequency (RF) module, which is configured to communicate with the Internet wirelessly.
Specifically, the memory 1204 is configured to store an application program.
An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to perform the steps in any one of the foregoing method embodiments when run.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
S1: acquire multiple video clips from a video resource;
S2: acquire a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
S3: determine a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
Optionally, the storage medium is further configured to store a computer program for performing the steps included in the methods of the foregoing embodiments, which are not described again in this embodiment.
Optionally, in this embodiment, a person of ordinary skill in the art may understand that all or some of the steps of the various methods in the foregoing embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The serial numbers of the foregoing embodiments of the present invention are merely for description and do not represent the superiority or inferiority of the embodiments.
If the integrated unit in the foregoing embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the foregoing computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
The foregoing descriptions are merely preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.
Claims (15)
1. A method for determining a video clip, characterized by comprising:
acquiring multiple video clips from a video resource;
acquiring a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
determining a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
2. The method according to claim 1, characterized in that determining the target video clip in the multiple video clips according to the first feature and the second feature comprises:
determining a video category of each video clip in the multiple video clips according to the first feature and the second feature;
determining a video clip, in the multiple video clips, whose video category is a target category as the target video clip.
3. The method according to claim 2, characterized in that determining the video category of each video clip in the multiple video clips according to the first feature and the second feature comprises:
inputting the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is a model obtained by training a first classification model using first feature samples labeled with image categories, and the first category parameter is used to indicate the image category to which each video clip belongs;
inputting the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is a model obtained by training a second classification model using second feature samples labeled with motion categories, and the second category parameter is used to indicate the motion category to which each video clip belongs;
determining a weighted sum of the first category parameter and the second category parameter corresponding to each video clip as a video category parameter of each video clip, wherein the video category parameter is used to indicate the video category to which each video clip belongs.
4. The method according to claim 2, characterized in that determining the video category of each video clip in the multiple video clips according to the first feature and the second feature comprises:
performing feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of each video clip;
inputting the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain a video category parameter corresponding to each video clip, wherein the spatio-temporal classification model is a model obtained by training a third classification model using spatio-temporal feature samples labeled with video categories, and the video category parameter is used to indicate the video category to which each video clip belongs.
5. The method according to claim 3 or 4, characterized in that determining the video clip, in the multiple video clips, whose video category is the target category as the target video clip comprises:
acquiring, from the multiple video clips, video clips whose video category parameter falls within a target threshold range;
determining the video clips whose video category parameter falls within the target threshold range as the target video clip.
6. The method according to claim 1, characterized in that acquiring the first feature and the second feature of each video clip in the multiple video clips comprises:
inputting each video clip into a first feature extraction model to obtain the first feature of each video clip, wherein the first feature extraction model is a model obtained by training an initial first feature model using first feature samples;
inputting each video clip into an optical flow feature extraction model to obtain an optical flow feature of each video clip, wherein the optical flow feature extraction model is a model obtained by training an initial optical flow feature model using optical flow feature samples, and the second feature comprises the optical flow feature.
7. The method according to claim 6, characterized in that before inputting each video clip into the optical flow feature extraction model to obtain the optical flow feature of each video clip, the method further comprises:
acquiring an average value of the first convolutional layer parameters corresponding to the three RGB channels of an initial image convolution model, and determining the average value as an initialization parameter of the first convolutional layer of the initial optical flow feature model;
adjusting the numerical range of the optical flow fields of initial optical flow feature samples to the numerical range of the input parameters of the initial image convolution model, to obtain the optical flow feature samples;
training the initial optical flow feature model using the optical flow feature samples to obtain the optical flow feature extraction model.
8. The method according to claim 1, characterized in that the multiple video clips are video clips, in the video resource, that include a target object, and acquiring the multiple video clips from the video resource comprises:
acquiring a picture template corresponding to the target object, wherein the picture template is used to indicate attribute information of the target object;
acquiring, from video frames of the video resource, target video frames whose similarity to the picture template is higher than a target similarity;
acquiring the multiple video clips from the target video frames, wherein each video clip in the multiple video clips includes one target video frame or multiple consecutive target video frames.
9. The method according to claim 8, characterized in that acquiring, from the video frames of the video resource, the target video frames whose similarity to the picture template is higher than the target similarity comprises:
acquiring first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object;
acquiring, from the first video frames, second video frames whose similarity to the picture template is higher than the target similarity as the target video frames.
10. The method according to claim 8, characterized in that acquiring, from the video frames of the video resource, the target video frames whose similarity to the picture template is higher than the target similarity comprises:
dividing each video frame in the video frames of the video resource into a foreground picture and a background picture;
determining, for each video frame, a first distance between the foreground picture of the video frame and a foreground template and a second distance between the background picture of the video frame and a background template, wherein the picture template comprises the foreground template and the background template;
determining a weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight;
determining, among the video frames of the video resource, the video frames whose weighted sum is lower than a target value as the target video frames.
11. The method according to claim 1, characterized in that the target video clip is multiple target video clips, and after determining the target video clip in the multiple video clips according to the first feature and the second feature, the method further comprises:
splicing the multiple target video clips into a target video resource in chronological order;
sending the target video resource to a client used to play the target video resource.
12. An apparatus for determining a video clip, characterized by comprising:
a first acquisition module, configured to acquire multiple video clips from a video resource;
a second acquisition module, configured to acquire a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
a determination module, configured to determine a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
13. The apparatus according to claim 12, characterized in that the determination module comprises:
a first determination unit, configured to determine a video category of each video clip in the multiple video clips according to the first feature and the second feature;
a second determination unit, configured to determine a video clip, in the multiple video clips, whose video category is a target category as the target video clip.
14. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform the method according to any one of claims 1 to 11 when run.
15. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to perform the method according to any one of claims 1 to 11 by means of the computer program.
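The cross-modality initialization described in claim 7 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: kernels are plain nested lists with made-up integer weights, while a real implementation would operate on framework weight tensors (e.g. shape [out_channels, in_channels, k, k]) of a trained image model.

```python
# Sketch of claim 7's first-layer initialization: the image model's first
# convolutional kernel is averaged over its three RGB channels, and the
# averaged plane is replicated for each optical-flow input channel.

def rgb_average_init(rgb_kernel, flow_in_channels):
    """Average a [3][k][k] RGB kernel over its channels, then replicate the
    averaged [k][k] plane for each optical-flow input channel."""
    k = len(rgb_kernel[0])
    averaged = [[sum(rgb_kernel[c][i][j] for c in range(3)) / 3.0
                 for j in range(k)] for i in range(k)]
    # Copy the averaged plane per channel so the planes stay independent.
    return [[row[:] for row in averaged] for _ in range(flow_in_channels)]

# A hypothetical 3-channel 2x2 kernel from the image model.
rgb_kernel = [
    [[3, 0], [0, 3]],  # R
    [[6, 0], [0, 6]],  # G
    [[0, 3], [3, 0]],  # B
]
flow_kernel = rgb_average_init(rgb_kernel, flow_in_channels=10)
print(flow_kernel[0])  # [[3.0, 1.0], [1.0, 3.0]]
```

This lets the optical-flow model reuse the low-level filters learned on RGB images even though its input (a stack of flow fields) has a different number of channels, which is why claim 7 also rescales the flow values into the image model's input range.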
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811427035.8A CN110166827B (en) | 2018-11-27 | 2018-11-27 | Video clip determination method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110166827A true CN110166827A (en) | 2019-08-23 |
CN110166827B CN110166827B (en) | 2022-09-13 |
Family
ID=67645229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811427035.8A Active CN110166827B (en) | 2018-11-27 | 2018-11-27 | Video clip determination method and device, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166827B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110182469A1 (en) * | 2010-01-28 | 2011-07-28 | Nec Laboratories America, Inc. | 3d convolutional neural networks for automatic human action recognition |
CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video identification and classification method through time-space significant information fusion |
CN108388876A (en) * | 2018-03-13 | 2018-08-10 | 腾讯科技(深圳)有限公司 | A kind of image-recognizing method, device and relevant device |
CN108763325A (en) * | 2018-05-04 | 2018-11-06 | 北京达佳互联信息技术有限公司 | A kind of network object processing method and processing device |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516749A (en) * | 2019-08-29 | 2019-11-29 | 网易传媒科技(北京)有限公司 | Model training method, method for processing video frequency, device, medium and calculating equipment |
CN110677722A (en) * | 2019-09-29 | 2020-01-10 | 上海依图网络科技有限公司 | Video processing method, and apparatus, medium, and system thereof |
CN110856042A (en) * | 2019-11-18 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Video playing method and device, computer readable storage medium and computer equipment |
CN110856013A (en) * | 2019-11-19 | 2020-02-28 | 珠海格力电器股份有限公司 | Method, system and storage medium for identifying key segments in video |
CN110855904A (en) * | 2019-11-26 | 2020-02-28 | Oppo广东移动通信有限公司 | Video processing method, electronic device and storage medium |
CN113079420A (en) * | 2020-01-03 | 2021-07-06 | 北京三星通信技术研究有限公司 | Video generation method and device, electronic equipment and computer readable storage medium |
CN113286194A (en) * | 2020-02-20 | 2021-08-20 | 北京三星通信技术研究有限公司 | Video processing method and device, electronic equipment and readable storage medium |
CN112052357A (en) * | 2020-04-15 | 2020-12-08 | 上海摩象网络科技有限公司 | Video clip marking method and device and handheld camera |
CN112052357B (en) * | 2020-04-15 | 2022-04-01 | 上海摩象网络科技有限公司 | Video clip marking method and device and handheld camera |
CN111541938A (en) * | 2020-04-30 | 2020-08-14 | 维沃移动通信有限公司 | Video generation method and device and electronic equipment |
CN113747162A (en) * | 2020-05-29 | 2021-12-03 | 北京金山云网络技术有限公司 | Video processing method and apparatus, storage medium, and electronic apparatus |
CN113747162B (en) * | 2020-05-29 | 2023-09-29 | 北京金山云网络技术有限公司 | Video processing method and device, storage medium and electronic device |
CN113395542B (en) * | 2020-10-26 | 2022-11-08 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN113395542A (en) * | 2020-10-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN113542894B (en) * | 2020-11-25 | 2022-08-19 | 腾讯科技(深圳)有限公司 | Game video editing method, device, equipment and storage medium |
CN113542894A (en) * | 2020-11-25 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Game video editing method, device, equipment and storage medium |
CN112579824A (en) * | 2020-12-16 | 2021-03-30 | 北京中科闻歌科技股份有限公司 | Video data classification method and device, electronic equipment and storage medium |
CN112770061A (en) * | 2020-12-16 | 2021-05-07 | 影石创新科技股份有限公司 | Video editing method, system, electronic device and storage medium |
WO2022127877A1 (en) * | 2020-12-16 | 2022-06-23 | 影石创新科技股份有限公司 | Video editing method and system, electronic device, and storage medium |
CN112770167A (en) * | 2020-12-21 | 2021-05-07 | 深圳Tcl新技术有限公司 | Video display method and device, intelligent display terminal and storage medium |
CN112804578A (en) * | 2021-01-28 | 2021-05-14 | 广州虎牙科技有限公司 | Atmosphere special effect generation method and device, electronic equipment and storage medium |
CN113014831A (en) * | 2021-03-05 | 2021-06-22 | 上海明略人工智能(集团)有限公司 | Method, device and equipment for acquiring scenes of sports video |
CN113014831B (en) * | 2021-03-05 | 2024-03-12 | 上海明略人工智能(集团)有限公司 | Method, device and equipment for scene acquisition of sports video |
CN115695904A (en) * | 2021-07-21 | 2023-02-03 | 广州视源电子科技股份有限公司 | Video processing method and device, computer storage medium and intelligent interactive panel |
CN115103223A (en) * | 2022-06-02 | 2022-09-23 | 咪咕视讯科技有限公司 | Video content detection method, device, equipment and storage medium |
CN115103223B (en) * | 2022-06-02 | 2023-11-10 | 咪咕视讯科技有限公司 | Video content detection method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110166827B (en) | 2022-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166827A (en) | Determination method, apparatus, storage medium and the electronic device of video clip | |
US20220239988A1 (en) | Display method and apparatus for item information, device, and computer-readable storage medium | |
CN109145784B (en) | Method and apparatus for processing video | |
Peng et al. | Two-stream collaborative learning with spatial-temporal attention for video classification | |
CN109145840B (en) | Video scene classification method, device, equipment and storage medium | |
CN108140032B (en) | Apparatus and method for automatic video summarization | |
Shao et al. | Deeply learned attributes for crowded scene understanding | |
CN110147711A (en) | Video scene recognition methods, device, storage medium and electronic device | |
Sharma et al. | Action recognition using visual attention | |
CN109154976A (en) | Pass through the system and method for machine learning training object classifier | |
CN110532996A (en) | The method of visual classification, the method for information processing and server | |
CN103365936A (en) | Video recommendation system and method thereof | |
CN109614517A (en) | Classification method, device, equipment and the storage medium of video | |
CN110516671A (en) | Training method, image detecting method and the device of neural network model | |
CN111026914A (en) | Training method of video abstract model, video abstract generation method and device | |
TW201907736A (en) | Method and device for generating video summary | |
US11504635B2 (en) | Vector-space framework for evaluating gameplay content in a game environment | |
Liu et al. | Soccer video event detection using 3D convolutional networks and shot boundary detection via deep feature distance | |
CN111491187A (en) | Video recommendation method, device, equipment and storage medium | |
CN110851621A (en) | Method, device and storage medium for predicting video wonderful level based on knowledge graph | |
CN107801061A (en) | Ad data matching process, apparatus and system | |
CN108959323A (en) | Video classification methods and device | |
CN111432206A (en) | Video definition processing method and device based on artificial intelligence and electronic equipment | |
Yuan et al. | Contextualized spatio-temporal contrastive learning with self-supervision | |
US10397658B1 (en) | Identifying relevant gameplay content for a game environment using a vector-space framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||