CN110166827A - Determination method, apparatus, storage medium and the electronic device of video clip - Google Patents
Determination method, apparatus, storage medium and the electronic device of video clip
- Publication number
- CN110166827A (application CN201811427035.8A)
- Authority
- CN
- China
- Prior art keywords
- video
- feature
- video clip
- target
- clip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
The invention discloses a method, an apparatus, a storage medium and an electronic device for determining a video clip. The method includes: obtaining a plurality of video clips from a video resource; obtaining a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip; and determining a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature. The invention solves the technical problem in the related art that the efficiency of determining video clips meeting a condition from a video resource is low.
Description
Technical field
The present invention relates to the field of computers, and in particular to a method, an apparatus, a storage medium and an electronic device for determining a video clip.
Background art
Clipping out the excellent parts of a video resource, or the parts that viewers may be more interested in, and providing them to users can attract more users to follow the content. At present, a video resource is generally clipped manually: a staff member selects the parts of the video that he or she considers excellent and edits them into a new video. This manual clipping is slow and wastes time and energy, and the clipping standard is also difficult to control, since the video is clipped entirely according to the staff member's own judgment and cannot accurately capture the needs of users, resulting in very low video clipping efficiency.
No effective solution has yet been proposed for the above problem.
Summary of the invention
Embodiments of the present invention provide a method, an apparatus, a storage medium and an electronic device for determining a video clip, so as to at least solve the technical problem in the related art that the efficiency of determining video clips meeting a condition from a video resource is low.
According to one aspect of the embodiments of the present invention, a method for determining a video clip is provided, including: obtaining a plurality of video clips from a video resource; obtaining a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip; and determining a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature.
According to another aspect of the embodiments of the present invention, an apparatus for determining a video clip is further provided, including: a first obtaining module, configured to obtain a plurality of video clips from a video resource; a second obtaining module, configured to obtain a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip; and a determining module, configured to determine a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature.
Optionally, the first determining unit includes: a first input subunit, configured to input the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is a model obtained by training a first classification model with first-feature samples labeled with image categories, and the first category parameter is used to indicate the image category to which each video clip belongs; a second input subunit, configured to input the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is a model obtained by training a second classification model with second-feature samples labeled with motion categories, and the second category parameter is used to indicate the motion category to which each video clip belongs; and a first determining subunit, configured to determine the weighted sum of the first category parameter and the second category parameter corresponding to each video clip as the video category parameter of the video clip, wherein the video category parameter is used to indicate the video category to which each video clip belongs.
Optionally, the first determining unit includes: a fusion subunit, configured to perform feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of the video clip; and a third input subunit, configured to input the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain the video category parameter corresponding to the video clip, wherein the spatio-temporal classification model is a model obtained by training a third classification model with spatio-temporal feature samples labeled with video categories, and the video category parameter is used to indicate the video category to which each video clip belongs.
Optionally, the second determining unit includes: a first obtaining subunit, configured to obtain, from the plurality of video clips, the video clips whose video category parameters fall into a target threshold range; and a second determining subunit, configured to determine the video clips whose video category parameters fall into the target threshold range as the target video clips.
Optionally, the second obtaining module includes: a first input unit, configured to input each video clip into a first feature extraction model to obtain the first feature of the video clip, wherein the first feature extraction model is a model obtained by training an initial first feature model with first-feature samples; and a second input unit, configured to input each video clip into an optical flow feature extraction model to obtain an optical flow feature of the video clip, wherein the optical flow feature extraction model is a model obtained by training an initial optical flow feature model with optical flow feature samples, and the second feature includes the optical flow feature.
Optionally, the apparatus further includes: a third obtaining module, configured to obtain the average of the first-convolution-layer parameters corresponding to the three RGB channels of an initial image convolution model, and to determine the average as the initialization parameter of the first convolution layer of the initial optical flow feature model; an adjusting module, configured to adjust the numerical range of the optical flow maps of initial optical flow feature samples to the numerical range of the input parameters of the initial image convolution model, so as to obtain the optical flow feature samples; and a training module, configured to train the initial optical flow feature model with the optical flow feature samples to obtain the optical flow feature extraction model.
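As an illustration of this cross-modality initialization, the following is a minimal sketch rather than the patent's implementation: the first-layer kernel of an RGB convolution model is averaged over its three input channels, and the average is replicated for each optical-flow input channel. The filter count, kernel size and the 10 flow channels (five stacked frames of horizontal and vertical flow) are assumptions made for the example.

```python
import numpy as np

def init_flow_conv_from_rgb(rgb_kernel, flow_in_channels):
    """Average the RGB first-layer kernel over its three input channels,
    then replicate the average for each optical-flow input channel."""
    mean_kernel = rgb_kernel.mean(axis=1, keepdims=True)     # (out, 1, kh, kw)
    return np.repeat(mean_kernel, flow_in_channels, axis=1)  # (out, flow_in, kh, kw)

# Hypothetical 64-filter 7x7 RGB layer re-used for 10 flow channels
# (five stacked frames of horizontal and vertical flow).
rgb_w = np.random.randn(64, 3, 7, 7)
flow_w = init_flow_conv_from_rgb(rgb_w, flow_in_channels=10)
print(flow_w.shape)  # (64, 10, 7, 7)
```

Every flow channel of the initialized kernel is identical to the RGB channel average, so the flow network starts from filters that already respond to the spatial patterns learned on images.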
Optionally, the plurality of video clips are video clips in the video resource that include a target object, wherein the first obtaining module includes: a first obtaining unit, configured to obtain a picture template corresponding to the target object, wherein the picture template is used to indicate attribute information of the target object; a second obtaining unit, configured to obtain, from the video frames of the video resource, target video frames whose similarity with the picture template is higher than a target similarity; and a third obtaining unit, configured to obtain the plurality of video clips from the target video frames, wherein each video clip in the plurality of video clips includes one target video frame or a plurality of consecutive target video frames.
Optionally, the second obtaining unit includes: a second obtaining subunit, configured to obtain first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object; and a third obtaining subunit, configured to obtain, from the first video frames, second video frames whose similarity with the picture template is higher than the target similarity, as the target video frames.
Optionally, the second obtaining unit includes: a division subunit, configured to divide each video frame of the video resource into a foreground picture and a background picture; a third determining subunit, configured to respectively determine a first distance between the foreground picture of each video frame and a foreground template and a second distance between the background picture of each video frame and a background template, wherein the picture template includes the foreground template and the background template; a fourth determining subunit, configured to determine the weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight; and a fifth determining subunit, configured to determine, among the video frames of the video resource, the video frames whose weighted sum is lower than a target value as the target video frames.
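The weighted foreground/background template distance described above can be sketched as follows. This is a hypothetical illustration: Euclidean distance and the 0.7/0.3 weights are assumptions (the patent only requires the first weight to exceed the second), and the array shapes are invented for the example.

```python
import numpy as np

def frame_template_score(foreground, background, fg_template, bg_template,
                         w_fg=0.7, w_bg=0.3):
    """Weighted sum of the foreground distance (first distance, larger
    weight) and the background distance (second distance, smaller weight)."""
    d_fg = np.linalg.norm(foreground - fg_template)
    d_bg = np.linalg.norm(background - bg_template)
    return w_fg * d_fg + w_bg * d_bg

def select_target_frames(frames, fg_template, bg_template, target_value):
    """Keep the indices of frames whose weighted distance is below the target value."""
    return [i for i, (fg, bg) in enumerate(frames)
            if frame_template_score(fg, bg, fg_template, bg_template) < target_value]

fg_t = np.zeros(4)
bg_t = np.zeros(4)
frames = [(np.zeros(4), np.zeros(4)),           # matches both templates -> score 0
          (np.ones(4) * 10, np.ones(4) * 10)]   # far from both templates
print(select_target_frames(frames, fg_t, bg_t, target_value=1.0))  # [0]
```

Weighting the foreground more heavily reflects the idea that the target object itself (the foreground) matters more for frame selection than the scene behind it.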
Optionally, the target video clip is a plurality of target video clips, wherein the apparatus further includes: a splicing module, configured to splice the plurality of target video clips into a target video resource in chronological order; and a sending module, configured to send the target video resource to a client for playing the target video resource.
According to another aspect of the embodiments of the present invention, a storage medium is further provided, wherein a computer program is stored in the storage medium, and the computer program is configured to execute the method described in any one of the above when run.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to execute the method described in any one of the above by means of the computer program.
In the embodiments of the present invention, a plurality of video clips are obtained from a video resource; the first feature and the second feature of each video clip in the plurality of video clips are obtained, wherein the first feature is used to indicate the image feature of an object included in each video clip and the second feature is used to indicate the motion feature of the object; and a target video clip, in which the included object has a target image feature and a target motion feature, is determined from the plurality of video clips according to the first feature and the second feature. By extracting, for the video clips in a video resource, a first feature indicating image features and a second feature indicating motion features, and determining from the plurality of video clips, according to these two features, the video clips meeting the target image feature and the target motion feature, automatic acquisition of video clips is achieved while the features of the target video clips in both the image dimension and the motion dimension are fully considered. This makes it possible to determine the target video clips meeting the condition more accurately, thereby achieving the technical effect of improving the efficiency of determining video clips meeting a condition from a video resource, and solving the technical problem in the related art that this efficiency is low.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention, and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic diagram of an optional method for determining a video clip according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an application environment of an optional method for determining a video clip according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an optional method for determining a video clip according to an optional embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional TSN network model according to an optional embodiment of the present invention;
Fig. 5 is a schematic diagram of an optional Inception model structure according to an optional embodiment of the present invention;
Fig. 6 is a schematic diagram of an optional convolutional network model structure according to an optional embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional automatic clipping method for video clips according to an optional embodiment of the present invention;
Fig. 8 is a schematic diagram of an optional method for obtaining a plurality of video clips according to an optional embodiment of the present invention;
Fig. 9 is a schematic diagram of another optional method for obtaining a plurality of video clips according to an optional embodiment of the present invention;
Fig. 10 is a schematic diagram of an optional apparatus for determining a video clip according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of an application scenario of an optional method for determining a video clip according to an embodiment of the present invention; and
Fig. 12 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of embodiments
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", etc. in the description, the claims and the above drawings of this specification are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to one aspect of the embodiments of the present invention, a method for determining a video clip is provided. As shown in Fig. 1, the method includes:
S102: obtaining a plurality of video clips from a video resource;
S104: obtaining a first feature and a second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate the image feature of an object included in each video clip, and the second feature is used to indicate the motion feature of the object included in each video clip;
S106: determining a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which an object included in the plurality of video clips has a target image feature and a target motion feature.
Optionally, in this embodiment, the above method for determining a video clip may be applied in a hardware environment composed of a server 202 and a client 204 as shown in Fig. 2. As shown in Fig. 2, the server 202 obtains a plurality of video clips from a video resource; obtains the first feature and the second feature of each video clip in the plurality of video clips, wherein the first feature is used to indicate the image feature of an object included in each video clip, and the second feature is used to indicate the motion feature of the object included in each video clip; and determines a target video clip from the plurality of video clips according to the first feature and the second feature, wherein the target video clip is a video clip in which the included object has a target image feature and a target motion feature.
Optionally, in this embodiment, the server 202 may send the obtained one or more target video clips to the client 204, and the client 204 displays the one or more received target video clips. Alternatively, when the target video clip is a plurality of target video clips, the server 202 may splice the plurality of target video clips into one target video resource and then send the target video resource to the client 204, and the client 204 displays the received target video resource on a screen.
Optionally, in this embodiment, the above method for determining a video clip may be, but is not limited to being, applied in video clipping scenarios. The above client may be, but is not limited to, various types of applications, for example, online education applications, instant messaging applications, community space applications, game applications, shopping applications, browser applications, financial applications, multimedia applications, live streaming applications, etc. Specifically, the method may be, but is not limited to being, applied in a scenario of video clipping in the above game applications, or in a scenario of video clipping in the above multimedia applications, so as to improve the efficiency of determining video clips meeting a condition from a video resource. The above is only an example, and no limitation is imposed on this in this embodiment.
Optionally, in this embodiment, the above method for determining a video clip may be, but is not limited to being, executed by a server, or may be executed by a client, or may be executed through interaction between a server and a client.
Optionally, in this embodiment, the video resource may be, but is not limited to, a video resource in a video website or a video player, such as a movie or TV drama, animation, or variety-show video file. It may also be a live video stream of a game live broadcast, a sports show, a live streaming application, etc., such as a live sports event, a live competitive game, or a live TV program. Alternatively, it may also be a video obtained during the use of a client, such as a game video.
Optionally, in this embodiment, a video clip may be, but is not limited to, a segment of the above video resource that meets a certain condition, for example, a performance segment of a certain actor or singer determined from a movie, TV drama, animation or variety-show video file. It may also be a highlight in a live video stream, such as a video clip of a highlight in a live sports event, or a collection of goal highlights of a certain star player. It may also be a wonderful or useful segment generated during the use of a client, such as a kill segment in a game or a teaching segment on the use of office software.
Optionally, in this embodiment, the first feature is used to indicate the image feature of an object included in each video clip. For example, the image feature may include, but is not limited to, a color feature, a texture feature, a shape feature, a spatial-relationship feature between objects, etc.
Optionally, in this embodiment, the second feature is used to indicate the motion feature of an object included in each video clip. For example, the motion feature may include, but is not limited to, an optical flow feature, a feature indicating the motion amplitude of an object, etc.
Optionally, in this embodiment, the target image feature and the target motion feature may be, but are not limited to, the conditions that the target video clips need to meet, determined according to the demand for target video clips. The target image feature and the target motion feature may be, but are not limited to being, obtained through configuration, or may be automatically generated by an intelligent network through the analysis of historical data. For example, taking a basketball star's goal highlights as an example, the star's historical goal segments can be input into an intelligent algorithm, which automatically identifies that the target image feature may include the facial features of the star, the features of the basket, the features of the basketball, etc., and that the target motion feature may include that the star's motion amplitude exceeds a target value, that the basketball enters the basket, etc.
Optionally, in this embodiment, a video clip whose first feature is the target image feature may be determined as a video clip having the target image feature, and a video clip whose second feature is the target motion feature as a video clip having the target motion feature. Alternatively, a video clip whose first feature reaches a certain similarity with the target image feature may be determined as a video clip having the target image feature, and a video clip whose second feature reaches a certain similarity with the target motion feature as a video clip having the target motion feature. It should be noted that the way of determining whether an object included in a video clip has the target image feature and the target motion feature is not limited thereto.
In an optional embodiment, taking the determination of highlight clips of eliminating enemies in a gun-battle game as an example, as shown in Fig. 3, a plurality of video clips (clip 1, clip 2, clip 3) are obtained from a game video; this may be done by dividing the game video into a plurality of video clips, or by extracting from the game video elimination-shot clips that may be highlights. The first feature and the second feature of each video clip in the plurality of video clips are obtained, for example: clip 1 corresponds to feature 1a and feature 2a, clip 2 to feature 1b and feature 2b, and clip 3 to feature 1c and feature 2c. According to the above first and second features, it is determined that the similarities of feature 1a, feature 1b and feature 1c with target image feature 1 all meet the condition, so all the video clips have the target image feature; however, the similarity of feature 2a with target motion feature 2 does not meet the condition, while the similarities of feature 2b and feature 2c with target motion feature 2 do. It is therefore determined that the target video clips among the plurality of video clips are clip 2 and clip 3.
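The selection in this example can be sketched as follows; this is a minimal illustration, not the patent's implementation. Cosine similarity and the threshold value are assumptions, and the feature vectors are invented so that clip 1 fails the motion-similarity condition while clips 2 and 3 pass both.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_clips(clips, target_image_feat, target_motion_feat,
                        sim_threshold=0.8):
    """Keep clips whose first (image) AND second (motion) features are
    both similar enough to the target features."""
    targets = []
    for name, image_feat, motion_feat in clips:
        if (cos_sim(image_feat, target_image_feat) >= sim_threshold and
                cos_sim(motion_feat, target_motion_feat) >= sim_threshold):
            targets.append(name)
    return targets

# Mirroring the example: all three clips match the target image feature,
# but clip 1's motion feature does not match the target motion feature.
t_img = np.array([1.0, 0.0])
t_mot = np.array([0.0, 1.0])
clips = [("clip 1", np.array([0.9, 0.1]), np.array([1.0, 0.1])),
         ("clip 2", np.array([1.0, 0.2]), np.array([0.1, 1.0])),
         ("clip 3", np.array([0.8, 0.0]), np.array([0.0, 0.9]))]
print(select_target_clips(clips, t_img, t_mot))  # ['clip 2', 'clip 3']
```

Requiring both conditions to hold is what distinguishes this scheme from image-only selection: clip 1 looks right but does not move right, and is therefore excluded.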
It can be seen that through the above steps, by extracting, for the video clips in a video resource, a first feature indicating image features and a second feature indicating motion features, and determining from the plurality of video clips, according to the first feature and the second feature, the video clips meeting the target image feature and the target motion feature, automatic acquisition of video clips is achieved while the features of the target video clips in both the image dimension and the motion dimension are fully considered. This makes it possible to determine the target video clips meeting the condition more accurately, thereby achieving the technical effect of improving the efficiency of determining video clips meeting a condition from a video resource, and solving the technical problem in the related art that this efficiency is low.
As an optional solution, determining the target video clip from the plurality of video clips according to the first feature and the second feature includes:
S1: determining the video category of each video clip in the plurality of video clips according to the first feature and the second feature;
S2: determining the video clips in the plurality of video clips whose video category is a target category as the target video clips.
Optionally, in this embodiment, in the process of determining the target video clips, the plurality of video clips may be classified according to the first feature and the second feature, and the video clips belonging to the target category are then determined as the target video clips.
Optionally, in this embodiment, the plurality of video clips may be classified, but not limited to, in a machine learning manner, using a classification network model to determine the category of each video clip. For example, a classification network divides the video clips into two classes, highlight clips and ordinary clips, and the video clips belonging to the highlight class are determined as the target video clips.
As an optional solution, determining the video category of each video clip in the plurality of video clips according to the first feature and the second feature includes:
S1: inputting the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is a model obtained by training a first classification model with first-feature samples labeled with image categories, and the first category parameter is used to indicate the image category to which each video clip belongs;
S2: inputting the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is a model obtained by training a second classification model with second-feature samples labeled with motion categories, and the second category parameter is used to indicate the motion category to which each video clip belongs;
S3: determining the weighted sum of the first category parameter and the second category parameter corresponding to each video clip as the video category parameter of the video clip, wherein the video category parameter is used to indicate the video category to which each video clip belongs.
Optionally, in this embodiment, an image feature model may be, but is not limited to being, trained to classify video clips according to image features, and a motion feature model trained to classify video clips according to motion features; the two classifications are then integrated to determine the final category of each video clip.
Optionally, in this embodiment, the first category parameter and the second category parameter may be, but are not limited to, classification probabilities. For example, the video clips are divided into two classes, highlight and ordinary; the classification parameter output by the model indicates the probability that a video clip belongs to the highlight class, and the video clips whose probability is higher than a predetermined value can be determined as highlight video clips.
Optionally, in this embodiment, the first category parameter and the second category parameter may be, but are not limited to, classification labels. For example, 0 indicates an ordinary video and 1 indicates a highlight video.
Optionally, in this embodiment, in the process of determining the weighted sum of the first category parameter and the second category parameter, the weight values may be allocated according to the importance of the image feature and the motion feature for the video category. For example, if the motion feature has a greater influence on the video category, a larger weight value of 0.7 can be allocated to the motion feature and a smaller weight value of 0.3 to the image feature.
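A minimal sketch of this weighted combination follows; the 0.7/0.3 weights are taken from the example above, while the per-classifier scores are invented for illustration.

```python
def video_category_parameter(image_score, motion_score,
                             w_image=0.3, w_motion=0.7):
    """Weighted sum of the image-classifier and motion-classifier scores,
    with motion weighted more heavily as in the example above."""
    return w_image * image_score + w_motion * motion_score

# Probability of the "highlight" class from each classifier for one clip.
p = video_category_parameter(image_score=0.6, motion_score=0.9)
print(round(p, 2))  # 0.81
```

The resulting video category parameter can then be compared against the target threshold range to decide whether the clip is a target video clip.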
In an optional embodiment, the plurality of video clips are input into a TSN. The TSN, i.e. the Temporal Segment Network, is a new way of end-to-end learning for video content analysis that combines image features and optical flow features. The TSN extracts spatio-temporal features through an image convolutional network and an optical flow convolutional network and scores the clips separately, and outputs a highlight score for each video clip after the scores are fused. Finally, the video clips with higher scores are output as the target video clips.
As shown in Figure 4, the TSN is intended to predict the video label from the visual information of the entire video and consists of an image convolutional network and an optical-flow convolutional network. Instead of processing a single frame or a stack of frames, the network operates on a sequence of short snippets sparsely sampled from the whole video. Each snippet in the sequence produces a preliminary prediction of the video label, and the final prediction is generated by fusing the snippet predictions. During learning, the model parameters are updated iteratively to optimize the prediction loss. Formally, a given video is divided into K segments {S1, S2, ..., SK} of equal duration, and the network then predicts the video label as:
TSN(T1, T2, ..., TK) = H(G(F(T1, W), F(T2, W), ..., F(TK, W)))
where (T1, T2, ..., TK) is the sequence of snippets: for the image convolutional network, Tk is one frame randomly sampled from Sk, and for the optical-flow convolutional network, Tk is five consecutive optical-flow images randomly sampled from Sk. F(Tk, W) denotes the class scores produced on input Tk by the convolutional network with parameters W; the function G aggregates the class scores obtained from the different snippets, averaging being used in the experiments; and the function H computes the posterior probability of each class, normalizing the result with softmax. Combined with the cross-entropy loss, the loss of the TSN can be expressed as:

L(y, G) = -Σi=1..C yi (Gi - log Σj=1..C exp(Gj))

where C is the number of classes, yi is the ground-truth label of the i-th class, and Gi is the i-th component of the aggregated score G. The model parameters can be optimized using stochastic gradient descent.
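The prediction pipeline above, with G as averaging, H as softmax, and the cross-entropy loss, can be sketched in plain Python. The function names are hypothetical, and the per-snippet scores are assumed to be plain lists of class scores:

```python
import math

def segment_consensus(snippet_scores):
    """G: average the per-snippet class scores (K snippets x C classes)."""
    k = len(snippet_scores)
    c = len(snippet_scores[0])
    return [sum(s[j] for s in snippet_scores) / k for j in range(c)]

def softmax(scores):
    """H: normalize consensus scores into class posterior probabilities."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(posteriors, true_class):
    """Cross-entropy loss for a one-hot ground-truth label."""
    return -math.log(posteriors[true_class])

# Two snippets, two classes: their scores disagree, so the consensus is flat.
g = segment_consensus([[1.0, 3.0], [3.0, 1.0]])   # -> [2.0, 2.0]
p = softmax(g)                                     # -> [0.5, 0.5]
loss = cross_entropy(p, 0)                         # -> ln 2
```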
To extract mid-level features with stronger expressive power, the convolutional networks use the Inception model, whose parallel structure can extract multi-scale abstract features and thereby improve the expressiveness of the features. Two Inception structures are shown in Figure 5, and the overall architecture of the image convolutional network and the optical-flow convolutional network is shown in Figure 6. The optical-flow network is similar to the image network, except that the number of input channels of the first convolutional layer is changed from 3 to 5.
After the trained model is obtained, 25 image frames at equal temporal spacing, together with the corresponding optical-flow maps, can be extracted from each test video clip and fed into the image network and the optical-flow network respectively to obtain class scores. Finally, different weights are assigned to the scores of the two classes, and after the weighted sum the class with the highest score is selected as the final label.
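The test-time procedure above, sampling 25 evenly spaced frames and then taking a weighted sum of the two streams' scores, can be sketched as follows. The sampling formula and the stream weights are illustrative assumptions, not values stated in the embodiment:

```python
def sample_indices(n_frames, n_samples=25):
    """Frame indices at equal temporal spacing across the clip
    (one index per segment, taken at the segment midpoint)."""
    step = n_frames / n_samples
    return [int(step * (i + 0.5)) for i in range(n_samples)]

def fused_label(rgb_scores, flow_scores, w_rgb=1.0, w_flow=1.5):
    """Weighted sum of the per-class scores from the two streams;
    the index of the highest fused score is the final label."""
    fused = [w_rgb * r + w_flow * f for r, f in zip(rgb_scores, flow_scores)]
    return fused.index(max(fused))

idx = sample_indices(100)                 # 25 indices spread over 100 frames
label = fused_label([2.0, 1.0], [0.0, 2.0])  # flow stream tips the vote to class 1
```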
As an optional scheme, determining the video category of each of the multiple video clips according to the first feature and the second feature includes:
S1, performing feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of each video clip;
S2, inputting the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain a video classification parameter corresponding to each video clip, wherein the spatio-temporal classification model is obtained by training a third classification model with spatio-temporal feature samples labeled with video categories, and the video classification parameter indicates the video category to which each video clip belongs.
Optionally, in this embodiment, the first feature and the second feature may be fused, and the fused feature may then be used to classify the video clip. The spatio-temporal classification model is a model obtained by training the third classification model with spatio-temporal feature samples labeled with video categories.
Optionally, in this embodiment, the target video clips may be determined from the obtained video classification parameters in, but not limited to, the following manner: the video clips whose video classification parameter falls within a target threshold range are obtained from the multiple video clips and are determined to be the target video clips.
Optionally, in this embodiment, 3D convolution and LSTM structures may also be used to exploit spatio-temporal correlation when determining the video category of a clip.
In an optional embodiment, taking the clipping of game-video highlights as an example, as shown in Figure 7, for an input game video, candidate highlight clips are first generated; features are then extracted by the image convolutional network and the optical-flow convolutional network, and after feature fusion the highlight score of each video clip is output. The highest-scoring clips are output as the target video clips.
As an optional scheme, obtaining the first feature and the second feature of each of the multiple video clips includes:
S1, inputting each video clip into a first-feature extraction model to obtain the first feature of each video clip, wherein the first-feature extraction model is obtained by training an initial first-feature model with first-feature samples;
S2, inputting each video clip into an optical-flow feature extraction model to obtain the optical-flow feature of each video clip, wherein the optical-flow feature extraction model is obtained by training an initial optical-flow feature model with optical-flow feature samples, and the second feature includes the optical-flow feature.
Optionally, in this embodiment, the first-feature extraction model for extracting the first feature and the optical-flow feature extraction model for extracting the second feature may be trained separately.
Optionally, in this embodiment, the optical-flow feature extraction model may be trained in, but not limited to, the following manner: obtain the average of the first-convolutional-layer parameters corresponding to the RGB channels of an initial image convolution model and determine the average to be the initialization parameter of the first convolutional layer of the initial optical-flow feature model; adjust the numerical range of the optical-flow maps of the initial optical-flow feature samples to the numerical range of the input of the initial image convolution model to obtain the optical-flow feature samples; and train the initial optical-flow feature model with the optical-flow feature samples to obtain the optical-flow feature extraction model. This approach effectively reduces overfitting during model training.
Optionally, in this embodiment, the numerical range of the input of the initial image convolution model may be, but is not limited to, 0 to 255.
Optionally, in this embodiment, because each of the models above has a very large number of parameters, overfitting occurs easily when the training samples are few or imperfect; three measures can therefore be taken to prevent overfitting. Measure 1: initialize the parameters with a model trained on a large-scale dataset (ImageNet) and fine-tune from there. For the optical-flow network, first adjust the numerical range of the optical-flow maps to between 0 and 255 so that it matches that of images; then use the average of the first-convolutional-layer parameters over the RGB channels as the initialization parameter of the first convolutional layer of the optical-flow network, keeping the other layers unchanged. Measure 2: use batch normalization; constraining the variance and mean of each layer's output accelerates convergence and improves the robustness of the model. Measure 3: increase the number of training samples; in the experiments, the training samples can be augmented by random cropping, flipping, and similar techniques.
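Measure 1 above, cross-modality initialization of the optical-flow network's first convolutional layer plus rescaling of the flow maps to the image range, can be sketched with NumPy. The weight layout (out_channels, in_channels, kH, kW) follows the common convolution convention and is an assumption here, as are the function names:

```python
import numpy as np

def init_flow_conv1(rgb_conv1, n_flow_channels=5):
    """Average the first-layer weights over the RGB channel axis and
    replicate the average across the flow input channels."""
    mean = rgb_conv1.mean(axis=1, keepdims=True)      # (out, 1, kH, kW)
    return np.repeat(mean, n_flow_channels, axis=1)   # (out, 5, kH, kW)

def rescale_flow(flow, lo=0.0, hi=255.0):
    """Linearly map optical-flow values into the image input range [0, 255]."""
    f_min, f_max = float(flow.min()), float(flow.max())
    return (flow - f_min) / (f_max - f_min) * (hi - lo) + lo

# Toy first layer: 2 output channels, 3 RGB input channels, 1x1 kernels.
w_rgb = np.arange(6.0).reshape(2, 3, 1, 1)
w_flow = init_flow_conv1(w_rgb)          # shape (2, 5, 1, 1)
scaled = rescale_flow(np.array([-2.0, 0.0, 2.0]))  # mapped onto [0, 255]
```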
As an optional scheme, the multiple video clips are video clips in the video resource that contain a target object, and obtaining the multiple video clips from the video resource includes:
S1, obtaining a screen template corresponding to the target object, wherein the screen template indicates attribute information of the target object;
S2, obtaining, from the video frames of the video resource, target video frames whose similarity to the screen template exceeds a target similarity;
S3, obtaining the multiple video clips from the target video frames, wherein each of the multiple video clips contains one target video frame or multiple consecutive target video frames.
Optionally, in this embodiment, in order to determine the target video clips more efficiently, the video frames in the video resource can be pre-screened, and the target video frames that satisfy a certain condition (for example, containing the target object) are selected from them to form the multiple video clips.
Optionally, in this embodiment, the target object may include, but is not limited to, a target scene, a target person, a target item, target text, a target image, and so on.
Optionally, in this embodiment, a video frame in the video resource can be matched against the screen template corresponding to the target object; if the matching condition is met, for example the similarity exceeds the target similarity, the frame is determined to be a target video frame.
As an optional scheme, obtaining from the video frames of the video resource the target video frames whose similarity to the screen template exceeds the target similarity includes:
S1, obtaining first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object;
S2, obtaining from the first video frames, as the target video frames, second video frames whose similarity to the screen template exceeds the target similarity.
Optionally, in this embodiment, to further improve processing efficiency, the first attribute of the target object can be used to match the video frames; the frames that meet the matching condition are taken as the first video frames and are then matched again against the screen template to obtain the target video frames.
Optionally, in this embodiment, the first attribute may include, but is not limited to, a color attribute, a texture attribute, a shape attribute, and so on.
In an optional embodiment, taking the highlight clipping of a gun-battle game as an example, as shown in Figure 8: since the highlights of a gun-battle game are largely the clips in which an enemy is killed, given a game video, candidate highlight clips are first generated by kill detection. Taking the survival mode of a gun-battle game as an example, under this scheme, whenever an opponent is killed, a red "eliminated" caption appears in the middle region of the image. As shown in Figure 9, the proportion of red in a rectangular region in the middle of the image can be computed first; if the proportion exceeds a threshold, the image is provisionally considered a possible kill frame. A sliding-window template-matching method can then be used to detect the "eliminated" caption: the Euclidean distance between the image block and the template is computed, and if it is below a threshold, a kill frame is detected. The red-proportion check exists to reduce computational complexity, because template matching is more expensive than computing the red proportion, and a red proportion above the threshold is a necessary condition for a kill.
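The two-stage kill detection above, a cheap red-ratio gate followed by template matching, can be sketched as follows. The red-dominance rule, the thresholds, and the function names are illustrative assumptions; a real detector would slide the template over candidate positions:

```python
import numpy as np

def red_ratio(frame_rgb, box):
    """Fraction of red-dominant pixels inside the central rectangle.
    box = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    roi = frame_rgb[y0:y1, x0:x1].astype(float)
    red = ((roi[..., 0] > 150)
           & (roi[..., 0] - roi[..., 1] > 50)
           & (roi[..., 0] - roi[..., 2] > 50))
    return float(red.mean())

def template_distance(block, template):
    """Euclidean distance between an image block and the caption template."""
    diff = block.astype(float) - template.astype(float)
    return float(np.sqrt((diff ** 2).sum()))

def is_kill_frame(frame_rgb, box, block, template, ratio_th=0.3, dist_th=50.0):
    """Run the cheap red-ratio gate first; match the template only if it passes."""
    if red_ratio(frame_rgb, box) < ratio_th:
        return False
    return template_distance(block, template) < dist_th
```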
As an optional scheme, obtaining from the video frames of the video resource the target video frames whose similarity to the screen template exceeds the target similarity includes:
S1, dividing each video frame of the video resource into a foreground picture and a background picture;
S2, determining, for each video frame, a first distance between its foreground picture and a foreground template and a second distance between its background picture and a background template, wherein the screen template includes the foreground template and the background template;
S3, determining the weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight;
S4, determining the video frames whose weighted sum is below a target value to be the target video frames.
Optionally, in this embodiment, to improve the screening accuracy of the target video frames, each video frame is divided into a foreground picture and a background picture, which are matched against the foreground template and the background template respectively. A larger first weight is then assigned to the matching result of the foreground picture, which has a relatively large influence on the target video frame, and a smaller second weight to the matching result of the background picture, whose influence is relatively small. The weighted sum of the matching results is taken as the similarity between the video frame and the screen template, and the target video frames are determined from this similarity.
Optionally, in this embodiment, the matching results above may be expressed as, but are not limited to, distances between the video frame and the screen template, such as the Euclidean distance or the Mahalanobis distance; a smaller distance indicates a higher similarity.
In the embodiment above, a good kill-detection result can be obtained in this way; however, since the game background sometimes changes violently, template matching can be disturbed if the template contains part of the background. Therefore, a foreground mask (equivalent to the foreground template above) can first be generated from the color threshold of the "eliminated" caption, and when computing the Euclidean distance between features, the foreground is given a larger weight and the background a smaller one. Kill frames in the game can be detected effectively by this method.
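The foreground-weighted Euclidean distance above can be sketched as follows. The particular weights are illustrative, and the frame and template features are assumed to be arrays of equal shape:

```python
import numpy as np

def masked_distance(frame_feat, template_feat, fg_mask, w_fg=0.9, w_bg=0.1):
    """Euclidean distance between features, weighting foreground pixels
    (where fg_mask is True) more heavily than background pixels, so that
    background clutter disturbs the match less."""
    w = np.where(fg_mask, w_fg, w_bg)
    diff = (frame_feat.astype(float) - template_feat.astype(float)) ** 2
    return float(np.sqrt((w * diff).sum()))

# Every pixel differs by 1; only the top row is foreground.
frame = np.ones((2, 2))
template = np.zeros((2, 2))
mask = np.array([[True, True], [False, False]])
d = masked_distance(frame, template, mask, w_fg=1.0, w_bg=0.0)  # sqrt(2)
```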
As an optional scheme, the target video clip is multiple target video clips, and after the target video clips are determined from the multiple video clips according to the first feature and the second feature, the method further includes:
S1, splicing the multiple target video clips into a target video resource in chronological order;
S2, sending the target video resource to a client used to play the target video resource.
Optionally, in this embodiment, the multiple target video clips screened out above may be provided to the user directly, or may be spliced into one or more target video resources that are then provided to the user.
Optionally, in this embodiment, the video clips may be spliced in chronological order, or in some other splicing order, for example by highlight score; the splicing order may be, but is not limited to, from the highest score to the lowest or from the lowest to the highest.
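The splicing orders above can be sketched as a small helper. The clip representation (dicts with `start` and `score` keys) is an assumption for illustration only:

```python
def splicing_order(clips, by="time"):
    """Order clips for splicing: chronologically by start time,
    or by highlight score from high to low."""
    if by == "time":
        return sorted(clips, key=lambda c: c["start"])
    return sorted(clips, key=lambda c: c["score"], reverse=True)

clips = [{"start": 30, "score": 0.9}, {"start": 10, "score": 0.4}]
chronological = splicing_order(clips)               # start 10, then 30
by_score = splicing_order(clips, by="score")        # score 0.9, then 0.4
```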
It should be noted that, for the sake of simple description, each of the method embodiments above is stated as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
Through the description of the embodiments above, those skilled in the art can clearly understand that the method according to the embodiments above can be implemented by software together with the necessary general-purpose hardware platform; it can of course also be implemented by hardware, but in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in each embodiment of the present invention.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for determining a video clip, for implementing the method for determining a video clip above. As shown in Figure 10, the apparatus includes:
a first obtaining module 102, configured to obtain multiple video clips from a video resource;
a second obtaining module 104, configured to obtain a first feature and a second feature of each of the multiple video clips, wherein the first feature indicates an image feature of an object included in each video clip, and the second feature indicates a motion feature of the object included in each video clip;
a determining module 106, configured to determine a target video clip among the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip among the multiple video clips in which the included object has a target image feature and a target motion feature.
Optionally, in this embodiment, the apparatus for determining a video clip above can be applied in a hardware environment formed by the server 202 and the client 204 shown in Figure 2. As shown in Figure 2, the server 202 obtains multiple video clips from a video resource and obtains the first feature and the second feature of each of the multiple video clips, wherein the first feature indicates the image feature of the object included in each video clip, and the second feature indicates the motion feature of the object included in each video clip; according to the first feature and the second feature, a target video clip is determined among the multiple video clips, the target video clip being a video clip among the multiple video clips in which the included object has the target image feature and the target motion feature.
Optionally, in this embodiment, the server 202 can send one or more of the obtained target video clips to the client 204, and the client 204 displays the received target video clips. When the target video clip is multiple target video clips, the server 202 can also splice the multiple target video clips into one target video resource and send the target video resource to the client 204, which displays the received target video resource on the screen.
Optionally, in this embodiment, the apparatus for determining a video clip above can be applied, but is not limited, to video-clipping scenarios. The client above may be, but is not limited to, any of various types of applications, for example an online-education application, an instant-messaging application, a community-space application, a game application, a shopping application, a browser application, a financial application, a multimedia application, or a live-streaming application. Specifically, it can be applied, but is not limited, to video-clipping scenarios within the game application or the multimedia application above, in order to improve the efficiency of determining, from a video resource, the video clips that meet a condition. The above is merely an example, and no limitation is imposed on it in this embodiment.
Optionally, in this embodiment, the determination of a video clip above can be executed by, but is not limited to, the server, or by the client, or through interaction between the server and the client.
Optionally, in this embodiment, the video resource may include, but is not limited to: video resources on video websites or in video players, such as movie and television drama, animation, and variety-show video files; live video streams such as game live streams, sports broadcasts, and streams in live-streaming applications, for example live sports events, live e-sports matches, and live TV programs; or videos obtained during the use of a client, such as game videos.
Optionally, in this embodiment, the video clips may be, but are not limited to, segments of the video resource above that meet a certain condition, for example: the performance segments of a certain actor or singer in a movie, animation, or variety-show video file; highlight segments in a live video stream, such as the highlight clips of a live sports event or the goal collection of a certain star player; or highlights and useful content from the use of a client, such as kill clips in a game or tutorial segments on the use of office software.
Optionally, in this embodiment, the first feature indicates the image feature of the object included in each video clip; the image features may include, but are not limited to, color features, texture features, shape features, features of the spatial relationships between objects, and so on.
Optionally, in this embodiment, the second feature indicates the motion feature of the object included in each video clip; the motion features may include, but are not limited to, optical-flow features, features indicating the motion amplitude of the object, and so on.
Optionally, in this embodiment, the target image feature and the target motion feature may be, but are not limited to, the conditions that the target video clip must meet, determined according to the demand for the target video clip. The target image feature and the target motion feature can be obtained through configuration, or generated automatically by an intelligent network through analysis of historical data. For example, for a goal collection of a basketball star, the star's historical goal clips can be input into an intelligent algorithm, which automatically identifies the target image features, which may include the star's facial features and the features of the basket and the basketball, and the target motion features, which may include the star's motion amplitude exceeding a target value and the basketball entering the basket.
Optionally, in this embodiment, a video clip whose first feature is the target image feature may be determined to be a video clip having the target image feature, and a video clip whose second feature is the target motion feature to be a video clip having the target motion feature. Alternatively, a video clip whose first feature reaches a certain similarity to the target image feature may be determined to be a video clip having the target image feature, and a video clip whose second feature reaches a certain similarity to the target motion feature to be a video clip having the target motion feature. It should be noted that the way of determining whether the object included in a video clip has the target image feature and the target motion feature is not limited to these.
It can be seen that, with the apparatus above, the first feature indicating image features and the second feature indicating motion features are extracted from the video clips in the video resource, and the video clips that meet the target image feature and the target motion feature are determined from the multiple video clips according to the first feature and the second feature. While the video clips are acquired automatically, the features of the target video clip in both the image dimension and the motion dimension are fully considered, so that the target video clips meeting the condition can be determined more accurately. This achieves the technical effect of improving the efficiency of determining, from a video resource, the video clips that meet a condition, and thereby solves the technical problem in the related art that determining such video clips from a video resource is inefficient.
Optionally, in this embodiment, the determining module includes: a first determination unit, configured to determine the video category of each of the multiple video clips according to the first feature and the second feature; and a second determination unit, configured to determine the video clips among the multiple video clips whose video category is the target category to be the target video clips.
Optionally, in this embodiment, the first determination unit includes: a first input subunit, configured to input the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is obtained by training a first classification model with first-feature samples labeled with image categories, and the first category parameter indicates the image category to which each video clip belongs; a second input subunit, configured to input the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is obtained by training a second classification model with second-feature samples labeled with motion categories, and the second category parameter indicates the motion category to which each video clip belongs; and a first determining subunit, configured to determine the weighted sum of the first category parameter and the second category parameter of each video clip to be the video classification parameter of that video clip, wherein the video classification parameter indicates the video category to which each video clip belongs.
Optionally, in this embodiment, the first determination unit includes: a fusion subunit, configured to perform feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of each video clip; and a third input subunit, configured to input the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain the video classification parameter corresponding to each video clip, wherein the spatio-temporal classification model is obtained by training a third classification model with spatio-temporal feature samples labeled with video categories, and the video classification parameter indicates the video category to which each video clip belongs.
Optionally, in this embodiment, the second determination unit includes: a first obtaining subunit, configured to obtain from the multiple video clips the video clips whose video classification parameter falls within a target threshold range; and a second determining subunit, configured to determine the video clips whose video classification parameter falls within the target threshold range to be the target video clips.
Optionally, in this embodiment, the second obtaining module includes: a first input unit, configured to input each video clip into a first-feature extraction model to obtain the first feature of each video clip, wherein the first-feature extraction model is obtained by training an initial first-feature model with first-feature samples; and a second input unit, configured to input each video clip into an optical-flow feature extraction model to obtain the optical-flow feature of each video clip, wherein the optical-flow feature extraction model is obtained by training an initial optical-flow feature model with optical-flow feature samples, and the second feature includes the optical-flow feature.
Optionally, in this embodiment, the apparatus above further includes: a third obtaining module, configured to obtain the average of the first-convolutional-layer parameters corresponding to the RGB channels of the initial image convolution model and determine the average to be the initialization parameter of the first convolutional layer of the initial optical-flow feature model; an adjusting module, configured to adjust the numerical range of the optical-flow maps of the initial optical-flow feature samples to the numerical range of the input of the initial image convolution model to obtain optical-flow feature samples; and a training module, configured to train the initial optical-flow feature model with the optical-flow feature samples to obtain the optical-flow feature extraction model.
Optionally, in this embodiment, the multiple video clips are video clips in the video resource that contain a target object, and the first obtaining module includes: a first obtaining unit, configured to obtain the screen template corresponding to the target object, wherein the screen template indicates attribute information of the target object; a second obtaining unit, configured to obtain from the video frames of the video resource the target video frames whose similarity to the screen template exceeds a target similarity; and a third obtaining unit, configured to obtain the multiple video clips from the target video frames, wherein each of the multiple video clips contains one target video frame or multiple consecutive target video frames.
Optionally, in this embodiment, the second obtaining unit includes: a second obtaining subunit, configured to obtain first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object; and a third obtaining subunit, configured to obtain from the first video frames, as the target video frames, second video frames whose similarity to the screen template exceeds the target similarity.
Optionally, in this embodiment, the second obtaining unit includes: a dividing subunit, configured to divide each video frame of the video resource into a foreground picture and a background picture; a third determining subunit, configured to determine, for each video frame, a first distance between its foreground picture and a foreground template and a second distance between its background picture and a background template, wherein the screen template includes the foreground template and the background template; a fourth determining subunit, configured to determine the weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight; and a fifth determining subunit, configured to determine the video frames whose weighted sum is below a target value to be the target video frames.
Optionally, in this embodiment, the target video clip is multiple target video clips, and the apparatus further includes: a splicing module, configured to splice the multiple target video clips into a target video resource in chronological order; and a sending module, configured to send the target video resource to a client used to play the target video resource.
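The splicing module's ordering behavior can be sketched as follows; clips are reduced here to (start_second, end_second, label) tuples rather than real video data, and real splicing would concatenate decoded streams.

```python
# Minimal sketch of the splicing module: selected target clips are ordered
# by start time and concatenated into one target video resource.

def splice_clips(clips):
    """Splice target video clips into one resource in chronological order.

    clips: list of (start_second, end_second, label) tuples.
    Returns the labels in playback order, standing in for the spliced video.
    """
    ordered = sorted(clips, key=lambda clip: clip[0])  # sort by start time
    return [label for _start, _end, label in ordered]

highlights = [(120, 135, "second_kill"), (40, 55, "first_kill")]
print(splice_clips(highlights))  # ['first_kill', 'second_kill']
```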
The application environment of this embodiment of the present invention may be, but is not limited to, the application environment in the foregoing embodiments, and details are not repeated in this embodiment. This embodiment of the present invention provides an optional specific application example of the method for determining a video clip described above.
As an optional implementation, the foregoing method for determining a video clip may be, but is not limited to being, applied to the scenario of clipping highlights from shooter-game videos shown in Figure 11. When highlights of a shooter-game video are clipped automatically, the AI may generate candidate highlights according to kill detection; then, for each candidate highlight, a TSN (Temporal Segment Network) may produce a highlight score for each clip from image features and optical flow features, combined with the label data of offline samples. In this way, automatic clipping of shooter-game video highlights is finally achieved using a machine learning method.
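The two-stream scoring step can be sketched as follows. The per-stream scores are stubbed out as fixed numbers, and the fusion weights and the 0.5 threshold are illustrative assumptions, not values from the patent; in practice the scores would come from trained image and optical-flow networks.

```python
# Hedged sketch of two-stream highlight scoring: an image-stream score and
# an optical-flow-stream score are fused into one highlight score, and the
# best-scoring candidate clips are kept.

def fuse_scores(image_score, flow_score, w_image=0.5, w_flow=0.5):
    """Combine spatial (image) and temporal (optical flow) stream scores."""
    return w_image * image_score + w_flow * flow_score

def pick_highlights(candidates, threshold=0.5):
    """candidates: list of (clip_id, image_score, flow_score) tuples."""
    return [clip_id
            for clip_id, img, flow in candidates
            if fuse_scores(img, flow) > threshold]

candidates = [
    ("clip_kill", 0.9, 0.8),  # strong in both streams -> highlight
    ("clip_walk", 0.2, 0.1),  # map exploration -> discarded
]
print(pick_highlights(candidates))  # ['clip_kill']
```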
In a shooter-game video, most of the time is spent exploring the map, and the highlights make up only a small fraction. Taking the survival mode of shooter games as an example, there are many shooter-game videos on the Internet, among which highlight compilations have attracted the attention of large numbers of viewers. Selecting these highlights manually takes considerable time and effort. In this scenario, the foregoing highlight evaluation method is proposed to solve the problem of automatic highlight clipping. With this method, a highlight score is output for each clip according to image and optical flow features, thereby realizing automatic clipping of shooter-game video clips.
The foregoing implementation of shooter-game highlight clipping allows game video editing to be automated, and the output highlights conform to human perception, improving the efficiency of game clip editing.
According to another aspect of the embodiments of the present invention, an electronic device for implementing the foregoing method for determining a video clip is further provided. As shown in Figure 12, the electronic device includes: one or more processors 1202 (only one is shown in the figure), a memory 1204, a sensor 1206, an encoder 1208, and a transmission device 1210. A computer program is stored in the memory, and the processor is configured to perform the steps in any one of the foregoing method embodiments by means of the computer program. Optionally, in this embodiment, the electronic device may be located in at least one of multiple network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to perform the following steps by means of the computer program:
S1: acquire multiple video clips from a video resource;
S2: acquire a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
S3: determine a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
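Steps S1 to S3 can be sketched end to end as follows. The feature extractors are illustrative stand-ins (a real system would use trained image and optical-flow models), and the clip length and thresholds are assumed values.

```python
# Illustrative sketch of steps S1-S3. A clip is a target clip when both its
# image feature and its motion feature reach the target values.

def get_clips(video_resource, clip_len=2):
    """S1: split a video resource (a list of frames) into fixed-length clips."""
    return [video_resource[i:i + clip_len]
            for i in range(0, len(video_resource), clip_len)]

def image_feature(clip):
    """S2 stand-in: fraction of frames containing the object of interest."""
    return sum(1 for frame in clip if frame.get("object")) / len(clip)

def motion_feature(clip):
    """S2 stand-in: average per-frame motion magnitude."""
    return sum(frame.get("motion", 0.0) for frame in clip) / len(clip)

def target_clips(clips, image_min=0.5, motion_min=0.5):
    """S3: keep clips whose object has both the target image and motion feature."""
    return [i for i, clip in enumerate(clips)
            if image_feature(clip) >= image_min
            and motion_feature(clip) >= motion_min]

frames = [
    {"object": True, "motion": 0.9}, {"object": True, "motion": 0.8},   # action
    {"object": False, "motion": 0.1}, {"object": False, "motion": 0.0},  # idle
]
clips = get_clips(frames)
print(target_clips(clips))  # [0]
```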
Optionally, a person skilled in the art may understand that the structure shown in Figure 12 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. Figure 12 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (such as a network interface or a display apparatus) than shown in Figure 12, or have a configuration different from that shown in Figure 12.
The memory 1204 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining a video clip in the embodiments of the present invention. The processor 1202 runs the software programs and modules stored in the memory 1204, thereby executing various functional applications and data processing, that is, implementing the foregoing method for determining a video clip. The memory 1204 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 1204 may further include memories remotely located relative to the processor 1202, and these remote memories may be connected to the terminal via a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 1210 is configured to receive or send data via a network. Specific examples of the network may include a wired network and a wireless network. In one example, the transmission device 1210 includes a network interface controller (NIC), which may be connected to other network devices and a router via a cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1210 is a radio frequency (RF) module, which is configured to communicate with the Internet wirelessly.
Specifically, the memory 1204 is configured to store an application program.
An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to perform the steps in any one of the foregoing method embodiments when run.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
S1: acquire multiple video clips from a video resource;
S2: acquire a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
S3: determine a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
Optionally, the storage medium is further configured to store a computer program for performing the steps included in the methods of the foregoing embodiments, which are not described again in this embodiment.
Optionally, in this embodiment, a person of ordinary skill in the art may understand that all or some of the steps of the various methods in the foregoing embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The serial numbers of the foregoing embodiments of the present invention are merely for description and do not represent the superiority or inferiority of the embodiments.
If the integrated unit in the foregoing embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the foregoing computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
The foregoing descriptions are merely preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.
Claims (15)
1. A method for determining a video clip, characterized by comprising:
acquiring multiple video clips from a video resource;
acquiring a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
determining a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
2. The method according to claim 1, characterized in that determining the target video clip in the multiple video clips according to the first feature and the second feature comprises:
determining a video category of each video clip in the multiple video clips according to the first feature and the second feature;
determining a video clip, in the multiple video clips, whose video category is a target category as the target video clip.
3. The method according to claim 2, characterized in that determining the video category of each video clip in the multiple video clips according to the first feature and the second feature comprises:
inputting the first feature into an image classification model to obtain a first category parameter corresponding to each video clip, wherein the image classification model is a model obtained by training a first classification model using first feature samples labeled with image categories, and the first category parameter is used to indicate the image category to which each video clip belongs;
inputting the second feature into a motion classification model to obtain a second category parameter corresponding to each video clip, wherein the motion classification model is a model obtained by training a second classification model using second feature samples labeled with motion categories, and the second category parameter is used to indicate the motion category to which each video clip belongs;
determining a weighted sum of the first category parameter and the second category parameter corresponding to each video clip as a video category parameter of each video clip, wherein the video category parameter is used to indicate the video category to which each video clip belongs.
4. The method according to claim 2, characterized in that determining the video category of each video clip in the multiple video clips according to the first feature and the second feature comprises:
performing feature fusion on the first feature and the second feature of each video clip to obtain a spatio-temporal feature of each video clip;
inputting the spatio-temporal feature of each video clip into a spatio-temporal classification model to obtain a video category parameter corresponding to each video clip, wherein the spatio-temporal classification model is a model obtained by training a third classification model using spatio-temporal feature samples labeled with video categories, and the video category parameter is used to indicate the video category to which each video clip belongs.
5. The method according to claim 3 or 4, characterized in that determining the video clip, in the multiple video clips, whose video category is the target category as the target video clip comprises:
acquiring, from the multiple video clips, video clips whose video category parameter falls within a target threshold range;
determining the video clips whose video category parameter falls within the target threshold range as the target video clip.
6. The method according to claim 1, characterized in that acquiring the first feature and the second feature of each video clip in the multiple video clips comprises:
inputting each video clip into a first feature extraction model to obtain the first feature of each video clip, wherein the first feature extraction model is a model obtained by training an initial first feature model using first feature samples;
inputting each video clip into an optical flow feature extraction model to obtain an optical flow feature of each video clip, wherein the optical flow feature extraction model is a model obtained by training an initial optical flow feature model using optical flow feature samples, and the second feature comprises the optical flow feature.
7. The method according to claim 6, characterized in that before inputting each video clip into the optical flow feature extraction model to obtain the optical flow feature of each video clip, the method further comprises:
acquiring an average value of the first convolutional layer parameters corresponding to the three RGB channels of an initial image convolution model, and determining the average value as an initialization parameter of the first convolutional layer of the initial optical flow feature model;
adjusting the numerical range of the optical flow fields of initial optical flow feature samples to the numerical range of the input parameters of the initial image convolution model, to obtain the optical flow feature samples;
training the initial optical flow feature model using the optical flow feature samples to obtain the optical flow feature extraction model.
8. The method according to claim 1, characterized in that the multiple video clips are video clips, in the video resource, that include a target object, and acquiring the multiple video clips from the video resource comprises:
acquiring a picture template corresponding to the target object, wherein the picture template is used to indicate attribute information of the target object;
acquiring, from video frames of the video resource, target video frames whose similarity to the picture template is higher than a target similarity;
acquiring the multiple video clips from the target video frames, wherein each video clip in the multiple video clips includes one target video frame or multiple consecutive target video frames.
9. The method according to claim 8, characterized in that acquiring, from the video frames of the video resource, the target video frames whose similarity to the picture template is higher than the target similarity comprises:
acquiring first video frames from the video frames of the video resource, wherein a first attribute of each first video frame matches a first attribute of the target object;
acquiring, from the first video frames, second video frames whose similarity to the picture template is higher than the target similarity as the target video frames.
10. The method according to claim 8, characterized in that acquiring, from the video frames of the video resource, the target video frames whose similarity to the picture template is higher than the target similarity comprises:
dividing each video frame in the video frames of the video resource into a foreground picture and a background picture;
determining, for each video frame, a first distance between the foreground picture of the video frame and a foreground template and a second distance between the background picture of the video frame and a background template, wherein the picture template comprises the foreground template and the background template;
determining a weighted sum of the first distance and the second distance of each video frame, wherein the first distance corresponds to a first weight, the second distance corresponds to a second weight, and the first weight is greater than the second weight;
determining, among the video frames of the video resource, the video frames whose weighted sum is lower than a target value as the target video frames.
11. The method according to claim 1, characterized in that the target video clip is multiple target video clips, and after determining the target video clip in the multiple video clips according to the first feature and the second feature, the method further comprises:
splicing the multiple target video clips into a target video resource in chronological order;
sending the target video resource to a client used to play the target video resource.
12. An apparatus for determining a video clip, characterized by comprising:
a first acquisition module, configured to acquire multiple video clips from a video resource;
a second acquisition module, configured to acquire a first feature and a second feature of each video clip in the multiple video clips, wherein the first feature is used to indicate an image feature of an object included in each video clip, and the second feature is used to indicate a motion feature of the object included in each video clip;
a determination module, configured to determine a target video clip in the multiple video clips according to the first feature and the second feature, wherein the target video clip is a video clip, in the multiple video clips, in which the included object has a target image feature and a target motion feature.
13. The apparatus according to claim 12, characterized in that the determination module comprises:
a first determination unit, configured to determine a video category of each video clip in the multiple video clips according to the first feature and the second feature;
a second determination unit, configured to determine a video clip, in the multiple video clips, whose video category is a target category as the target video clip.
14. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform the method according to any one of claims 1 to 11 when run.
15. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to perform the method according to any one of claims 1 to 11 by means of the computer program.
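The cross-modality initialization described in claim 7 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: kernels are plain nested lists with made-up integer weights, while a real implementation would operate on framework weight tensors (e.g. shape [out_channels, in_channels, k, k]) of a trained image model.

```python
# Sketch of claim 7's first-layer initialization: the image model's first
# convolutional kernel is averaged over its three RGB channels, and the
# averaged plane is replicated for each optical-flow input channel.

def rgb_average_init(rgb_kernel, flow_in_channels):
    """Average a [3][k][k] RGB kernel over its channels, then replicate the
    averaged [k][k] plane for each optical-flow input channel."""
    k = len(rgb_kernel[0])
    averaged = [[sum(rgb_kernel[c][i][j] for c in range(3)) / 3.0
                 for j in range(k)] for i in range(k)]
    # Copy the averaged plane per channel so the planes stay independent.
    return [[row[:] for row in averaged] for _ in range(flow_in_channels)]

# A hypothetical 3-channel 2x2 kernel from the image model.
rgb_kernel = [
    [[3, 0], [0, 3]],  # R
    [[6, 0], [0, 6]],  # G
    [[0, 3], [3, 0]],  # B
]
flow_kernel = rgb_average_init(rgb_kernel, flow_in_channels=10)
print(flow_kernel[0])  # [[3.0, 1.0], [1.0, 3.0]]
```

This lets the optical-flow model reuse the low-level filters learned on RGB images even though its input (a stack of flow fields) has a different number of channels, which is why claim 7 also rescales the flow values into the image model's input range.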
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811427035.8A CN110166827B (en) | 2018-11-27 | 2018-11-27 | Video clip determination method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110166827A true CN110166827A (en) | 2019-08-23 |
CN110166827B CN110166827B (en) | 2022-09-13 |
Family
ID=67645229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811427035.8A Active CN110166827B (en) | 2018-11-27 | 2018-11-27 | Video clip determination method and device, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166827B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110182469A1 (en) * | 2010-01-28 | 2011-07-28 | Nec Laboratories America, Inc. | 3d convolutional neural networks for automatic human action recognition |
CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video identification and classification method through time-space significant information fusion |
CN108388876A (en) * | 2018-03-13 | 2018-08-10 | 腾讯科技(深圳)有限公司 | A kind of image-recognizing method, device and relevant device |
CN108763325A (en) * | 2018-05-04 | 2018-11-06 | 北京达佳互联信息技术有限公司 | A kind of network object processing method and processing device |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516749A (en) * | 2019-08-29 | 2019-11-29 | 网易传媒科技(北京)有限公司 | Model training method, method for processing video frequency, device, medium and calculating equipment |
CN110677722A (en) * | 2019-09-29 | 2020-01-10 | 上海依图网络科技有限公司 | Video processing method, and apparatus, medium, and system thereof |
CN110856042A (en) * | 2019-11-18 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Video playing method and device, computer readable storage medium and computer equipment |
CN110856013A (en) * | 2019-11-19 | 2020-02-28 | 珠海格力电器股份有限公司 | Method, system and storage medium for identifying key segments in video |
CN110855904A (en) * | 2019-11-26 | 2020-02-28 | Oppo广东移动通信有限公司 | Video processing method, electronic device and storage medium |
CN113079420A (en) * | 2020-01-03 | 2021-07-06 | 北京三星通信技术研究有限公司 | Video generation method and device, electronic equipment and computer readable storage medium |
CN113286194A (en) * | 2020-02-20 | 2021-08-20 | 北京三星通信技术研究有限公司 | Video processing method and device, electronic equipment and readable storage medium |
CN112052357A (en) * | 2020-04-15 | 2020-12-08 | 上海摩象网络科技有限公司 | Video clip marking method and device and handheld camera |
CN112052357B (en) * | 2020-04-15 | 2022-04-01 | 上海摩象网络科技有限公司 | Video clip marking method and device and handheld camera |
CN111541938A (en) * | 2020-04-30 | 2020-08-14 | 维沃移动通信有限公司 | Video generation method and device and electronic equipment |
CN113747162A (en) * | 2020-05-29 | 2021-12-03 | 北京金山云网络技术有限公司 | Video processing method and apparatus, storage medium, and electronic apparatus |
CN113747162B (en) * | 2020-05-29 | 2023-09-29 | 北京金山云网络技术有限公司 | Video processing method and device, storage medium and electronic device |
CN113395542B (en) * | 2020-10-26 | 2022-11-08 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN113395542A (en) * | 2020-10-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN113542894B (en) * | 2020-11-25 | 2022-08-19 | 腾讯科技(深圳)有限公司 | Game video editing method, device, equipment and storage medium |
CN113542894A (en) * | 2020-11-25 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Game video editing method, device, equipment and storage medium |
CN112579824A (en) * | 2020-12-16 | 2021-03-30 | 北京中科闻歌科技股份有限公司 | Video data classification method and device, electronic equipment and storage medium |
CN112770061A (en) * | 2020-12-16 | 2021-05-07 | 影石创新科技股份有限公司 | Video editing method, system, electronic device and storage medium |
WO2022127877A1 (en) * | 2020-12-16 | 2022-06-23 | 影石创新科技股份有限公司 | Video editing method and system, electronic device, and storage medium |
CN112770167A (en) * | 2020-12-21 | 2021-05-07 | 深圳Tcl新技术有限公司 | Video display method and device, intelligent display terminal and storage medium |
CN112804578A (en) * | 2021-01-28 | 2021-05-14 | 广州虎牙科技有限公司 | Atmosphere special effect generation method and device, electronic equipment and storage medium |
CN113014831A (en) * | 2021-03-05 | 2021-06-22 | 上海明略人工智能(集团)有限公司 | Method, device and equipment for acquiring scenes of sports video |
CN113014831B (en) * | 2021-03-05 | 2024-03-12 | 上海明略人工智能(集团)有限公司 | Method, device and equipment for scene acquisition of sports video |
CN115695904A (en) * | 2021-07-21 | 2023-02-03 | 广州视源电子科技股份有限公司 | Video processing method and device, computer storage medium and intelligent interactive panel |
CN115103223A (en) * | 2022-06-02 | 2022-09-23 | 咪咕视讯科技有限公司 | Video content detection method, device, equipment and storage medium |
CN115103223B (en) * | 2022-06-02 | 2023-11-10 | 咪咕视讯科技有限公司 | Video content detection method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110166827B (en) | 2022-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166827A (en) | Determination method, apparatus, storage medium and the electronic device of video clip | |
US20220239988A1 (en) | Display method and apparatus for item information, device, and computer-readable storage medium | |
CN109145784B (en) | Method and apparatus for processing video | |
Peng et al. | Two-stream collaborative learning with spatial-temporal attention for video classification | |
CN109145840B (en) | Video scene classification method, device, equipment and storage medium | |
CN108140032B (en) | Apparatus and method for automatic video summarization | |
Shao et al. | Deeply learned attributes for crowded scene understanding | |
CN110147711A (en) | Video scene recognition methods, device, storage medium and electronic device | |
Sharma et al. | Action recognition using visual attention | |
CN109154976A (en) | Pass through the system and method for machine learning training object classifier | |
CN110532996A (en) | The method of visual classification, the method for information processing and server | |
CN103365936A (en) | Video recommendation system and method thereof | |
CN109614517A (en) | Classification method, device, equipment and the storage medium of video | |
CN110516671A (en) | Training method, image detecting method and the device of neural network model | |
CN111026914A (en) | Training method of video abstract model, video abstract generation method and device | |
TW201907736A (en) | Method and device for generating video summary | |
US11504635B2 (en) | Vector-space framework for evaluating gameplay content in a game environment | |
Liu et al. | Soccer video event detection using 3D convolutional networks and shot boundary detection via deep feature distance | |
CN111491187A (en) | Video recommendation method, device, equipment and storage medium | |
CN110851621A (en) | Method, device and storage medium for predicting video wonderful level based on knowledge graph | |
CN107801061A (en) | Ad data matching process, apparatus and system | |
CN108959323A (en) | Video classification methods and device | |
CN111432206A (en) | Video definition processing method and device based on artificial intelligence and electronic equipment | |
Yuan et al. | Contextualized spatio-temporal contrastive learning with self-supervision | |
US10397658B1 (en) | Identifying relevant gameplay content for a game environment using a vector-space framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||