CN108900896A

CN108900896A - Video clipping method and device

Info

Publication number: CN108900896A
Application number: CN201810533496.7A
Authority: CN
Inventors: 聂洪浩
Original assignee: Jiangxi Machen Communications Co Ltd; Shenzhen Tinno Mobile Technology Co Ltd; Shenzhen Tinno Wireless Technology Co Ltd
Current assignee: Jiangxi Machen Communications Co Ltd; Shenzhen Tinno Mobile Technology Co Ltd; Shenzhen Tinno Wireless Technology Co Ltd
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2018-11-27

Abstract

This application discloses a kind of video clipping method and device, this method includes：Obtain video data to be clipped；The target clip object key frame of requirement is determined for compliance with from the video data to be clipped by established target clip object model, wherein, the target clip object model includes multiple and different target clip object associative classification model, each target clip object associative classification model for identified from the video data to be clipped with the associated corresponding key feature of target clip object so that the target clip object model determines the target clip object key frame according to multiple and different key features from the video data to be clipped；The multiple single-frame images in front and back for saving the target clip object key frame, obtain target editing video data.By the above-mentioned means, the application can provide technical support to improve the accuracy of clip object, technical support is provided for the rich of clip object.

Description

Video clipping method and device

Technical field

This application involves technical field of video processing, more particularly to a kind of video clipping method, video clipping device and Device with store function.

Background technique

Current competitive sports program, it is often necessary to splendid contents are subjected to editing, this generally requires manually to see complete field Match, carries out editing manually, can waste many manpowers and time.

In the way of machine learning, computer is allowed to learn automatically, by video content analysis, finds out excellent portion therein Point, excellent video content is intercepted automatically, synthesis meets the excellent collection of choice specimens of competitive sports of human demand.Such as：By language Sound data are converted into text data, are trained using the text data after image data and conversion, establish editing model；Again Such as：Ball and sportsman are mapped to the positioning of graphic picture coordinate system, received after image switching instructs with ball or sportsman in graphic picture Centered on positioning on coordinate system, magnified picture is intercepted；Alternatively, establishing the classification mould after multiple training by image Type carries out the classification based on object content to image.

But present inventor has found in long-term R&D process, the mode clip object of above-mentioned editing video Inaccuracy, clip object are relatively simple.

Summary of the invention

The application can be the standard that improves clip object mainly solving the technical problems that provide a kind of video clipping method True property provides technical support, provides technical support for the rich of clip object.

In order to solve the above technical problems, the technical solution that the application uses is：A kind of video clipping method, institute are provided The method of stating includes：Obtain video data to be clipped；By established target clip object model from the video counts to be clipped The target clip object key frame of requirement is determined for compliance in, wherein the target clip object model includes multiple and different Target clip object associative classification model, each target clip object associative classification model are used for from the video to be clipped Identified in data with the associated corresponding key feature of target clip object so that the target clip object model is according to more A different key feature determines the target clip object key frame from the video data to be clipped；Described in preservation The multiple single-frame images in front and back of target clip object key frame, obtain target editing video data.

In order to solve the above technical problems, another technical solution that the application uses is：A kind of video clipping device is provided, Described device includes：Processor, memory and telecommunication circuit, the processor are respectively coupled to the memory and the communication Circuit, the processor, the memory and the telecommunication circuit can be realized the step in method as described above at work Suddenly.

In order to solve the above technical problems, another technical solution that the application uses is：It provides a kind of with store function Device, be stored thereon with program data, described program data realize the step in method as described above when being executed by processor.

The beneficial effect of the application is：It is in contrast to the prior art, the application obtains video data to be clipped；Pass through The target clip object that established target clip object model is determined for compliance with requirement from the video data to be clipped is crucial Frame, wherein the target clip object model includes multiple and different target clip object associative classification models, each mesh Mark clip object associative classification model is associated right with target clip object for identifying from the video data to be clipped The key feature answered, so that the target clip object model is according to multiple and different key features from the view to be clipped Frequency determines the target clip object key frame in；Save the multiple single frames figures in front and back of the target clip object key frame Picture obtains target editing video data.Since multiple target clip object associative classification models can be from video data to be clipped In identify with the associated multiple corresponding key features of target clip object, target clip object model is according to multiple and different When the key feature determines target clip object key frame from video data to be clipped, can there are two types of possible mode, One is including multiple corresponding key feature in single-frame images multiple before and after target clip object key frame, this can enable Clip object reduces, targeted strong, the accuracy of editing is improved, by target clip object model according to multiple and different The key feature determines target clip object key frame from video data to be clipped, provides technology to improve editing accuracy It supports；As long as another kind is that there are any one of multiple key features in video data to be clipped, it is determined that corresponding mesh Clip object key frame is marked, this can enable the expansions of the range of editing, and clip object is abundant, pass through target clip object model root Target clip object key frame is determined from video data to be clipped according to multiple and different key features, to improve editing pair The rich offer technical support of elephant.

Detailed description of the invention

In order to more clearly explain the technical solutions in the embodiments of the present application, make required in being described below to embodiment Attached drawing is briefly described, it should be apparent that, the drawings in the following description are only some examples of the present application, for For those of ordinary skill in the art, without creative efforts, it can also be obtained according to these attached drawings other Attached drawing.Wherein：

Fig. 1 is the flow diagram of one embodiment of the application video clipping method；

Fig. 2 is the flow diagram of another embodiment of the application video clipping method；

Fig. 3 is the flow diagram of the another embodiment of the application video clipping method；

Fig. 4 is that football flies into net sample schematic diagram in one specific embodiment of the application video clipping method；

Fig. 5 is that non-football flies into net sample schematic diagram in one specific embodiment of the application video clipping method；

Fig. 6 is the structural schematic diagram of one embodiment of the application video clipping device；

Fig. 7 is the structural schematic diagram for one embodiment of device that the application has store function.

Specific embodiment

Below in conjunction with the attached drawing in the embodiment of the present application, technical solutions in the embodiments of the present application carries out clear, complete Site preparation description, it is clear that described embodiments are only a part of embodiments of the present application, rather than whole embodiments.Based on this Embodiment in application, those of ordinary skill in the art are obtained every other under the premise of not making creative labor Embodiment shall fall in the protection scope of this application.

Refering to fig. 1, Fig. 1 is the flow diagram of one embodiment of the application video clipping method, and this method includes：

Step S101：Obtain video data to be clipped.

Step S102：Requirement is determined for compliance with from video data to be clipped by established target clip object model Target clip object key frame, wherein target clip object model includes multiple and different target clip object associative classification mould Type, each target clip object associative classification model are associated with for identifying from video data to be clipped with target clip object Corresponding key feature so that target clip object model is according to multiple and different key features from video data to be clipped Determine target clip object key frame.

Step S103：The multiple single-frame images in front and back for saving target clip object key frame, obtain target editing video counts According to.

Video data to be clipped refers to that user wishes the video data of editing, such as：Football video, basketball video, game Video, the match video played soccer together using one section of junior partner that mobile phone is shot, football fan or reporter are after having watched ball match The video, etc. obtained using mobile phone video recording.

In the present embodiment, target clip object refers to the object that user wishes that editing comes out, for example, a certain specific right As perhaps all objects associated with a certain special object perhaps the object at certain soul-stirring moment or certain have The object, etc. of special meaning scene, these are specific require：Satisfactory target clip object.With football race For video：Satisfactory target clip object can be this special object of scoring；Alternatively, being also possible to related to goal A series of objects of connection are scored such as pass, the shooting before scoring, and football fan's expression excitement sound is loud and sonorous after goal, announcer explains It has scored, sportsman and coach make celebration movement, score immediate updating, etc.；Alternatively, not scoring, also there is football fan's expression The loud and sonorous breathtaking scene of excited sound, sportsman and coach make oiling movement, etc..Satisfactory target clip object Key frame, as user wish key frame that editing comes out, satisfactory.

With the associated key feature of target clip object, refers to the object that foundation user wishes that editing comes out and determine It is having on image, can judge to wish the associated feature of object that editing comes out with user with maximum probability.For example, when scoring Carve existing key feature：There is football in picture, and football flies into net；It usually will appear these key features after goal：Football fan Expression excitement sound is loud and sonorous, announcer's explanation has been scored, sportsman and coach make celebration movement, score immediate updating, etc..

Target clip object model is to have built up in advance, which includes multiple and different mesh Clip object associative classification model is marked, each target clip object associative classification model can be identified from video data to be clipped Out with the associated corresponding key feature of target clip object, multiple and different target clip object associative classification models can know Key feature that Chu be not multiple and different, target clip object model is according to multiple and different key features from video data to be clipped Middle determining target clip object key frame.

Wherein, target clip object model determines target according to multiple and different key features from video data to be clipped Clip object key frame can specifically have both sides to apply, be described as follows：

On the one hand, in the prior art editing video data when, convert text data for voice data, utilize image data It is trained with the text data after conversion, establishes editing model；By converting text for voice data among the editing model There is distortion in the roundabout process of notebook data, the input data of editing model, there is also not for the result that final editing comes out Accurate problem；Or only classified by the disaggregated model after image training, do not utilize other features in addition to image Model is constructed, final editing resultant error is bigger.

A concrete application is in the application, simultaneously includes that this is more in the front and back video data of target clip object key frame In other words a different key feature includes multiple correspondence in multiple single-frame images before and after target clip object key frame Key feature, that is, pass through this multiple and different key feature occurred simultaneously, to analyze and determine that target clip object is crucial Frame, it is targeted strong this can enable clip object diminution, the accuracy of editing is improved, therefore, in this way, is It improves editing accuracy and technical support is provided.

On the other hand, the prior art is most of only by image training clip object or indirect utilization voice data, or Person only positions sportsman and ball, and such clip object is relatively simple, and content is not enriched, cannot be simultaneously in the multiple editings of editing Hold.

Another concrete application is in the application, as long as there are any in multiple key features in video data to be clipped One, it is determined that corresponding target clip object key frame, this can enable the expansions of the range of editing, and clip object is abundant, lead to It crosses target clip object model and target editing pair is determined from video data to be clipped according to multiple and different key features As key frame, technical support is provided to improve the rich of clip object.

The multiple single-frame images in front and back for finally saving target clip object key frame, obtain target editing video data, can So that the video data of editing is more coherent, make the content more horn of plenty of editing.

Video data to be clipped is obtained in the application embodiment；By established target clip object model from described The target clip object key frame of requirement is determined for compliance in video data to be clipped, wherein the target clip object model packet Multiple and different target clip object associative classification models is included, each target clip object associative classification model is used for from institute State identified in video data to be clipped with the associated corresponding key feature of target clip object so that the target editing pair As model determines that the target clip object is closed according to multiple and different key features from the video data to be clipped Key frame；The multiple single-frame images in front and back for saving the target clip object key frame, obtain target editing video data.Due to more A target clip object associative classification model can identify associated more with target clip object from video data to be clipped A corresponding key feature, target clip object model is according to multiple and different key features from video data to be clipped , can be there are two types of possible mode when determining target clip object key frame, one is more before and after target clip object key frame It include multiple corresponding key feature in a single-frame images, it is targeted strong this can enable clip object diminution, it improves The accuracy of editing, by target clip object model according to multiple and different key features from video data to be clipped It determines target clip object key frame, provides technical support to improve editing accuracy；As long as another kind is video counts to be clipped There are any one of multiple key features in, it is determined that corresponding target clip object key frame, this can enable cut Volume range expand, clip object is abundant, by target clip object model according to multiple and different key features to Target clip object key frame is determined in editing video data, provides technical support to improve the rich of clip object.

In one embodiment, key feature include occur in image target clip object, image over-the-counter sound in go out Text appearance and in the associated instant text of target clip object, image first in existing target clip object Key Words, image Personage occurs occurring cutting with target simultaneously with the second personage in the associated facial expression of target clip object and sound, image simultaneously Collect at least two in the associated movement of object.

Above-mentioned key feature is the key feature often occurred in video data, generally also related to highlight, and The more interested feature of user, can sufficiently meet user demand；On the one hand it can help improve the accuracy of editing, separately On the one hand it also can contribute to abundant clip content.

Further, in a specific embodiment, video data to be clipped is football video to be clipped, target editing Object is that football enters net, and key feature includes：Occur in image football enter net, occur scoring in announcer it is crucial There is instant score in text in language, image, football fan's expression is excited in image and sound is loud and sonorous, sportsman and coach occur in image Celebration movement.

If target clip object model has built up, can directly it use, if target clip object model is not It establishes, then it can be first good by target clip object model foundation.In one embodiment, the foundation of target clip object model Journey may refer to Fig. 2, and this method further includes：

Step S201：Multiple and different target clip object associative classification models is established by machine learning.

Step S202：According to multiple and different target clip object associative classification model foundation target clip object models.

Machine learning (Machine Learning, ML) is a multi-field cross discipline, be related to probability theory, statistics, The multiple subjects such as Approximation Theory, convextiry analysis, algorithm complexity theory；Specialize in the study that the mankind were simulated or realized to computer how Behavior reorganizes the existing structure of knowledge and is allowed to constantly improve the performance of itself to obtain new knowledge or skills；It is people The core of work intelligence, is the fundamental way for making computer have intelligence, and application spreads the every field of artificial intelligence.

In present embodiment, multiple and different target clip object associative classification models is established by machine learning, then Further according to the model that multiple and different target clip object associative classification model foundations is final, i.e. target clip object model, lead to This mode is crossed, the time of building target clip object model can be substantially reduced, and can intelligently improve the performance of itself, is made Final editing result is more met the needs of users.

Further, in step s 201, multiple and different target clip object associative classification moulds is established by machine learning Type can specifically include：Establish multiple and different target clip object associative classification models respectively using AdaBoost classifier.

Adaboost is a kind of iterative algorithm, and core concept is the classifier different for the training of the same training set (Weak Classifier) then gets up these weak classifier sets, constitutes a stronger final classification device (strong classifier).It is calculated Method itself is realized by changing data distribution, and whether it is correct according to the classification of each sample among each training set, And the accuracy rate of the general classification of last time, to determine the weight of each sample.The new data set for modifying weight is given down Layer classifier is trained, the Multiple Classifier Fusion for finally obtaining each training, as last Decision Classfication device.It uses Adaboost classifier can exclude some unnecessary training data features, and be placed on above crucial training data.

General machine-learning process include sample data acquisition, sample production, sample characteristics extraction, classifier design, The processes such as data training, parameter adjustment, test result feedback, actual test.The application in one embodiment, picture training sample Original training, can simplify machine-learning process.Fig. 3 is referred to, is established respectively using AdaBoost classifier multiple and different Target clip object associative classification model, can specifically include：

Step S301：Obtain and mark predetermined quantity, the picture training sample including there is key feature.

Step S302：Sample characteristics extraction is carried out to picture training sample, establishes the feature vector square of picture training sample Battle array.

Step S303：By the eigenvectors matrix input AdaBoost classifier of picture training sample to be trained, obtain To not after tested, meet the initial target clip object associative classification model of preset requirement.

Step S304：Using the video measurement test sample including there is key feature and adjust initial target editing Object associative classification model obtains the target clip object associative classification model for meeting preset requirement.

The above process can identify that football flies into the key feature of net and is with target clip object associative classification model Example, to be described in detail：

(1) data acquisition and sample manufacturing process：It acquires a large amount of football and flies into net picture as shown in figure 4, these are drawn Occur football and net in face, and football has flown into net, while a large amount of non-footballs also being taken to fly into net picture, such as Fig. 5 It is shown, do not occur net or football in these samples, the feature of net is also entered without football.Picture training will be collected Sample carries out dimension normalization and label.

(2) sample characteristics extraction process：Football, which can be summed up, by observation chart 4 and the feature of image of Fig. 5 flies into net Key features：Picture has football, also there is a net, and the distance of football and net is closer, and being reflected in two dimensional image is exactly football Pixel and net pixel apart from closer, can be by color and straight in addition to this there are also the doorframe and door line of white Line detects frame out and white door line and doorframe occurs.Football available in this way flies into the sample characteristics of net, can build later Vertical eigenvectors matrix (is infused as shown in table 1：Numerical value in eigenmatrix vector is in order to illustrate method, with actual measurement data Have deviation), it can intuitively find out that different feature combinations indicates that the football of the picture flies into net according to eigenvectors matrix Probability.

1 eigenvectors matrix of table

(3) AdaBoost classifier basic principle：AdaBoost classifier Theory comparison is mature, and in Face datection and knows Effective practice is obtained in other isotype identification classification.This method allows to continually add new " Weak Classifier ", until reaching Some scheduled sufficiently small error rate.In AdaBoost method, each picture training sample is endowed a weight, Show that it is selected into the probability of training set by some classifier.If some sample is by Accurate classification, under construction one In a training set, its selected probability is just lowered；On the contrary, if some sample is not classified correctly, its power Weight is just improved.In this way, AdaBoost method " can focus on " more difficult (the richer letter of those classification Breath) sample on.These weak detectors are only better than random guess, and the conjecture for two class problems only than 50% is good A bit.But the very weak Multiple Classifier Fusion of these detectabilities is got up by certain algorithm, a classification capacity will be obtained Very strong strong classifier.It should be noted that being not limited to AdaBoost classifier here, support vector machines also can choose The others such as (Support Vector Machine, SVM) classifier, it will not be described here.

The algorithm flow of AdaBoost：

Input：Training dataset T={ (x₁,y₁),(x₂,y₂)......(x_N,y_N), whereiny∈Y =-1,1, iteration total degree M；

1, the weight distribution of training sample is initialized：

2, for the number of iterations m=1,2 ..., M：

(a) D is distributed using with weight_mTraining dataset learnt, obtain Weak Classifier G_m(x)；

(b) G is calculated_m(x) D is distributed in training data centralized value_mOn error in classification rate e_m：

(c) G is calculated_m(x) weight α shared in final classification device_m：

(d) weight for updating training dataset is distributed (z here_mNormalization factor, in order to make sample probability distribution and For 1)：

3, final classification device is obtained：

(4) training and adjustment process：Picture training sample is divided into training sample and test sample, training sample is mainly For carrying out classifier study, test sample is mainly to detect whether classification learning parameter meets the requirements.It first will training sample In this feeding classifier, according to the process of classifier, the iterative characteristic extraction of data is carried out, characteristic parameter compares, the spy of iteration Sign parametric classification threshold calculations, sample such as reclassify at the processes.Later, by the calculated result parameter of these processes in test specimens Characteristic vector pickup, characteristic parameter sample are carried out in sheet and the processes such as reclassifies, and finally obtain the accuracy and mistake of sample judgement Accidentally rate, if accuracy and error rate meet preset requirement, for example require the correct probability of classification 95% or more, then classifying Device study is completed；, whereas if test result accuracy is lower than 95%, then to readjust the parameter setting of classifier or add New characteristic attribute etc. is added in the quantity of large sample.

(5) actual test process：Above-mentioned classification only completes learning test process on limited sample set, one at The classifier of function also needs to be tested in real data, and the calculated result parameter of these processes is enterprising in real data Row characteristic vector pickup, characteristic parameter sample such as reclassify at the processes, finally obtain the accuracy and error rate of sample judgement, such as Fruit accuracy and error rate meet preset requirement, for example require the correct probability of classification 95% or more, then classifier It practises and completing；, whereas if test result accuracy is lower than 95%, then to readjust the parameter setting of classifier or increase sample New characteristic attribute etc. is added in this quantity.

In one embodiment, in step S202, according to multiple and different target clip object associative classification model foundations Target clip object model, can specifically include：

Based on decision-tree model, joins together multiple and different target clip object associative classification models to establish target and cut Collect object model.

Decision tree (Decision Tree) is a kind of simple but widely used classifier；It is constructed by training data Decision tree can efficiently classify to unknown data.Decision number has two big advantages：(1) decision-tree model can the property read It is good, have descriptive, facilitates manual analysis；(2) high-efficient, decision tree only needs once to construct, Reusability, each time in advance The max calculation number of survey is no more than the depth of decision tree.

It is the structural schematic diagram of one embodiment of the application video clipping device referring to Fig. 6, Fig. 6, which includes：Processing Device 1, memory 2 and telecommunication circuit 3, processor 1 are respectively coupled to memory 2 and telecommunication circuit 3, processor 1, memory 2 with And telecommunication circuit 3 can be realized the step in as above any one the method at work.The detailed description of related content please be joined See above method part, it is no longer superfluous herein to chat.

It is the structural schematic diagram for one embodiment of device that the application has store function, the device 10 referring to Fig. 7, Fig. 7 On be stored with program data 100, the step any one of as above in method is realized when program data 100 is executed by processor.It is related The detailed description of content refers to above method part, no longer superfluous herein to chat.

The technical solution of the application substantially the part that contributes to existing technology or the technical solution in other words Completely or partially can be embodied in the form of software products, this has the device of store function, including several program datas, With so that a computer equipment (can be personal computer, server or the network equipment etc.) or processor (processor) all or part of the steps of each embodiment the method for the application is executed.And it is above-mentioned with storage function Can device include：USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), etc..

The foregoing is merely presently filed embodiments, are not intended to limit the scope of the patents of the application, all to utilize this Equivalent structure or equivalent flow shift made by application specification and accompanying drawing content, it is relevant to be applied directly or indirectly in other Technical field similarly includes in the scope of patent protection of the application.

Claims

1. a kind of video clipping method, which is characterized in that the method includes：

Obtain video data to be clipped；

The target editing of requirement is determined for compliance with from the video data to be clipped by established target clip object model Object key frame, wherein the target clip object model includes multiple and different target clip object associative classification model, often A target clip object associative classification model from the video data to be clipped for identifying and target clip object Associated corresponding key feature, so that the target clip object model is according to multiple and different key features from described The target clip object key frame is determined in video data to be clipped；

The multiple single-frame images in front and back for saving the target clip object key frame, obtain target editing video data.

2. the method according to claim 1, wherein the key feature includes occurring target editing pair in image As, image over-the-counter sound in there are target clip object Key Words, being associated with target clip object occurs in the text in image Instant text, the first personage occurs and in the associated facial expression of target clip object and sound, image the simultaneously in image Two personages occur and at least two in the associated movement of target clip object simultaneously.

3. according to the method described in claim 2, it is characterized in that, the video data to be clipped is that football to be clipped regards Frequently, the target clip object is that football enters net, and the key feature includes：Occur football in image and enters net, solution Occur goal Key Words in the person of saying, instant score occurs in the text in image, football fan's expression is excited in image and sound is loud and sonorous, figure There is celebration movement in sportsman and coach as in.

4. the method according to claim 1, wherein the method also includes：

Multiple and different target clip object associative classification models is established by machine learning；

According to multiple and different target clip object associative classification model foundation target clip object models.

5. according to the method described in claim 4, it is characterized in that, described establish multiple and different targets by machine learning and cut Object associative classification model is collected, including：

Establish multiple and different target clip object associative classification models respectively using AdaBoost classifier.

6. according to the method described in claim 5, it is characterized in that, it is described using AdaBoost classifier establish respectively it is multiple not Same target clip object associative classification model, including：

Obtain and mark predetermined quantity, the picture training sample including there is key feature；

Sample characteristics extraction is carried out to the picture training sample, establishes the eigenvectors matrix of the picture training sample；

The eigenvectors matrix of the picture training sample is inputted into the AdaBoost classifier to be trained, obtain without Test, meet the initial target clip object associative classification model of preset requirement；

Using the video measurement test sample including there is key feature and adjust the initial target clip object association Disaggregated model obtains the target clip object associative classification model for meeting preset requirement.

7. according to the method described in claim 6, it is characterized in that, the preset requirement includes default accuracy and default mistake Rate.

8. according to the method described in claim 4, it is characterized in that, described close according to multiple and different target clip objects Connection disaggregated model establishes target clip object model, including：

Based on decision-tree model, join together multiple and different target clip object associative classification models to establish the mesh Mark clip object model.

9. a kind of video clipping device, which is characterized in that described device includes：Processor, memory and telecommunication circuit, it is described Processor is respectively coupled to the memory and the telecommunication circuit, the processor, the memory and the telecommunication circuit It can be realized the step in any one of claim 1-8 the method at work.

10. a kind of device with store function, is stored thereon with program data, which is characterized in that described program data are located Manage the step realized in any one of claim 1-8 the method when device executes.