CN110166828A - Video processing method and apparatus - Google Patents
Video processing method and apparatus
- Publication number
- CN110166828A (application number CN201910122357.XA)
- Authority
- CN
- China
- Prior art keywords
- video
- processed
- content type
- splitting
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
Embodiments of the present application disclose a video processing method and apparatus. For a video to be processed that requires video splitting, the content type of the video is determined first; the neural network model corresponding to that content type is then selected from a set of neural network models; the selected model identifies, in the video to be processed, the video frames that contain the image features corresponding to the content type, and those frames serve as the split-node frames of the video. The video is then split at the split-node frames to obtain multiple video clips. This not only automates video splitting, but also applies a dedicated neural network model (that is, a dedicated splitting mode) to identify the split-node frames of each content type, so that videos of different content types can all be split with high precision. Splitting efficiency is improved and the timeliness demands of current video production are met.
Description
Technical field
This application relates to the field of video processing, and in particular to a video processing method and apparatus.
Background art
Video splitting is a video processing technique in which a video is reprocessed and divided into multiple video clips according to some logic or specific need; the resulting clips can be used, for example, to generate highlights videos. For instance, to serve internet video and new-media short-video content platforms, a traditional television program can be split so that what was originally one complete program becomes multiple video clips.
Because videos come in many varieties, it is difficult to define a unified video splitting rule. The current conventional approach is therefore to split the video to be processed manually with video processing tools, and how the clips are split depends entirely on human experience.
Because it is constrained by human experience, the current video splitting workflow is inefficient and can hardly meet the timeliness demands of current video production.
Summary of the invention
To solve the above technical problem, the present application provides a video processing method and apparatus.
The embodiments of the present application disclose the following technical solutions:
In a first aspect, an embodiment of the present application provides a video processing method. The method includes:
determining the content type of a video to be processed;
determining, by means of the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split-node frames, where the split-node frames contain the image features corresponding to the content type; and
splitting the video to be processed at the split-node frames to obtain multiple video clips.
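The three steps above can be sketched in code. The following is a minimal, hypothetical illustration (not the patent's implementation): given a frame sequence and the indices of the detected split-node frames, it cuts the sequence into clips, with each node frame starting a new clip.

```python
def split_at_nodes(frames, nodes):
    """Split a frame sequence into clips at the given node-frame indices.

    frames: list of frame identifiers; nodes: 0-based indices of
    split-node frames. Each clip is the run of frames between two
    consecutive nodes. Hypothetical helper, not the patent's code.
    """
    clips, start = [], 0
    for n in sorted(nodes):
        if n > start:
            clips.append(frames[start:n])
        start = n
    clips.append(frames[start:])
    return clips

split_at_nodes(list(range(1, 11)), [4, 7])
# -> [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```

A video with no detected node frames simply yields a single clip containing every frame.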
In a second aspect, an embodiment of the present application provides a video processing apparatus. The apparatus includes:
a first determination unit, configured to determine the content type of a video to be processed;
a second determination unit, configured to determine, by means of the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split-node frames, where the split-node frames contain the image features corresponding to the content type; and
a video splitting unit, configured to split the video to be processed at the split-node frames to obtain multiple video clips.
In a third aspect, an embodiment of the present application provides a video processing device. The device includes a processor and a memory:
the memory is configured to store program code and transfer the program code to the processor; and
the processor is configured to execute, according to instructions in the program code, the video processing method of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium is configured to store program code for executing the video processing method of any implementation of the first aspect.
It can be seen from the above technical solutions that, for a video to be processed that requires video splitting, the content type of the video is determined; the neural network model corresponding to that content type is selected from a set of neural network models; the selected model identifies the video frames in the video to be processed that contain the image features corresponding to the content type; those frames serve as the split-node frames of the video; and the video is split at the split-node frames to obtain multiple video clips. This not only automates video splitting but also applies, for each content type, a dedicated neural network model (that is, a dedicated splitting mode) to identify the split-node frames, so that videos of different content types can be split with high precision. Splitting efficiency is improved and the timeliness demands of current video production are met.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed for the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an example scene according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for determining the content type of a video to be processed according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a scene in which the content type of a video to be processed is output according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the effect of determining a video content type according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of a method for generating a highlights video according to an embodiment of the present application;
Fig. 7a is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
Fig. 7b is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a video processing device according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a video processing device according to an embodiment of the present application.
Detailed description of embodiments
To help a person skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims, and accompanying drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so termed are interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
As described above, because videos come in many varieties, it is difficult to define a unified video splitting rule, so the traditional approach is to split the video to be processed manually with video processing tools, and how the clips are split depends on human experience. Constrained by human experience, the traditional video splitting workflow is therefore inefficient and can hardly meet the timeliness demands of current video production.
In view of this, an embodiment of the present application provides a video processing method. The neural network model corresponding to the content type of the video to be processed is used to identify, in the video to be processed, the video frames that contain the image features corresponding to that content type, and those frames are determined to be the split-node frames of the video. The video is then split at the split-node frames to obtain multiple video clips. With the video processing method provided in this embodiment, a video can thus be split automatically, without a person splitting it manually with video processing tools, which avoids the inefficiency caused by reliance on human experience and the resulting difficulty in meeting the timeliness demands of current video production.
The video processing method provided in the embodiments of the present application is briefly introduced below with reference to the scene shown in Fig. 1.
In this embodiment of the present application, the video 101 to be processed shown in Fig. 1 can be split automatically. The embodiment does not specifically limit the video to be processed; it may be, for example, any one of a news video, an entertainment video, a TV drama video, a film video, or another kind of video. When splitting the video to be processed, the split-node frames in the video 101 to be processed may be determined first, and the video 101 is then split according to those split-node frames. A split-node frame in this embodiment is a split point at which the video 101 to be processed is divided. The split-node frame may consist of a single video frame or of multiple consecutive video frames; the embodiment does not specifically limit this. Splitting the video 101 to be processed according to the split-node frames may mean, for example, determining the video frames between two adjacent split-node frames to be one video clip; or determining the video frames before a given split-node frame, or the video frames after a given split-node frame, to be one video clip; or deleting the video frames before, or after, a given split-node frame; and so on. In the scene shown in Fig. 1, a neural network model 102 can be used to determine the video frames 103 in the video 101 to be processed that serve as split-node frames. After the video frames 103 serving as split-node frames are determined, the video 101 to be processed can be split at those frames to obtain multiple video clips 104.
For example, suppose the content type of the video 101 to be processed is TV drama, and the video contains 100 video frames in total, numbered video frame 1, video frame 2, video frame 3, ..., video frame 100 in order of appearance in the video 101, where frames 20 through 75 are an advertisement unrelated to the TV drama. The neural network model 102 corresponding to the TV drama type determines that the split-node frames are video frame 20 and video frame 75. The video 101 is then split at these split-node frames into multiple video clips, each containing multiple video frames. The first video clip consists of video frame 1 through video frame 19, the second consists of video frame 76 through video frame 100, and the clip consisting of video frame 20 through video frame 75 is deleted, so the video 101 to be processed is split into two video clips. In another embodiment, the same video 101 to be processed can be split into three video clips: the first consists of video frame 1 through video frame 19, the second of video frame 76 through video frame 100, and the third of video frame 20 through video frame 75, which is retained rather than deleted.
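The two splitting modes in the example above can be expressed directly as index arithmetic. This is an illustrative sketch only; the frame numbers follow the example, and the slicing boundaries are the assumed interpretation of node frames 20 and 75:

```python
frames = list(range(1, 101))   # video frame 1 .. video frame 100

# Mode 1: drop the advertisement segment, keeping two clips
clip1 = frames[:19]            # video frames 1 .. 19
clip2 = frames[75:]            # video frames 76 .. 100

# Mode 2: keep the advertisement segment as a third clip
clip3 = frames[19:75]          # video frames 20 .. 75
```

Concatenating the three clips in order reproduces the original 100 frames, which is a useful sanity check for any splitting routine.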
It should be noted that, for videos of different content types, the image features carried by the video frames corresponding to the split-node frames differ. Therefore, the neural network model 102 in this embodiment is the neural network model corresponding to the content type of the video 101 to be processed: it can identify the video frames that carry the image features corresponding to that content type, and determine those frames to be the split-node frames of the video 101 to be processed.
The content type of the video to be processed in this embodiment refers to the category to which the content of the video belongs. The embodiment does not specifically limit the content type; as examples, the content type of the video to be processed may be any one of news and current affairs, variety and entertainment, TV drama and film, automotive, animals, mother-and-baby, sports, and other categories.
It should be noted that this embodiment does not specifically limit the neural network model 102; the neural network model 102 may be, for example, a convolutional neural network (CNN) model.
The video processing method provided by the present application is introduced below through specific embodiments.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application.
The video processing method provided in this embodiment of the present application can be implemented, for example, through the following S201 to S203.
S201: Determine the content type of the video to be processed.
For a description of the video to be processed and of its content type, reference may be made to the related description above; details are not repeated here.
It is understandable that videos of different content types may differ in the characteristics of the image information carried by their video frames, and the video frames of a video of a certain content type may carry image information with certain characteristic features. Therefore, in one implementation of this embodiment, the image information carried by the video frames of the video to be processed can be analyzed to determine the content type of the video. In another implementation, the content type can be determined manually, for example by a staff member who watches the video to be processed. In yet another implementation, the content type of the video to be processed can be determined from a label on the video, where the label may have been attached in advance by a user; for example, if the video to be processed was uploaded by a user, the user may have annotated its content type at upload time.
S202: Determine, by means of the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split-node frames, where the split-node frames contain the image features corresponding to the content type.
In this embodiment, a neural network model is used to determine the video frames in the video to be processed that serve as split-node frames. Considering that, for videos of different content types, the image features of the frames corresponding to the split-node frames differ, it is difficult for a single neural network model to determine the split-node frames of videos of all content types. Therefore, in this embodiment, the neural network model corresponding to the content type of the video to be processed is used to determine the video frames that serve as split-node frames. Precisely because the model corresponds to the content type of the video to be processed, the split-node frames it determines contain the image features corresponding to that content type.
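The per-content-type dispatch described above can be pictured as a registry keyed by content type. The sketch below is purely illustrative: the content-type names and the label-matching "models" are stand-ins, where a real system would run a trained neural network over decoded frames.

```python
def make_model(feature_name):
    """Return a stand-in 'model' that flags frames whose (hypothetical)
    text annotation contains feature_name. A real model would score the
    frame images themselves for the content type's image feature."""
    def model(frame_labels):
        return [i for i, lab in enumerate(frame_labels) if feature_name in lab]
    return model

# Hypothetical registry: content type -> split-node-frame model
MODELS = {
    "news": make_model("studio"),
    "variety": make_model("ad"),
    "tv_drama": make_model("credits"),
}

def find_node_frames(content_type, frame_labels):
    # Select the model matching the video's content type, then apply it
    return MODELS[content_type](frame_labels)

find_node_frames("news", ["studio", "report", "studio", "report"])  # -> [0, 2]
```

The point of the registry is that each content type gets its own detector, which is the "different splitting mode per content type" idea of the patent.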
It should be noted that this embodiment does not specifically limit the image features corresponding to a content type. As described above, the content type of the video to be processed may be any one of news and current affairs, variety and entertainment, TV drama and film, other categories, and so on; correspondingly, the image features corresponding to the content type may include any one of the image features corresponding to news and current affairs, variety and entertainment, TV drama and film, and the other categories.
It is understandable that a news and current affairs video usually contains many studio anchor-desk pictures, so when the content type of the video to be processed is news and current affairs, the image feature corresponding to the content type may be the studio picture. A variety and entertainment video may contain advertisement pictures and pictures that do not match the entertainment content, so when the content type of the video to be processed is variety and entertainment, the image feature corresponding to the content type may be an advertisement picture or a picture inconsistent with the content type. A TV drama or film video usually contains opening-title and end-credits pictures, so when the content type of the video to be processed is TV drama and film, the image features corresponding to the content type may be the opening-title and end-credits pictures. Other types of video, such as sports videos, generally contain human faces, so when the content type of the video to be processed is one of the other categories, the image feature corresponding to the content type may be a target face picture. That is, in this embodiment, the image feature corresponding to the content type of the video to be processed may include any one of a studio picture, an advertisement picture, a picture inconsistent with the content type, an opening-title or end-credits picture, and a target face picture.
S203: Split the video to be processed at the split-node frames to obtain multiple video clips.
After the split-node frames are determined, they can be used as the split points for splitting the video to be processed, and the video to be processed is split accordingly to obtain multiple video clips.
For example, for a news and current affairs video to be processed, the neural network model corresponding to that content type can identify the studio pictures, which serve as the split-node frames; a face recognition algorithm can additionally identify the presenters in the video; and, combined with ASR/OCR recognition results, TextRank can be used to process the speech and text content of the video to obtain a video summary, from which titles and abstracts are generated automatically for the video clips obtained by splitting. Here, ASR refers to the technique of converting the vocabulary content of human speech into text, and OCR refers to the technique of recognizing and extracting, for example using GMM-HMM/DNN techniques, the text content of pictures and photographs.
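A greatly simplified, frequency-based stand-in for the TextRank scoring mentioned above can show what "summarizing the recognized text" means in practice. This is a toy sketch, not the patent's implementation and not TextRank's actual graph-ranking algorithm: it scores each sentence by the average corpus frequency of its words.

```python
from collections import Counter

def summarize(sentences, k=1):
    """Rank sentences by the average corpus frequency of their words and
    return the top k. A toy stand-in for TextRank-style extraction."""
    words = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        toks = s.lower().split()
        return sum(words[w] for w in toks) / max(len(toks), 1)
    return sorted(sentences, key=score, reverse=True)[:k]

summarize(["the cat sat", "the cat sat on the cat mat", "dogs bark"])
# -> ["the cat sat"]
```

Real TextRank builds a similarity graph over sentences and runs PageRank on it; the interface (transcript sentences in, top-scoring sentences out) is the same.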
For another example, for a variety and entertainment video to be processed, advertisements and other content inconsistent with the entertainment program can be deleted from the video; specifically, the aforementioned advertisements and other content inconsistent with the entertainment video can be deleted using OCR and ASR combined with keyword matching.
For another example, for a TV drama or film video to be processed, the split-node frames can be determined from the opening and closing credits and from target faces, so that the time points at which the opening titles, end credits, and target faces appear are determined accurately; the opening titles and end credits are deleted, and video clips containing the target faces are generated, making it possible to watch only the scenes with a target face in a TV drama or film, or to skip them.
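Deleting the opening titles and end credits once their node frames are known reduces to a single slice. A hypothetical helper (the index values are illustrative; in the described system they would come from the credit-detecting model):

```python
def trim_credits(frames, head_end, tail_start):
    """Drop opening titles (frames before index head_end) and end
    credits (frames from index tail_start onward), keeping the body
    of the episode. Hypothetical helper, not the patent's code."""
    return frames[head_end:tail_start]

episode = list(range(1, 21))     # a 20-frame toy episode
trim_credits(episode, 3, 17)     # -> frames 4 .. 17
```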
For another example, for a video to be processed of another category, techniques such as face recognition, OCR, ASR, and object detection can be combined according to user demand to obtain video clips that meet that demand.
As can be seen from the above description, with the video processing method provided in the embodiments of the present application, for a video to be processed that requires video splitting, the content type of the video is determined; the neural network model corresponding to that content type is selected from a set of neural network models; that model identifies the video frames in the video to be processed that contain the image features corresponding to the content type; those frames serve as the split-node frames of the video; and the video is split at the split-node frames to obtain multiple video clips. This not only automates video splitting but also applies, for each content type, a dedicated neural network model (that is, a dedicated splitting mode) to identify the split-node frames, so that videos of different content types can be split with high precision, splitting efficiency is improved, and the timeliness demands of current video production are met.
As described in S201 above, in the embodiment of the present application, the image information carried by the video frames included in the video to be processed can be analyzed to determine the content type of the video to be processed. An implementation of determining the content type of a video to be processed is introduced below with reference to the accompanying drawings.
Referring to Fig. 3, which is a flow diagram of a method for determining the content type of a video to be processed provided by an embodiment of the present application.
The method for determining the content type of a video to be processed provided by the embodiment of the present application can be realized through the following S301-S302.
S301: according to a first image neural network model, determine the feature vectors respectively corresponding to multiple video frames in the video to be processed, where each feature vector carries the image information included in the corresponding video frame.
S302: determine the content type of the video to be processed according to the feature vectors corresponding to the multiple video frames.
It should be noted that the first image neural network model referred to in the embodiment of the present application can be a convolutional neural network model. The first image neural network model in the embodiment of the present application can be obtained by training with pictures generated from a large number of video frames as training samples. The first image neural network model can extract the feature vector corresponding to each video frame in the video to be processed, where the feature vector corresponding to a video frame carries the image information included in that video frame. That is to say, the first image neural network model can determine the image information included in the video frames of the video to be processed.
In the embodiment of the present application, the pictures generated from multiple video frames of the video to be processed can be input into the first image neural network model, and after processing by the convolutional layers, filter layers, and pooling layers of the first image neural network model, the feature vectors corresponding to the multiple video frames are output.
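The role of the first image neural network model can be sketched as follows: each sampled frame passes through convolution and pooling and comes out as a fixed-length feature vector. The NumPy code below is a toy, untrained stand-in assuming 8×8 grayscale frames and four random kernels, not the trained CNN the embodiment describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(frame, kernels):
    """Toy stand-in for the first image neural network model:
    one convolution layer, ReLU, then global average pooling,
    yielding one fixed-length feature vector per frame."""
    h, w = frame.shape
    feats = []
    for k in kernels:  # each kernel produces one feature channel
        kh, kw = k.shape
        resp = np.array([[np.sum(frame[i:i + kh, j:j + kw] * k)
                          for j in range(w - kw + 1)]
                         for i in range(h - kh + 1)])
        resp = np.maximum(resp, 0.0)   # ReLU
        feats.append(resp.mean())      # global average pooling
    return np.array(feats)

frames = [rng.random((8, 8)) for _ in range(3)]          # 3 sampled frames
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
vectors = [frame_features(f, kernels) for f in frames]
print([v.shape for v in vectors])  # [(4,), (4,), (4,)]
```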
It can be understood that the image information included in the video frames of the video to be processed can, to a certain extent, reflect the content type of the video to be processed; therefore, the feature vectors corresponding to multiple video frames in the video to be processed can be used to determine the content type of the video to be processed.
In the embodiment of the present application, it is considered that the image information included in a single video frame of the video to be processed may not be sufficient to reflect the content type of the video; that is to say, the feature vector corresponding to a single video frame may not be able to determine the content type of the video to be processed. Therefore, in the embodiment of the present application, the first image neural network model can be used to determine the feature vectors corresponding to multiple video frames in the video to be processed, and the feature vectors corresponding to the multiple video frames can then be used to determine the content type of the video to be processed. In other words, the image information included in the multiple video frames is used to determine the content type of the video to be processed.
In practical applications, generally speaking, the image information included in all the video frames of the video to be processed is sufficient to determine the content type of the video; therefore, the multiple video frames referred to in the embodiment of the present application may include all the video frames of the video to be processed. In practical applications, the image information included in only a part of the video frames may also be sufficient to determine the content type of the video; therefore, the multiple video frames referred to in the embodiment of the present application may also include only a part of the video frames of the video to be processed.
In specific implementation, S302 can, for example, analyze the feature vectors corresponding to the multiple video frames, so as to determine the content type of the video to be processed.
In the embodiment of the present application, it is considered that the relevance between consecutive video frames in the video to be processed may be relatively strong; in other words, the correlation between the picture content included in consecutive video frames may also be relatively high. Therefore, if the correlation between consecutive video frames can be taken into account when determining the content type of the video to be processed, the content type of the video to be processed can be determined more accurately.
In view of this, in a possible implementation of the embodiment of the present application, S302 can in specific implementation be realized through the following steps A-B.
Step A: according to the timing of the multiple video frames in the video to be processed, compose the feature vectors respectively corresponding to the multiple video frames into a feature vector sequence.
Step B: according to the feature vector sequence, determine the content type of the video to be processed through a second image neural network model.
Regarding step A, it should be noted that, in order that the correlation between consecutive video frames can be taken into account when determining the content type of the video to be processed, in the embodiment of the present application the feature vectors corresponding to the multiple video frames are composed into a feature vector sequence according to the timing of the multiple video frames in the video to be processed. In this way, the correlation between consecutive video frames can be taken into account by analyzing the correlation between consecutive feature vectors in the feature vector sequence.
In the embodiment of the present application, the timing of the multiple video frames in the video to be processed refers to the order in which the multiple video frames appear in the video to be processed. In the embodiment of the present application, the feature vector sequence includes multiple feature vectors in one-to-one correspondence with the multiple video frames, and the order of the feature vectors in the feature vector sequence is the same as the order of the corresponding video frames in the video to be processed. For example, suppose video frame 1 to video frame 100 correspond to feature vector 1 to feature vector 100 respectively, video frame 1 appears earliest in the video to be processed, video frame 2 appears second, and video frame 100 appears last. Then the order of the 100 feature vectors in the feature vector sequence is feature vector 1, feature vector 2, ..., feature vector 100.
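Step A can be sketched as a simple sort: order the per-frame feature vectors by each frame's time of appearance in the video. The frame ids, timestamps, and one-element vectors below are illustrative assumptions.

```python
def build_feature_sequence(frame_times, features):
    """Step A as a sketch: order per-frame feature vectors by the
    frames' appearance time in the video, so consecutive vectors in
    the sequence correspond to consecutive frames."""
    order = sorted(frame_times, key=lambda f: frame_times[f])
    return [features[f] for f in order]

times = {"f2": 1.0, "f1": 0.0, "f3": 2.5}        # frame id -> seconds
feats = {"f1": [0.1], "f2": [0.2], "f3": [0.3]}  # frame id -> vector
seq = build_feature_sequence(times, feats)
print(seq)  # [[0.1], [0.2], [0.3]]
```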
In the embodiment of the present application, after the feature vector sequence is obtained, the content type of the video to be processed can be determined through the second image neural network model.
The embodiment of the present application does not specifically limit the second image neural network model. Considering that the input of a recurrent neural network (Recurrent Neural Network, RNN) is a feature sequence, and that an RNN can associate multiple features in the feature sequence with one another, in the embodiment of the present application the second image neural network model may include a recurrent neural network model.
The embodiment of the present application also does not specifically limit the implementation by which the second image neural network model determines the content type of the video to be processed. As an example, the second image neural network model can process the feature vector sequence with a multilayer perceptron (Multi-Layer Perceptron, MLP) and output the content type of the video to be processed.
In the embodiment of the present application, the content type output by the second image neural network model for the video to be processed is, in practice, the probability that the video to be processed belongs to each content type. This can be understood with reference to Fig. 4, which is a schematic diagram of a scenario of outputting the content type of a video to be processed provided by an embodiment of the present application. As shown at 402 in Fig. 4, the probability that the content type of the video to be processed 401 is "automobile" is 0.99959, "entertainment and fashion" is 0.00009, "animal" is 0.00008, "mother and baby" is 0.00007, and "sport" is 0.00006.
It should be noted that, although the content types of the video shown in Fig. 4 include automobile, entertainment and fashion, animal, mother and baby, and sport, this is a schematic illustration and does not constitute a limitation on the embodiment of the present application.
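The per-type probabilities of Fig. 4 can be sketched as a softmax over the second model's raw output scores, with the most probable type taken as the result. The scores and labels below are made-up illustrative values, and softmax is one common choice rather than a layer the present application specifies.

```python
import math

def classify(scores, labels):
    """Turn raw per-type scores into probabilities with a softmax
    and pick the most likely content type. Scores and labels are
    illustrative, not the values of Fig. 4."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

labels = ["automobile", "entertainment", "animal", "mother-baby", "sport"]
label, probs = classify([9.0, 0.5, 0.4, 0.3, 0.2], labels)
print(label)  # automobile
```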
Actual test data show that, when the content type of the video to be processed is determined using the first image neural network model, if the video to be processed has obvious pictorial features, such as a game, football, basketball, or animation video, the corresponding accuracy rate is above 99%. When the content type of the video to be processed is determined using the first image neural network model together with the second image neural network model, and the video to be processed has more dispersed pictorial features, such as a TV series, outdoor sports, cuisine, or travel video, the corresponding accuracy rate is about 85%. Specifically, this can be understood with reference to Fig. 5, which is a schematic diagram of the effect of determining the content type of a video provided by an embodiment of the present application.
In Fig. 5, 501 is the accuracy curve of determining the content type of a video using the first image neural network model together with the second image neural network model, and 502 is the accuracy curve of determining the content type of a video using the first image neural network model alone. The abscissa in Fig. 5 indicates the number of transfer-learning training iterations used to obtain the first image neural network model, which is also the number of training iterations corresponding to the image neural network model composed of the first image neural network model and the second image neural network model; the ordinate indicates the accuracy rate. Each transfer-learning training sample includes several pictures, for example 10240 pictures. As can be seen, whether the content type of a video is determined using the first image neural network model together with the second image neural network model, or using the first image neural network model alone, the accuracy rate is relatively high.
In practical applications, after video splitting is performed on a video, the multiple video clips obtained can be used to generate a highlight video. Currently, a highlight video can be generated from video clips in a manner involving manual participation. Specifically, a staff member watches each video clip, judges whether the video clip belongs to the highlight segments, and synthesizes the highlight segments into a highlight video. However, with this way of synthesizing a highlight video, on the one hand the judgment of highlight segments is affected by the subjective judgment of the staff, so the judgment result for a highlight segment may be inaccurate; on the other hand, the highlight video can only be generated after the staff has watched each video clip in full, so the efficiency is relatively low.
In the embodiment of the present application, after video splitting is performed on the video to be processed and multiple video clips are obtained, whether a video clip belongs to the highlight segments can also be judged automatically, and a highlight video generated. The method for generating a highlight video provided by an embodiment of the present application is introduced below with reference to the accompanying drawings.
Referring to Fig. 6, which is a flow diagram of a method for generating a highlight video provided by an embodiment of the present application.
The method for generating a highlight video provided by the embodiment of the present application can, for example, be realized through the following S601-S602.
S601: determine, according to a probability model, the weight values with which the multiple video clips respectively belong to highlight segments.
In the embodiment of the present application, the weight value of a video clip is used to characterize the possibility that the video clip belongs to the highlight segments; in other words, it is used to characterize the probability that the video clip belongs to the highlight segments.
The probability model of the embodiment of the present application can determine the weight value with which a video clip belongs to the highlight segments. The embodiment of the present application does not specifically limit the probability model; the core algorithm of the probability model can, for example, be a maximum likelihood algorithm. In the embodiment of the present application, it is considered that the image semantic information carried by the highlight segments corresponding to videos of different content types may differ, so that it is difficult for one general probability model to determine whether video clips of various content types belong to the highlight segments. Therefore, in a possible implementation of the embodiment of the present application, the probability model can be a probability model corresponding to the content type of the video to be processed. In the embodiment of the present application, the probability model can be obtained by training on historical highlight segments of that content type, and can therefore determine the weights with which the multiple video clips belong to the highlight segments.
It should be noted that, in the embodiment of the present application, the image semantic information carried by a video clip refers to the meaning of the content expressed by the images in the video clip. For example, if a video clip is a segment of a basketball game in which a three-pointer is scored, the image semantic information carried by that video clip is a three-point shot in basketball.
S602: generate the highlight video corresponding to the video to be processed according to the video clips whose weight values meet a preset condition among the multiple video clips.
As described in S601, the weight value of a video clip characterizes the possibility that the video clip belongs to the highlight segments. Therefore, the video clips among the multiple video clips whose weight values meet the preset condition can, for example, be the video clips whose weight values are greater than a preset threshold. The embodiment of the present application does not specifically limit the preset threshold, which can be determined according to actual conditions; for example, the preset threshold can be 0.80.
After the video clips among the multiple video clips whose weight values meet the preset condition are determined, those video clips can be synthesized into a highlight video.
In the embodiment of the present application, it is considered that the highlight segments corresponding to videos of the same content type may carry relatively similar image semantic information; for example, the highlight segments of a sports video can be segments such as a three-point shot in basketball or a goal in football. Therefore, in one implementation of the embodiment of the present application, if the probability model described in S601 is the probability model corresponding to the content type, obtained by training on historical highlight segments of that content type, then in specific implementation the probability model determines the weight values with which the multiple video clips respectively belong to the highlight segments according to the degree of similarity between the image semantic information carried by the multiple video clips and the semantic information carried by the historical highlight segments.
Specifically, the probability model can extract the image semantic information carried by the multiple video clips and compare it with the semantic information carried by the historical highlight segments, so as to determine the degree of similarity between the image semantic information carried by the multiple video clips and the semantic information carried by the historical highlight segments. In the embodiment of the present application, the higher the degree of similarity between the image semantic information carried by a video clip and the semantic information carried by the historical highlight segments, the higher the weight value with which that video clip belongs to the highlight segments; correspondingly, the lower the degree of similarity, the lower the weight value with which that video clip belongs to the highlight segments.
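One way to realize this similarity-based weighting, sketched under assumptions, is to embed the image semantic information as vectors and score each clip by its best cosine similarity to the historical highlight segments; the toy two-dimensional embeddings below are made up, and the present application does not specify the probability model at this level of detail.

```python
import math

def highlight_weight(clip_vec, history_vecs):
    """Score a clip by the best cosine similarity between its
    image-semantic embedding and the embeddings of historical
    highlight segments: more similar -> higher weight."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    return max(cos(clip_vec, h) for h in history_vecs)

history = [[1.0, 0.0], [0.8, 0.6]]   # past highlight embeddings
w_similar = highlight_weight([1.0, 0.1], history)
w_dissimilar = highlight_weight([0.0, -1.0], history)
print(w_similar > w_dissimilar)  # True
```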
It can be seen that, with the method provided by the embodiments of the present application, the probability model can be used to automatically determine whether a video clip belongs to the highlight segments; the determination of highlight segments is accurate and efficient, and the corresponding efficiency of generating a highlight video is relatively high. This solves the problems of inaccurate judgment results and low efficiency that exist when highlight videos are generated in the traditional manual manner.
Based on the video processing method provided by the foregoing embodiments, this embodiment provides a video processing apparatus 700. Referring to Fig. 7a, which is a structural schematic diagram of a video processing apparatus provided by an embodiment of the present application, the apparatus 700 includes a first determination unit 701, a second determination unit 702, and a video splitting unit 703.
First determination unit 701, for determining the content type of a video to be processed;
Second determination unit 702, for determining, through the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split node frames, where the split node frames include the image feature corresponding to the content type;
Video splitting unit 703, for performing video splitting on the video to be processed according to the split node frames to obtain multiple video clips.
Optionally, the first determination unit 701 is specifically used for:
according to a first image neural network model, determining the feature vectors respectively corresponding to multiple video frames in the video to be processed, where each feature vector carries the image information included in the corresponding video frame;
determining the content type of the video to be processed according to the feature vectors corresponding to the multiple video frames.
Optionally, the determining of the content type of the video to be processed according to the feature vectors corresponding to the multiple video frames comprises:
according to the timing of the multiple video frames in the video to be processed, composing the feature vectors respectively corresponding to the multiple video frames into a feature vector sequence;
according to the feature vector sequence, determining the content type of the video to be processed through a second image neural network model.
Optionally, referring to Fig. 7 b, which is the structural representation of another video process apparatus provided by the embodiments of the present application
Figure.Described device 700 further include: third determination unit 704 and collection of choice specimens video generation unit 705.
Third determination unit 704, for determining that the multiple video clip is belonging respectively to collection of choice specimens segment according to probabilistic model
Weighted value;
Collection of choice specimens video generation unit 705, for meeting the view of preset condition according to weighted value in the multiple video clip
Frequency segment generates the corresponding collection of choice specimens video of the video to be processed.
Optionally, the probability model is a probability model corresponding to the content type and is obtained by training on historical highlight segments of the content type, and the third determination unit 704 is specifically used for:
determining, by the probability model, the weight values with which the multiple video clips respectively belong to highlight segments according to the degree of similarity between the image semantic information carried by the multiple video clips and the semantic information carried by the historical highlight segments.
Optionally, the image feature corresponding to the content type includes any of the following:
a broadcast studio picture;
an advertisement picture;
a picture inconsistent with the content type;
an opening-credits or closing-credits picture;
a target face picture.
As can be seen from the above description, with the video processing apparatus provided by the embodiments of the present application, for a video to be processed that requires video splitting, the content type of the video to be processed is determined, the neural network model corresponding to that content type is selected from among the neural network models, the video frames containing the image feature corresponding to the content type are determined from the video to be processed by that neural network model and are used as the split node frames of the video to be processed, and video splitting is performed on the video to be processed according to the split node frames to obtain multiple video clips. This not only realizes the function of automatically splitting a video; in addition, videos of different content types have their split node frames identified by the corresponding neural network models, that is, by different splitting modes, so that high-precision video splitting can be achieved for videos of different content types, the splitting efficiency is improved, and the current demand for video timeliness is met.
The embodiments of the present application also provide a video processing device, which is introduced below with reference to the accompanying drawings. Referring to Fig. 8, an embodiment of the present application provides a video processing device 800, which can be a server. The server can vary considerably depending on its configuration and performance, and may include one or more central processing units (Central Processing Units, CPU for short) 822 (for example, one or more processors), a memory 832, and one or more storage media 830 (such as one or more mass storage devices) storing application programs 842 or data 844. The memory 832 and the storage media 830 can provide transient or persistent storage. The programs stored in the storage medium 830 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations for the server. Further, the central processing unit 822 can be configured to communicate with the storage medium 830 and execute, on the video processing device 800, the series of instruction operations in the storage medium 830.
The video processing device 800 can also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, and/or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the server in the above embodiments can be based on the server structure shown in Fig. 8, where the CPU 822 is used to execute the following steps:
determining the content type of a video to be processed;
determining, through the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split node frames, where the split node frames include the image feature corresponding to the content type;
performing video splitting on the video to be processed according to the split node frames to obtain multiple video clips.
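The three steps executed by the CPU 822 can be sketched end to end. `classify` and the per-type node-frame detectors below are stand-in callables under assumed interfaces, not an API defined by the present application; frames are represented as strings for illustration.

```python
def process_video(frames, classify, node_models):
    """Sketch of the three CPU steps: classify the content type,
    pick the matching model to flag split node frames, then cut
    the video at those frames (dropping the node frames, e.g. ads)."""
    ctype = classify(frames)                     # step 1: content type
    is_node = node_models[ctype]                 # step 2: pick model
    nodes = [i for i, f in enumerate(frames) if is_node(f)]
    clips, start = [], 0                         # step 3: split
    for n in nodes:
        if start < n:
            clips.append(frames[start:n])
        start = n + 1                            # skip the node frame
    if start < len(frames):
        clips.append(frames[start:])
    return ctype, clips

frames = ["play", "ad", "play", "play", "ad", "play"]
ctype, clips = process_video(
    frames,
    classify=lambda fs: "sport",                 # stand-in classifier
    node_models={"sport": lambda f: f == "ad"},  # ad frames split the video
)
print(clips)  # [['play'], ['play', 'play'], ['play']]
```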
Referring to Fig. 9, an embodiment of the present application provides a video processing device 900, which can also be a terminal device. The terminal device can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA for short), a point-of-sale terminal (Point of Sales, POS for short), an in-vehicle computer, and the like. The following takes a mobile phone as an example of the terminal device:
Fig. 9 shows a block diagram of part of the structure of a mobile phone related to the terminal device provided by an embodiment of the present application. Referring to Fig. 9, the mobile phone includes components such as a radio frequency (Radio Frequency, RF for short) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (Wireless Fidelity, WiFi for short) module 970, a processor 980, and a power supply. Those skilled in the art will understand that the mobile phone structure shown in Fig. 9 does not constitute a limitation on the mobile phone, which may include more or fewer components than illustrated, combine certain components, or adopt a different arrangement of components.
Each component of the mobile phone is specifically introduced below with reference to Fig. 9:
The RF circuit 910 can be used for receiving and sending signals during the sending and receiving of messages or during a call; in particular, after receiving downlink information from a base station, it delivers the information to the processor 980 for processing, and in addition it sends uplink data to the base station. In general, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA for short), a duplexer, and the like. In addition, the RF circuit 910 can also communicate with networks and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to the Global System for Mobile communications (Global System of Mobile communication, GSM for short), General Packet Radio Service (General Packet Radio Service, GPRS for short), Code Division Multiple Access (Code Division Multiple Access, CDMA for short), Wideband Code Division Multiple Access (Wideband Code Division Multiple Access, WCDMA for short), Long Term Evolution (Long Term Evolution, LTE for short), e-mail, Short Messaging Service (Short Messaging Service, SMS for short), and the like.
The memory 920 can be used to store software programs and modules, and the processor 980 executes the various function applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, where the program storage area can store an operating system, the application programs required by at least one function (such as a sound playing function, an image playing function, and so on), and the like, and the data storage area can store data created according to the use of the mobile phone (such as audio data, a phone book, and so on), and the like. In addition, the memory 920 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 930 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the mobile phone. Specifically, the input unit 930 may include a touch panel 931 and other input devices 932. The touch panel 931, also referred to as a touch screen, can collect touch operations by the user on or near it (such as operations by the user on the touch panel 931 or near the touch panel 931 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 931 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends them to the processor 980, and can receive and execute commands sent by the processor 980. Furthermore, the touch panel 931 can be realized in multiple types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 931, the input unit 930 can also include other input devices 932. Specifically, the other input devices 932 can include, but are not limited to, one or more of a physical keyboard, function keys (such as a volume control key, a switch key, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 940 can be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 940 may include a display panel 941, which can optionally be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD for short), an organic light-emitting diode (Organic Light-Emitting Diode, OLED for short), and the like. Further, the touch panel 931 can cover the display panel 941; when the touch panel 931 detects a touch operation on or near it, it transmits the operation to the processor 980 to determine the type of the touch event, and the processor 980 then provides a corresponding visual output on the display panel 941 according to the type of the touch event. Although in Fig. 9 the touch panel 931 and the display panel 941 realize the input and output functions of the mobile phone as two independent components, in some embodiments the touch panel 931 and the display panel 941 can be integrated to realize the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 950, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 941 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 941 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally on three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that identify the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration), vibration-identification-related functions (such as a pedometer and tapping), and the like; the mobile phone can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described in detail here.
Audio circuit 960, loudspeaker 961, and microphone 962 can provide an audio interface between the user and the mobile phone. Audio circuit 960 can transmit the electrical signal converted from the received audio data to loudspeaker 961, which converts it into a sound signal for output; on the other hand, microphone 962 converts the collected sound signal into an electrical signal, which audio circuit 960 receives and converts into audio data. After the audio data is output to processor 980 and processed, it is sent, for example, to another mobile phone via RF circuit 910, or output to memory 920 for further processing.
WiFi is a short-range wireless transmission technology. Through WiFi module 970, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Fig. 9 shows WiFi module 970, it is understood that it is not an essential component of the mobile phone and may be omitted as needed within the scope that does not change the essence of the invention.
Processor 980 is the control center of the mobile phone. It connects all parts of the whole mobile phone using various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing software programs and/or modules stored in memory 920 and calling data stored in memory 920, thereby monitoring the mobile phone as a whole. Optionally, processor 980 may include one or more processing units; preferably, processor 980 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into processor 980.
The mobile phone further includes a power supply 990 (such as a battery) that powers all components. Preferably, the power supply may be logically connected to processor 980 through a power management system, so that functions such as charge management, discharge management, and power consumption management are realized through the power management system.
Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like; details are not described herein.
An embodiment of the present application also provides a computer-readable storage medium for storing program code, the program code being used to execute any one embodiment of the video processing method described in the foregoing embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium can be at least one of the following media: various media that can store program code, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disc.
It should be noted that the embodiments in this specification are described in a progressive manner; identical and similar parts among the embodiments may refer to each other, and each embodiment focuses on its differences from the other embodiments. The device and system embodiments, being substantially similar to the method embodiments, are described relatively simply; for relevant parts, refer to the explanation in the method embodiments. The device and system embodiments described above are only illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The above is only a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that can be easily conceived by any person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (14)
1. A video processing method, characterized in that the method comprises:
determining a content type of a video to be processed;
determining, by a neural network model corresponding to the content type, a video frame in the video to be processed that serves as a splitting node frame, the splitting node frame containing an image feature corresponding to the content type;
performing video splitting on the video to be processed according to the splitting node frame to obtain a plurality of video clips.
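The three steps of claim 1 can be sketched in code. This is a minimal illustration only: the patent does not specify a concrete implementation, and all names here (`split_video`, the per-type predicate dictionary, the toy frame labels) are hypothetical stand-ins for the claimed neural network models.

```python
def split_video(frames, classify_content_type, models):
    """Split a frame list into clips at detected splitting node frames.

    classify_content_type: callable returning the video's content type.
    models: dict mapping each content type to a per-frame predicate that
            returns True when the frame shows that type's splitting
            feature (e.g. an advertisement or title picture).
    """
    content_type = classify_content_type(frames)   # step 1: content type
    is_split_node = models[content_type]           # type-specific model
    # Step 2: indices of frames that act as splitting nodes.
    nodes = [i for i, f in enumerate(frames) if is_split_node(f)]
    # Step 3: cut the video at each node, dropping the node frames
    # themselves and discarding empty segments.
    clips, start = [], 0
    for i in nodes:
        if i > start:
            clips.append(frames[start:i])
        start = i + 1
    if start < len(frames):
        clips.append(frames[start:])
    return clips
```

For example, a "news" video interleaved with advertisement frames would be cut at each advertisement frame, yielding the news segments as separate clips.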
2. The method according to claim 1, characterized in that the determining a content type of a video to be processed comprises:
determining, according to a first image neural network model, feature vectors respectively corresponding to a plurality of video frames in the video to be processed, the feature vectors carrying image information contained in the corresponding video frames;
determining the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames.
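The per-frame feature extraction of claim 2 can be sketched as follows. This is an assumption-laden toy: `toy_extract` is a hand-written stand-in for the claimed "first image neural network model" (the description elsewhere mentions convolutional networks), producing a small vector that carries image information from the frame.

```python
def frame_feature_vectors(frames, extract):
    """Apply a per-frame feature extractor (the 'first image neural
    network model' of the claim; here any callable) to every frame."""
    return [extract(f) for f in frames]

def toy_extract(frame):
    # Stand-in for a CNN: a 2-dimensional vector of the frame's mean
    # pixel intensity and intensity variance.
    n = len(frame)
    mean = sum(frame) / n
    var = sum((p - mean) ** 2 for p in frame) / n
    return (mean, var)
```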
3. The method according to claim 2, characterized in that the determining the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames comprises:
forming the feature vectors corresponding to the plurality of video frames into a feature vector sequence according to the temporal order of the plurality of video frames in the video to be processed;
determining the content type of the video to be processed from the feature vector sequence by a second image neural network model.
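The sequencing step of claim 3 can be sketched as follows: arrange the feature vectors by their frames' timing in the video, then hand the sequence to the second model. `toy_sequence_model` is a hypothetical stand-in for the claimed "second image neural network model" (the description mentions recurrent networks); the patent does not prescribe this logic.

```python
def classify_by_sequence(timed_vectors, sequence_model):
    """timed_vectors: list of (timestamp, feature) pairs, possibly
    unordered. The features are arranged into a sequence by their
    timing in the video, then classified by the second model."""
    sequence = [v for _, v in sorted(timed_vectors, key=lambda tv: tv[0])]
    return sequence_model(sequence)

def toy_sequence_model(sequence):
    # Stand-in for a recurrent model: classify by whether the feature
    # values trend upward or downward across the ordered sequence.
    return "rising" if sequence[-1] > sequence[0] else "falling"
```

The point of the sort is that temporal order is part of the signal: the same set of feature vectors in a different order may describe a different kind of video.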
4. The method according to claim 1, characterized in that after the obtaining a plurality of video clips, the method further comprises:
determining, according to a probabilistic model, weight values with which the plurality of video clips respectively belong to highlight segments;
generating a highlight video corresponding to the video to be processed from the video clips whose weight values meet a preset condition.
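The selection step of claim 4 can be sketched as follows, assuming (as one possible preset condition, not specified by the patent) that a clip qualifies when its weight reaches a threshold:

```python
def build_highlight(clips, weights, threshold):
    """Concatenate the clips whose highlight weight meets the preset
    condition (here: weight >= threshold), preserving their order."""
    return [frame
            for clip, w in zip(clips, weights)
            if w >= threshold
            for frame in clip]
```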
5. The method according to claim 4, characterized in that the probabilistic model is a probabilistic model corresponding to the content type and is trained on historical highlight segments of the content type, and the determining, according to a probabilistic model, weight values with which the plurality of video clips respectively belong to highlight segments comprises:
determining, by the probabilistic model, the weight values with which the plurality of video clips respectively belong to highlight segments according to the degree of similarity between the image semantic information carried by the plurality of video clips and the image semantic information carried by the historical highlight segments.
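One way to realize the similarity-based weighting of claim 5 is sketched below: score each clip by its greatest cosine similarity to any historical highlight segment of the same content type. The cosine measure and the max-over-history rule are illustrative assumptions; the claim only requires that the weight reflect the degree of similarity of the image semantic information.

```python
import math

def cosine(u, v):
    # Cosine similarity between two semantic feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def highlight_weights(clip_vectors, history_vectors):
    """Weight each clip by its greatest semantic similarity to any
    historical highlight segment of the same content type."""
    return [max(cosine(c, h) for h in history_vectors)
            for c in clip_vectors]
```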
6. The method according to any one of claims 1-5, characterized in that the image feature corresponding to the content type comprises any of the following:
a broadcast director console picture;
an advertisement picture;
a picture inconsistent with the content type;
an opening title or closing credits picture;
a target face picture.
7. A video processing apparatus, characterized in that the apparatus comprises:
a first determination unit, configured to determine a content type of a video to be processed;
a second determination unit, configured to determine, by a neural network model corresponding to the content type, a video frame in the video to be processed that serves as a splitting node frame, the splitting node frame containing an image feature corresponding to the content type;
a video splitting unit, configured to perform video splitting on the video to be processed according to the splitting node frame to obtain a plurality of video clips.
8. The apparatus according to claim 7, characterized in that the first determination unit is specifically configured to:
determine, according to a first image neural network model, feature vectors respectively corresponding to a plurality of video frames in the video to be processed, the feature vectors carrying image information contained in the corresponding video frames;
determine the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames.
9. The apparatus according to claim 8, characterized in that the determining the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames comprises:
forming the feature vectors corresponding to the plurality of video frames into a feature vector sequence according to the temporal order of the plurality of video frames in the video to be processed;
determining the content type of the video to be processed from the feature vector sequence by a second image neural network model.
10. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a third determination unit, configured to determine, according to a probabilistic model, weight values with which the plurality of video clips respectively belong to highlight segments;
a highlight video generation unit, configured to generate a highlight video corresponding to the video to be processed from the video clips whose weight values meet a preset condition.
11. The apparatus according to claim 10, characterized in that the probabilistic model is a probabilistic model corresponding to the content type and is trained on historical highlight segments of the content type, and the third determination unit is specifically configured to:
determine, by the probabilistic model, the weight values with which the plurality of video clips respectively belong to highlight segments according to the degree of similarity between the image semantic information carried by the plurality of video clips and the image semantic information carried by the historical highlight segments.
12. The apparatus according to any one of claims 7-11, characterized in that the image feature corresponding to the content type comprises any of the following:
a broadcast director console picture;
an advertisement picture;
a picture inconsistent with the content type;
an opening title or closing credits picture;
a target face picture.
13. A video processing device, characterized in that the device comprises a processor and a memory:
the memory is configured to store program code and transfer the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the video processing method according to any one of claims 1-6.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, the program code being used to execute the video processing method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910122357.XA CN110166828A (en) | 2019-02-19 | 2019-02-19 | A kind of method for processing video frequency and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910122357.XA CN110166828A (en) | 2019-02-19 | 2019-02-19 | A kind of method for processing video frequency and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110166828A true CN110166828A (en) | 2019-08-23 |
Family
ID=67645378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910122357.XA Pending CN110166828A (en) | 2019-02-19 | 2019-02-19 | A kind of method for processing video frequency and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166828A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851625A (en) * | 2019-10-16 | 2020-02-28 | 联想(北京)有限公司 | Video creation method and device, electronic equipment and storage medium |
CN111432140A (en) * | 2020-06-15 | 2020-07-17 | 成都索贝数码科技股份有限公司 | Method for splitting television news into strips by using artificial neural network |
CN111711855A (en) * | 2020-05-27 | 2020-09-25 | 北京奇艺世纪科技有限公司 | Video generation method and device |
CN112291618A (en) * | 2020-10-13 | 2021-01-29 | 北京沃东天骏信息技术有限公司 | Video preview content generating method and device, computer device and storage medium |
CN112423151A (en) * | 2020-11-17 | 2021-02-26 | 北京金山云网络技术有限公司 | Video strip splitting method, system, device, equipment and storage medium |
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
WO2021109846A1 (en) * | 2019-12-06 | 2021-06-10 | 华为技术有限公司 | Bit stream data processing method and apparatus |
CN113539304A (en) * | 2020-04-21 | 2021-10-22 | 华为技术有限公司 | Video strip splitting method and device |
CN113766268A (en) * | 2021-11-08 | 2021-12-07 | 阿里巴巴达摩院(杭州)科技有限公司 | Video processing method and device, electronic equipment and readable medium |
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
WO2023197814A1 (en) * | 2022-04-13 | 2023-10-19 | 华为云计算技术有限公司 | Video processing method and system, and related device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778230A (en) * | 2015-03-31 | 2015-07-15 | 北京奇艺世纪科技有限公司 | Video data segmentation model training method, video data segmenting method, video data segmentation model training device and video data segmenting device |
US20170177943A1 (en) * | 2015-12-21 | 2017-06-22 | Canon Kabushiki Kaisha | Imaging system and method for classifying a concept type in video |
CN107784118A (en) * | 2017-11-14 | 2018-03-09 | 北京林业大学 | A kind of Video Key information extracting system semantic for user interest |
CN108133020A (en) * | 2017-12-25 | 2018-06-08 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN108965920A (en) * | 2018-08-08 | 2018-12-07 | 北京未来媒体科技股份有限公司 | A kind of video content demolition method and device |
CN109121021A (en) * | 2018-09-28 | 2019-01-01 | 北京周同科技有限公司 | A kind of generation method of Video Roundup, device, electronic equipment and storage medium |
CN109151615A (en) * | 2018-11-02 | 2019-01-04 | 湖南双菱电子科技有限公司 | Method for processing video frequency, computer equipment and computer storage medium |
CN109165573A (en) * | 2018-08-03 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting video feature vector |
2019-02-19: CN CN201910122357.XA patent CN110166828A/en — active, Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778230A (en) * | 2015-03-31 | 2015-07-15 | 北京奇艺世纪科技有限公司 | Video data segmentation model training method, video data segmenting method, video data segmentation model training device and video data segmenting device |
US20170177943A1 (en) * | 2015-12-21 | 2017-06-22 | Canon Kabushiki Kaisha | Imaging system and method for classifying a concept type in video |
CN107784118A (en) * | 2017-11-14 | 2018-03-09 | 北京林业大学 | A kind of Video Key information extracting system semantic for user interest |
CN108133020A (en) * | 2017-12-25 | 2018-06-08 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN109165573A (en) * | 2018-08-03 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting video feature vector |
CN108965920A (en) * | 2018-08-08 | 2018-12-07 | 北京未来媒体科技股份有限公司 | A kind of video content demolition method and device |
CN109121021A (en) * | 2018-09-28 | 2019-01-01 | 北京周同科技有限公司 | A kind of generation method of Video Roundup, device, electronic equipment and storage medium |
CN109151615A (en) * | 2018-11-02 | 2019-01-04 | 湖南双菱电子科技有限公司 | Method for processing video frequency, computer equipment and computer storage medium |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851625A (en) * | 2019-10-16 | 2020-02-28 | 联想(北京)有限公司 | Video creation method and device, electronic equipment and storage medium |
WO2021109846A1 (en) * | 2019-12-06 | 2021-06-10 | 华为技术有限公司 | Bit stream data processing method and apparatus |
CN113539304A (en) * | 2020-04-21 | 2021-10-22 | 华为技术有限公司 | Video strip splitting method and device |
CN113539304B (en) * | 2020-04-21 | 2022-09-16 | 华为云计算技术有限公司 | Video strip splitting method and device |
CN111711855A (en) * | 2020-05-27 | 2020-09-25 | 北京奇艺世纪科技有限公司 | Video generation method and device |
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN111432140A (en) * | 2020-06-15 | 2020-07-17 | 成都索贝数码科技股份有限公司 | Method for splitting television news into strips by using artificial neural network |
CN111432140B (en) * | 2020-06-15 | 2020-09-15 | 成都索贝数码科技股份有限公司 | Method for splitting television news into strips by using artificial neural network |
CN112291618A (en) * | 2020-10-13 | 2021-01-29 | 北京沃东天骏信息技术有限公司 | Video preview content generating method and device, computer device and storage medium |
CN112423151A (en) * | 2020-11-17 | 2021-02-26 | 北京金山云网络技术有限公司 | Video strip splitting method, system, device, equipment and storage medium |
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN112565820B (en) * | 2020-12-24 | 2023-03-28 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN113766268A (en) * | 2021-11-08 | 2021-12-07 | 阿里巴巴达摩院(杭州)科技有限公司 | Video processing method and device, electronic equipment and readable medium |
WO2023197814A1 (en) * | 2022-04-13 | 2023-10-19 | 华为云计算技术有限公司 | Video processing method and system, and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166828A (en) | A kind of method for processing video frequency and device | |
CN104239535B (en) | A kind of method, server, terminal and system for word figure | |
CN103578474B (en) | A kind of sound control method, device and equipment | |
CN108334539B (en) | Object recommendation method, mobile terminal and computer-readable storage medium | |
CN106126174B (en) | A kind of control method and electronic equipment of scene audio | |
CN110598046A (en) | Artificial intelligence-based identification method and related device for title party | |
CN107908765B (en) | Game resource processing method, mobile terminal and server | |
CN107679156A (en) | A kind of video image identification method and terminal, readable storage medium storing program for executing | |
CN108965977B (en) | Method, device, storage medium, terminal and system for displaying live gift | |
CN108616448A (en) | A kind of the path recommendation method and mobile terminal of Information Sharing | |
CN109756767A (en) | Preview data playback method, device and storage medium | |
CN108241752A (en) | Photo display methods, mobile terminal and computer readable storage medium | |
CN110209245A (en) | Face identification method and Related product | |
CN107908770A (en) | A kind of photo searching method and mobile terminal | |
CN114357278B (en) | Topic recommendation method, device and equipment | |
CN108769787A (en) | A kind of automatic caching method of video, terminal and computer readable storage medium | |
CN110069675A (en) | A kind of search method and mobile terminal | |
CN110276010A (en) | A kind of weight model training method and relevant apparatus | |
CN104281610B (en) | The method and apparatus for filtering microblogging | |
CN109584897A (en) | Vedio noise reduction method, mobile terminal and computer readable storage medium | |
CN109729267A (en) | Filter selection method, mobile terminal and computer readable storage medium | |
CN108897846A (en) | Information search method, equipment and computer readable storage medium | |
CN108848273A (en) | A kind of new information processing method, mobile terminal and storage medium | |
CN110955788A (en) | Information display method and electronic equipment | |
CN108536869A (en) | A kind of method, apparatus and computer readable storage medium of search participle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190823 |