CN110166828A - Video processing method and apparatus - Google Patents
Video processing method and apparatus
- Publication number
- CN110166828A (application number CN201910122357.XA)
- Authority
- CN
- China
- Prior art keywords
- video
- processed
- content type
- splitting
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
Embodiments of the present application disclose a video processing method and apparatus. For a video to be processed that requires video splitting, the content type of the video is determined first; the neural network model corresponding to that content type is then selected from a set of neural network models; the selected model identifies, in the video to be processed, the video frames that contain the image features corresponding to the content type, and those frames serve as the split-node frames of the video. The video is then split at the split-node frames to obtain multiple video clips. This not only automates video splitting, but also applies a dedicated neural network model (that is, a dedicated splitting mode) to identify the split-node frames of each content type, so that videos of different content types can all be split with high precision. Splitting efficiency is improved and the timeliness demands of current video production are met.
Description
Technical field
This application relates to the field of video processing, and in particular to a video processing method and apparatus.
Background art
Video splitting is a video processing technique in which a video is reprocessed and divided into multiple video clips according to some logic or specific need; the resulting clips can be used, for example, to generate highlights videos. For instance, to serve internet video and new-media short-video content platforms, a traditional television program can be split so that what was originally one complete program becomes multiple video clips.
Because videos come in many varieties, it is difficult to define a unified video splitting rule. The current conventional approach is therefore to split the video to be processed manually with video processing tools, and how the clips are split depends entirely on human experience.
Because it is constrained by human experience, the current video splitting workflow is inefficient and can hardly meet the timeliness demands of current video production.
Summary of the invention
To solve the above technical problem, the present application provides a video processing method and apparatus.
The embodiments of the present application disclose the following technical solutions:
In a first aspect, an embodiment of the present application provides a video processing method. The method includes:
determining the content type of a video to be processed;
determining, by means of the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split-node frames, where the split-node frames contain the image features corresponding to the content type; and
splitting the video to be processed at the split-node frames to obtain multiple video clips.
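The three steps above can be sketched in code. The following is a minimal, hypothetical illustration (not the patent's implementation): given a frame sequence and the indices of the detected split-node frames, it cuts the sequence into clips, with each node frame starting a new clip.

```python
def split_at_nodes(frames, nodes):
    """Split a frame sequence into clips at the given node-frame indices.

    frames: list of frame identifiers; nodes: 0-based indices of
    split-node frames. Each clip is the run of frames between two
    consecutive nodes. Hypothetical helper, not the patent's code.
    """
    clips, start = [], 0
    for n in sorted(nodes):
        if n > start:
            clips.append(frames[start:n])
        start = n
    clips.append(frames[start:])
    return clips

split_at_nodes(list(range(1, 11)), [4, 7])
# -> [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```

A video with no detected node frames simply yields a single clip containing every frame.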
In a second aspect, an embodiment of the present application provides a video processing apparatus. The apparatus includes:
a first determination unit, configured to determine the content type of a video to be processed;
a second determination unit, configured to determine, by means of the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split-node frames, where the split-node frames contain the image features corresponding to the content type; and
a video splitting unit, configured to split the video to be processed at the split-node frames to obtain multiple video clips.
In a third aspect, an embodiment of the present application provides a video processing device. The device includes a processor and a memory:
the memory is configured to store program code and transfer the program code to the processor; and
the processor is configured to execute, according to instructions in the program code, the video processing method of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium is configured to store program code for executing the video processing method of any implementation of the first aspect.
It can be seen from the above technical solutions that, for a video to be processed that requires video splitting, the content type of the video is determined; the neural network model corresponding to that content type is selected from a set of neural network models; the selected model identifies the video frames in the video to be processed that contain the image features corresponding to the content type; those frames serve as the split-node frames of the video; and the video is split at the split-node frames to obtain multiple video clips. This not only automates video splitting but also applies, for each content type, a dedicated neural network model (that is, a dedicated splitting mode) to identify the split-node frames, so that videos of different content types can be split with high precision. Splitting efficiency is improved and the timeliness demands of current video production are met.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed for the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an example scene according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for determining the content type of a video to be processed according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a scene in which the content type of a video to be processed is output according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the effect of determining a video content type according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of a method for generating a highlights video according to an embodiment of the present application;
Fig. 7a is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
Fig. 7b is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a video processing device according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a video processing device according to an embodiment of the present application.
Detailed description of embodiments
To help a person skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims, and accompanying drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so termed are interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
As described above, because videos come in many varieties, it is difficult to define a unified video splitting rule, so the traditional approach is to split the video to be processed manually with video processing tools, and how the clips are split depends on human experience. Constrained by human experience, the traditional video splitting workflow is therefore inefficient and can hardly meet the timeliness demands of current video production.
In view of this, an embodiment of the present application provides a video processing method. The neural network model corresponding to the content type of the video to be processed is used to identify, in the video to be processed, the video frames that contain the image features corresponding to that content type, and those frames are determined to be the split-node frames of the video. The video is then split at the split-node frames to obtain multiple video clips. With the video processing method provided in this embodiment, a video can thus be split automatically, without a person splitting it manually with video processing tools, which avoids the inefficiency caused by reliance on human experience and the resulting difficulty in meeting the timeliness demands of current video production.
The video processing method provided in the embodiments of the present application is briefly introduced below with reference to the scene shown in Fig. 1.
In this embodiment of the present application, the video 101 to be processed shown in Fig. 1 can be split automatically. The embodiment does not specifically limit the video to be processed; it may be, for example, any one of a news video, an entertainment video, a TV drama video, a film video, or another kind of video. When splitting the video to be processed, the split-node frames in the video 101 to be processed may be determined first, and the video 101 is then split according to those split-node frames. A split-node frame in this embodiment is a split point at which the video 101 to be processed is divided. The split-node frame may consist of a single video frame or of multiple consecutive video frames; the embodiment does not specifically limit this. Splitting the video 101 to be processed according to the split-node frames may mean, for example, determining the video frames between two adjacent split-node frames to be one video clip; or determining the video frames before a given split-node frame, or the video frames after a given split-node frame, to be one video clip; or deleting the video frames before, or after, a given split-node frame; and so on. In the scene shown in Fig. 1, a neural network model 102 can be used to determine the video frames 103 in the video 101 to be processed that serve as split-node frames. After the video frames 103 serving as split-node frames are determined, the video 101 to be processed can be split at those frames to obtain multiple video clips 104.
For example, suppose the content type of the video 101 to be processed is TV drama, and the video contains 100 video frames in total, numbered video frame 1, video frame 2, video frame 3, ..., video frame 100 in order of appearance in the video 101, where frames 20 through 75 are an advertisement unrelated to the TV drama. The neural network model 102 corresponding to the TV drama type determines that the split-node frames are video frame 20 and video frame 75. The video 101 is then split at these split-node frames into multiple video clips, each containing multiple video frames. The first video clip consists of video frame 1 through video frame 19, the second consists of video frame 76 through video frame 100, and the clip consisting of video frame 20 through video frame 75 is deleted, so the video 101 to be processed is split into two video clips. In another embodiment, the same video 101 to be processed can be split into three video clips: the first consists of video frame 1 through video frame 19, the second of video frame 76 through video frame 100, and the third of video frame 20 through video frame 75, which is retained rather than deleted.
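The two splitting modes in the example above can be expressed directly as index arithmetic. This is an illustrative sketch only; the frame numbers follow the example, and the slicing boundaries are the assumed interpretation of node frames 20 and 75:

```python
frames = list(range(1, 101))   # video frame 1 .. video frame 100

# Mode 1: drop the advertisement segment, keeping two clips
clip1 = frames[:19]            # video frames 1 .. 19
clip2 = frames[75:]            # video frames 76 .. 100

# Mode 2: keep the advertisement segment as a third clip
clip3 = frames[19:75]          # video frames 20 .. 75
```

Concatenating the three clips in order reproduces the original 100 frames, which is a useful sanity check for any splitting routine.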
It should be noted that, for videos of different content types, the image features carried by the video frames corresponding to the split-node frames differ. Therefore, the neural network model 102 in this embodiment is the neural network model corresponding to the content type of the video 101 to be processed: it can identify the video frames that carry the image features corresponding to that content type, and determine those frames to be the split-node frames of the video 101 to be processed.
The content type of the video to be processed in this embodiment refers to the category to which the content of the video belongs. The embodiment does not specifically limit the content type; as examples, the content type of the video to be processed may be any one of news and current affairs, variety and entertainment, TV drama and film, automotive, animals, mother-and-baby, sports, and other categories.
It should be noted that this embodiment does not specifically limit the neural network model 102; the neural network model 102 may be, for example, a convolutional neural network (CNN) model.
The video processing method provided by the present application is introduced below through specific embodiments.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application.
The video processing method provided in this embodiment of the present application can be implemented, for example, through the following S201 to S203.
S201: Determine the content type of the video to be processed.
For a description of the video to be processed and of its content type, reference may be made to the related description above; details are not repeated here.
It is understandable that videos of different content types may differ in the characteristics of the image information carried by their video frames, and the video frames of a video of a certain content type may carry image information with certain characteristic features. Therefore, in one implementation of this embodiment, the image information carried by the video frames of the video to be processed can be analyzed to determine the content type of the video. In another implementation, the content type can be determined manually, for example by a staff member who watches the video to be processed. In yet another implementation, the content type of the video to be processed can be determined from a label on the video, where the label may have been attached in advance by a user; for example, if the video to be processed was uploaded by a user, the user may have annotated its content type at upload time.
S202: Determine, by means of the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split-node frames, where the split-node frames contain the image features corresponding to the content type.
In this embodiment, a neural network model is used to determine the video frames in the video to be processed that serve as split-node frames. Considering that, for videos of different content types, the image features of the frames corresponding to the split-node frames differ, it is difficult for a single neural network model to determine the split-node frames of videos of all content types. Therefore, in this embodiment, the neural network model corresponding to the content type of the video to be processed is used to determine the video frames that serve as split-node frames. Precisely because the model corresponds to the content type of the video to be processed, the split-node frames it determines contain the image features corresponding to that content type.
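The per-content-type dispatch described above can be pictured as a registry keyed by content type. The sketch below is purely illustrative: the content-type names and the label-matching "models" are stand-ins, where a real system would run a trained neural network over decoded frames.

```python
def make_model(feature_name):
    """Return a stand-in 'model' that flags frames whose (hypothetical)
    text annotation contains feature_name. A real model would score the
    frame images themselves for the content type's image feature."""
    def model(frame_labels):
        return [i for i, lab in enumerate(frame_labels) if feature_name in lab]
    return model

# Hypothetical registry: content type -> split-node-frame model
MODELS = {
    "news": make_model("studio"),
    "variety": make_model("ad"),
    "tv_drama": make_model("credits"),
}

def find_node_frames(content_type, frame_labels):
    # Select the model matching the video's content type, then apply it
    return MODELS[content_type](frame_labels)

find_node_frames("news", ["studio", "report", "studio", "report"])  # -> [0, 2]
```

The point of the registry is that each content type gets its own detector, which is the "different splitting mode per content type" idea of the patent.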
It should be noted that this embodiment does not specifically limit the image features corresponding to a content type. As described above, the content type of the video to be processed may be any one of news and current affairs, variety and entertainment, TV drama and film, other categories, and so on; correspondingly, the image features corresponding to the content type may include any one of the image features corresponding to news and current affairs, variety and entertainment, TV drama and film, and the other categories.
It is understandable that a news and current affairs video usually contains many studio anchor-desk pictures, so when the content type of the video to be processed is news and current affairs, the image feature corresponding to the content type may be the studio picture. A variety and entertainment video may contain advertisement pictures and pictures that do not match the entertainment content, so when the content type of the video to be processed is variety and entertainment, the image feature corresponding to the content type may be an advertisement picture or a picture inconsistent with the content type. A TV drama or film video usually contains opening-title and end-credits pictures, so when the content type of the video to be processed is TV drama and film, the image features corresponding to the content type may be the opening-title and end-credits pictures. Other types of video, such as sports videos, generally contain human faces, so when the content type of the video to be processed is one of the other categories, the image feature corresponding to the content type may be a target face picture. That is, in this embodiment, the image feature corresponding to the content type of the video to be processed may include any one of a studio picture, an advertisement picture, a picture inconsistent with the content type, an opening-title or end-credits picture, and a target face picture.
S203: Split the video to be processed at the split-node frames to obtain multiple video clips.
After the split-node frames are determined, they can be used as the split points for splitting the video to be processed, and the video to be processed is split accordingly to obtain multiple video clips.
For example, for a news and current affairs video to be processed, the neural network model corresponding to that content type can identify the studio pictures, which serve as the split-node frames; a face recognition algorithm can additionally identify the presenters in the video; and, combined with ASR/OCR recognition results, TextRank can be used to process the speech and text content of the video to obtain a video summary, from which titles and abstracts are generated automatically for the video clips obtained by splitting. Here, ASR refers to the technique of converting the vocabulary content of human speech into text, and OCR refers to the technique of recognizing and extracting, for example using GMM-HMM/DNN techniques, the text content of pictures and photographs.
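A greatly simplified, frequency-based stand-in for the TextRank scoring mentioned above can show what "summarizing the recognized text" means in practice. This is a toy sketch, not the patent's implementation and not TextRank's actual graph-ranking algorithm: it scores each sentence by the average corpus frequency of its words.

```python
from collections import Counter

def summarize(sentences, k=1):
    """Rank sentences by the average corpus frequency of their words and
    return the top k. A toy stand-in for TextRank-style extraction."""
    words = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        toks = s.lower().split()
        return sum(words[w] for w in toks) / max(len(toks), 1)
    return sorted(sentences, key=score, reverse=True)[:k]

summarize(["the cat sat", "the cat sat on the cat mat", "dogs bark"])
# -> ["the cat sat"]
```

Real TextRank builds a similarity graph over sentences and runs PageRank on it; the interface (transcript sentences in, top-scoring sentences out) is the same.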
For another example, for a variety and entertainment video to be processed, advertisements and other content inconsistent with the entertainment program can be deleted from the video; specifically, the aforementioned advertisements and other content inconsistent with the entertainment video can be deleted using OCR and ASR combined with keyword matching.
For another example, for a TV drama or film video to be processed, the split-node frames can be determined from the opening and closing credits and from target faces, so that the time points at which the opening titles, end credits, and target faces appear are determined accurately; the opening titles and end credits are deleted, and video clips containing the target faces are generated, making it possible to watch only the scenes with a target face in a TV drama or film, or to skip them.
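Deleting the opening titles and end credits once their node frames are known reduces to a single slice. A hypothetical helper (the index values are illustrative; in the described system they would come from the credit-detecting model):

```python
def trim_credits(frames, head_end, tail_start):
    """Drop opening titles (frames before index head_end) and end
    credits (frames from index tail_start onward), keeping the body
    of the episode. Hypothetical helper, not the patent's code."""
    return frames[head_end:tail_start]

episode = list(range(1, 21))     # a 20-frame toy episode
trim_credits(episode, 3, 17)     # -> frames 4 .. 17
```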
For another example, for a video to be processed of another category, techniques such as face recognition, OCR, ASR, and object detection can be combined according to user demand to obtain video clips that meet that demand.
As can be seen from the above description, with the video processing method provided in the embodiments of the present application, for a video to be processed that requires video splitting, the content type of the video is determined; the neural network model corresponding to that content type is selected from a set of neural network models; that model identifies the video frames in the video to be processed that contain the image features corresponding to the content type; those frames serve as the split-node frames of the video; and the video is split at the split-node frames to obtain multiple video clips. This not only automates video splitting but also applies, for each content type, a dedicated neural network model (that is, a dedicated splitting mode) to identify the split-node frames, so that videos of different content types can be split with high precision, splitting efficiency is improved, and the timeliness demands of current video production are met.
As described in S201 above, in the embodiment of the present application, the image information carried by the video frames included in the video to be processed can be analyzed to determine the content type of the video to be processed. An implementation of determining the content type of a video to be processed is introduced below with reference to the accompanying drawings.
Referring to Fig. 3, which is a flow diagram of a method for determining the content type of a video to be processed provided by an embodiment of the present application.
The method for determining the content type of a video to be processed provided by the embodiment of the present application can be realized through the following S301-S302.
S301: according to a first image neural network model, determine the feature vectors respectively corresponding to multiple video frames in the video to be processed, where each feature vector carries the image information included in the corresponding video frame.
S302: determine the content type of the video to be processed according to the feature vectors corresponding to the multiple video frames.
It should be noted that the first image neural network model referred to in the embodiment of the present application can be a convolutional neural network model. The first image neural network model in the embodiment of the present application can be obtained by training with pictures generated from a large number of video frames as training samples. The first image neural network model can extract the feature vector corresponding to each video frame in the video to be processed, where the feature vector corresponding to a video frame carries the image information included in that video frame. That is to say, the first image neural network model can determine the image information included in the video frames of the video to be processed.
In the embodiment of the present application, the pictures generated from multiple video frames of the video to be processed can be input into the first image neural network model, and after processing by the convolutional layers, filter layers, and pooling layers of the first image neural network model, the feature vectors corresponding to the multiple video frames are output.
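The role of the first image neural network model can be sketched as follows: each sampled frame passes through convolution and pooling and comes out as a fixed-length feature vector. The NumPy code below is a toy, untrained stand-in assuming 8×8 grayscale frames and four random kernels, not the trained CNN the embodiment describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(frame, kernels):
    """Toy stand-in for the first image neural network model:
    one convolution layer, ReLU, then global average pooling,
    yielding one fixed-length feature vector per frame."""
    h, w = frame.shape
    feats = []
    for k in kernels:  # each kernel produces one feature channel
        kh, kw = k.shape
        resp = np.array([[np.sum(frame[i:i + kh, j:j + kw] * k)
                          for j in range(w - kw + 1)]
                         for i in range(h - kh + 1)])
        resp = np.maximum(resp, 0.0)   # ReLU
        feats.append(resp.mean())      # global average pooling
    return np.array(feats)

frames = [rng.random((8, 8)) for _ in range(3)]          # 3 sampled frames
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
vectors = [frame_features(f, kernels) for f in frames]
print([v.shape for v in vectors])  # [(4,), (4,), (4,)]
```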
It can be understood that the image information included in the video frames of the video to be processed can, to a certain extent, reflect the content type of the video to be processed; therefore, the feature vectors corresponding to multiple video frames in the video to be processed can be used to determine the content type of the video to be processed.
In the embodiment of the present application, it is considered that the image information included in a single video frame of the video to be processed may not be sufficient to reflect the content type of the video; that is to say, the feature vector corresponding to a single video frame may not be able to determine the content type of the video to be processed. Therefore, in the embodiment of the present application, the first image neural network model can be used to determine the feature vectors corresponding to multiple video frames in the video to be processed, and the feature vectors corresponding to the multiple video frames can then be used to determine the content type of the video to be processed. In other words, the image information included in the multiple video frames is used to determine the content type of the video to be processed.
In practical applications, generally speaking, the image information included in all the video frames of the video to be processed is sufficient to determine the content type of the video; therefore, the multiple video frames referred to in the embodiment of the present application may include all the video frames of the video to be processed. In practical applications, the image information included in only a part of the video frames may also be sufficient to determine the content type of the video; therefore, the multiple video frames referred to in the embodiment of the present application may also include only a part of the video frames of the video to be processed.
In specific implementation, S302 can, for example, analyze the feature vectors corresponding to the multiple video frames, so as to determine the content type of the video to be processed.
In the embodiment of the present application, it is considered that the relevance between consecutive video frames in the video to be processed may be relatively strong; in other words, the correlation between the picture content included in consecutive video frames may also be relatively high. Therefore, if the correlation between consecutive video frames can be taken into account when determining the content type of the video to be processed, the content type of the video to be processed can be determined more accurately.
In view of this, in a possible implementation of the embodiment of the present application, S302 can in specific implementation be realized through the following steps A-B.
Step A: according to the timing of the multiple video frames in the video to be processed, compose the feature vectors respectively corresponding to the multiple video frames into a feature vector sequence.
Step B: according to the feature vector sequence, determine the content type of the video to be processed through a second image neural network model.
Regarding step A, it should be noted that, in order that the correlation between consecutive video frames can be taken into account when determining the content type of the video to be processed, in the embodiment of the present application the feature vectors corresponding to the multiple video frames are composed into a feature vector sequence according to the timing of the multiple video frames in the video to be processed. In this way, the correlation between consecutive video frames can be taken into account by analyzing the correlation between consecutive feature vectors in the feature vector sequence.
In the embodiment of the present application, the timing of the multiple video frames in the video to be processed refers to the order in which the multiple video frames appear in the video to be processed. In the embodiment of the present application, the feature vector sequence includes multiple feature vectors in one-to-one correspondence with the multiple video frames, and the order of the feature vectors in the feature vector sequence is the same as the order of the corresponding video frames in the video to be processed. For example, suppose video frame 1 to video frame 100 correspond to feature vector 1 to feature vector 100 respectively, video frame 1 appears earliest in the video to be processed, video frame 2 appears second, and video frame 100 appears last. Then the order of the 100 feature vectors in the feature vector sequence is feature vector 1, feature vector 2, ..., feature vector 100.
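Step A can be sketched as a simple sort: order the per-frame feature vectors by each frame's time of appearance in the video. The frame ids, timestamps, and one-element vectors below are illustrative assumptions.

```python
def build_feature_sequence(frame_times, features):
    """Step A as a sketch: order per-frame feature vectors by the
    frames' appearance time in the video, so consecutive vectors in
    the sequence correspond to consecutive frames."""
    order = sorted(frame_times, key=lambda f: frame_times[f])
    return [features[f] for f in order]

times = {"f2": 1.0, "f1": 0.0, "f3": 2.5}        # frame id -> seconds
feats = {"f1": [0.1], "f2": [0.2], "f3": [0.3]}  # frame id -> vector
seq = build_feature_sequence(times, feats)
print(seq)  # [[0.1], [0.2], [0.3]]
```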
In the embodiment of the present application, after the feature vector sequence is obtained, the content type of the video to be processed can be determined through the second image neural network model.
The embodiment of the present application does not specifically limit the second image neural network model. Considering that the input of a recurrent neural network (Recurrent Neural Network, RNN) is a feature sequence, and that an RNN can associate multiple features in the feature sequence with one another, in the embodiment of the present application the second image neural network model may include a recurrent neural network model.
The embodiment of the present application also does not specifically limit the implementation by which the second image neural network model determines the content type of the video to be processed. As an example, the second image neural network model can process the feature vector sequence with a multilayer perceptron (Multi-Layer Perceptron, MLP) and output the content type of the video to be processed.
In the embodiment of the present application, the content type output by the second image neural network model for the video to be processed is, in practice, the probability that the video to be processed belongs to each content type. This can be understood with reference to Fig. 4, which is a schematic diagram of a scenario of outputting the content type of a video to be processed provided by an embodiment of the present application. As shown at 402 in Fig. 4, the probability that the content type of the video to be processed 401 is "automobile" is 0.99959, "entertainment and fashion" is 0.00009, "animal" is 0.00008, "mother and baby" is 0.00007, and "sport" is 0.00006.
It should be noted that, although the content types of the video shown in Fig. 4 include automobile, entertainment and fashion, animal, mother and baby, and sport, this is a schematic illustration and does not constitute a limitation on the embodiment of the present application.
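The per-type probabilities of Fig. 4 can be sketched as a softmax over the second model's raw output scores, with the most probable type taken as the result. The scores and labels below are made-up illustrative values, and softmax is one common choice rather than a layer the present application specifies.

```python
import math

def classify(scores, labels):
    """Turn raw per-type scores into probabilities with a softmax
    and pick the most likely content type. Scores and labels are
    illustrative, not the values of Fig. 4."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

labels = ["automobile", "entertainment", "animal", "mother-baby", "sport"]
label, probs = classify([9.0, 0.5, 0.4, 0.3, 0.2], labels)
print(label)  # automobile
```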
Actual test data show that, when the content type of the video to be processed is determined using the first image neural network model, if the video to be processed has obvious pictorial features, such as a game, football, basketball, or animation video, the corresponding accuracy rate is above 99%. When the content type of the video to be processed is determined using the first image neural network model together with the second image neural network model, and the video to be processed has more dispersed pictorial features, such as a TV series, outdoor sports, cuisine, or travel video, the corresponding accuracy rate is about 85%. Specifically, this can be understood with reference to Fig. 5, which is a schematic diagram of the effect of determining the content type of a video provided by an embodiment of the present application.
In Fig. 5, 501 is the accuracy curve of determining the content type of a video using the first image neural network model together with the second image neural network model, and 502 is the accuracy curve of determining the content type of a video using the first image neural network model alone. The abscissa in Fig. 5 indicates the number of transfer-learning training iterations used to obtain the first image neural network model, which is also the number of training iterations corresponding to the image neural network model composed of the first image neural network model and the second image neural network model; the ordinate indicates the accuracy rate. Each transfer-learning training sample includes several pictures, for example 10240 pictures. As can be seen, whether the content type of a video is determined using the first image neural network model together with the second image neural network model, or using the first image neural network model alone, the accuracy rate is relatively high.
In practical applications, after video splitting is performed on a video, the multiple video clips obtained can be used to generate a highlight video. Currently, a highlight video can be generated from video clips in a manner involving manual participation. Specifically, a staff member watches each video clip, judges whether the video clip belongs to the highlight segments, and synthesizes the highlight segments into a highlight video. However, with this way of synthesizing a highlight video, on the one hand the judgment of highlight segments is affected by the subjective judgment of the staff, so the judgment result for a highlight segment may be inaccurate; on the other hand, the highlight video can only be generated after the staff has watched each video clip in full, so the efficiency is relatively low.
In the embodiment of the present application, after video splitting is performed on the video to be processed and multiple video clips are obtained, whether a video clip belongs to the highlight segments can also be judged automatically, and a highlight video generated. The method for generating a highlight video provided by an embodiment of the present application is introduced below with reference to the accompanying drawings.
Referring to Fig. 6, which is a flow diagram of a method for generating a highlight video provided by an embodiment of the present application.
The method for generating a highlight video provided by the embodiment of the present application can, for example, be realized through the following S601-S602.
S601: determine, according to a probability model, the weight values with which the multiple video clips respectively belong to highlight segments.
In the embodiment of the present application, the weight value of a video clip is used to characterize the possibility that the video clip belongs to the highlight segments; in other words, it is used to characterize the probability that the video clip belongs to the highlight segments.
The probability model of the embodiment of the present application can determine the weight value with which a video clip belongs to the highlight segments. The embodiment of the present application does not specifically limit the probability model; the core algorithm of the probability model can, for example, be a maximum likelihood algorithm. In the embodiment of the present application, it is considered that the image semantic information carried by the highlight segments corresponding to videos of different content types may differ, so that it is difficult for one general probability model to determine whether video clips of various content types belong to the highlight segments. Therefore, in a possible implementation of the embodiment of the present application, the probability model can be a probability model corresponding to the content type of the video to be processed. In the embodiment of the present application, the probability model can be obtained by training on historical highlight segments of that content type, and can therefore determine the weights with which the multiple video clips belong to the highlight segments.
It should be noted that, in the embodiment of the present application, the image semantic information carried by a video clip refers to the meaning of the content expressed by the images in the video clip. For example, if a video clip is a segment of a basketball game in which a three-pointer is scored, the image semantic information carried by that video clip is a three-point shot in basketball.
S602: generate the highlight video corresponding to the video to be processed according to the video clips whose weight values meet a preset condition among the multiple video clips.
As described in S601, the weight value of a video clip characterizes the possibility that the video clip belongs to the highlight segments. Therefore, the video clips among the multiple video clips whose weight values meet the preset condition can, for example, be the video clips whose weight values are greater than a preset threshold. The embodiment of the present application does not specifically limit the preset threshold, which can be determined according to actual conditions; for example, the preset threshold can be 0.80.
After the video clips among the multiple video clips whose weight values meet the preset condition are determined, those video clips can be synthesized into a highlight video.
In the embodiment of the present application, it is considered that the highlight segments corresponding to videos of the same content type may carry relatively similar image semantic information; for example, the highlight segments of a sports video can be segments such as a three-point shot in basketball or a goal in football. Therefore, in one implementation of the embodiment of the present application, if the probability model described in S601 is the probability model corresponding to the content type, obtained by training on historical highlight segments of that content type, then in specific implementation the probability model determines the weight values with which the multiple video clips respectively belong to the highlight segments according to the degree of similarity between the image semantic information carried by the multiple video clips and the semantic information carried by the historical highlight segments.
Specifically, the probability model can extract the image semantic information carried by the multiple video clips and compare it with the semantic information carried by the historical highlight segments, so as to determine the degree of similarity between the image semantic information carried by the multiple video clips and the semantic information carried by the historical highlight segments. In the embodiment of the present application, the higher the degree of similarity between the image semantic information carried by a video clip and the semantic information carried by the historical highlight segments, the higher the weight value with which that video clip belongs to the highlight segments; correspondingly, the lower the degree of similarity, the lower the weight value with which that video clip belongs to the highlight segments.
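One way to realize this similarity-based weighting, sketched under assumptions, is to embed the image semantic information as vectors and score each clip by its best cosine similarity to the historical highlight segments; the toy two-dimensional embeddings below are made up, and the present application does not specify the probability model at this level of detail.

```python
import math

def highlight_weight(clip_vec, history_vecs):
    """Score a clip by the best cosine similarity between its
    image-semantic embedding and the embeddings of historical
    highlight segments: more similar -> higher weight."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    return max(cos(clip_vec, h) for h in history_vecs)

history = [[1.0, 0.0], [0.8, 0.6]]   # past highlight embeddings
w_similar = highlight_weight([1.0, 0.1], history)
w_dissimilar = highlight_weight([0.0, -1.0], history)
print(w_similar > w_dissimilar)  # True
```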
It can be seen that, with the method provided by the embodiments of the present application, the probability model can be used to automatically determine whether a video clip belongs to the highlight segments; the determination of highlight segments is accurate and efficient, and the corresponding efficiency of generating a highlight video is relatively high. This solves the problems of inaccurate judgment results and low efficiency that exist when highlight videos are generated in the traditional manual manner.
Based on the video processing method provided by the foregoing embodiments, this embodiment provides a video processing apparatus 700. Referring to Fig. 7a, which is a structural schematic diagram of a video processing apparatus provided by an embodiment of the present application, the apparatus 700 includes a first determination unit 701, a second determination unit 702, and a video splitting unit 703.
First determination unit 701, for determining the content type of a video to be processed;
Second determination unit 702, for determining, through the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split node frames, where the split node frames include the image feature corresponding to the content type;
Video splitting unit 703, for performing video splitting on the video to be processed according to the split node frames to obtain multiple video clips.
Optionally, the first determination unit 701 is specifically used for:
according to a first image neural network model, determining the feature vectors respectively corresponding to multiple video frames in the video to be processed, where each feature vector carries the image information included in the corresponding video frame;
determining the content type of the video to be processed according to the feature vectors corresponding to the multiple video frames.
Optionally, the determining of the content type of the video to be processed according to the feature vectors corresponding to the multiple video frames comprises:
according to the timing of the multiple video frames in the video to be processed, composing the feature vectors respectively corresponding to the multiple video frames into a feature vector sequence;
according to the feature vector sequence, determining the content type of the video to be processed through a second image neural network model.
Optionally, referring to Fig. 7 b, which is the structural representation of another video process apparatus provided by the embodiments of the present application
Figure.Described device 700 further include: third determination unit 704 and collection of choice specimens video generation unit 705.
Third determination unit 704, for determining that the multiple video clip is belonging respectively to collection of choice specimens segment according to probabilistic model
Weighted value;
Collection of choice specimens video generation unit 705, for meeting the view of preset condition according to weighted value in the multiple video clip
Frequency segment generates the corresponding collection of choice specimens video of the video to be processed.
Optionally, the probability model is a probability model corresponding to the content type and is obtained by training on historical highlight segments of the content type, and the third determination unit 704 is specifically used for:
determining, by the probability model, the weight values with which the multiple video clips respectively belong to highlight segments according to the degree of similarity between the image semantic information carried by the multiple video clips and the semantic information carried by the historical highlight segments.
Optionally, the image feature corresponding to the content type includes any of the following:
a broadcast studio picture;
an advertisement picture;
a picture inconsistent with the content type;
an opening-credits or closing-credits picture;
a target face picture.
As can be seen from the above description, with the video processing apparatus provided by the embodiments of the present application, for a video to be processed that requires video splitting, the content type of the video to be processed is determined, the neural network model corresponding to that content type is selected from among the neural network models, the video frames containing the image feature corresponding to the content type are determined from the video to be processed by that neural network model and are used as the split node frames of the video to be processed, and video splitting is performed on the video to be processed according to the split node frames to obtain multiple video clips. This not only realizes the function of automatically splitting a video; in addition, videos of different content types have their split node frames identified by the corresponding neural network models, that is, by different splitting modes, so that high-precision video splitting can be achieved for videos of different content types, the splitting efficiency is improved, and the current demand for video timeliness is met.
The embodiments of the present application also provide a video processing device, which is introduced below with reference to the accompanying drawings. Referring to Fig. 8, an embodiment of the present application provides a video processing device 800, which can be a server. The server can vary considerably depending on its configuration and performance, and may include one or more central processing units (Central Processing Units, CPU for short) 822 (for example, one or more processors), a memory 832, and one or more storage media 830 (such as one or more mass storage devices) storing application programs 842 or data 844. The memory 832 and the storage media 830 can provide transient or persistent storage. The programs stored in the storage medium 830 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations for the server. Further, the central processing unit 822 can be configured to communicate with the storage medium 830 and execute, on the video processing device 800, the series of instruction operations in the storage medium 830.
The video processing device 800 can also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, and/or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the server in the above embodiments can be based on the server structure shown in Fig. 8, where the CPU 822 is used to execute the following steps:
determining the content type of a video to be processed;
determining, through the neural network model corresponding to the content type, the video frames in the video to be processed that serve as split node frames, where the split node frames include the image feature corresponding to the content type;
performing video splitting on the video to be processed according to the split node frames to obtain multiple video clips.
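The three steps executed by the CPU 822 can be sketched end to end. `classify` and the per-type node-frame detectors below are stand-in callables under assumed interfaces, not an API defined by the present application; frames are represented as strings for illustration.

```python
def process_video(frames, classify, node_models):
    """Sketch of the three CPU steps: classify the content type,
    pick the matching model to flag split node frames, then cut
    the video at those frames (dropping the node frames, e.g. ads)."""
    ctype = classify(frames)                     # step 1: content type
    is_node = node_models[ctype]                 # step 2: pick model
    nodes = [i for i, f in enumerate(frames) if is_node(f)]
    clips, start = [], 0                         # step 3: split
    for n in nodes:
        if start < n:
            clips.append(frames[start:n])
        start = n + 1                            # skip the node frame
    if start < len(frames):
        clips.append(frames[start:])
    return ctype, clips

frames = ["play", "ad", "play", "play", "ad", "play"]
ctype, clips = process_video(
    frames,
    classify=lambda fs: "sport",                 # stand-in classifier
    node_models={"sport": lambda f: f == "ad"},  # ad frames split the video
)
print(clips)  # [['play'], ['play', 'play'], ['play']]
```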
Referring to Fig. 9, an embodiment of the present application provides a video processing device 900, which can also be a terminal device. The terminal device can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA for short), a point-of-sale terminal (Point of Sales, POS for short), an in-vehicle computer, and the like. The following takes a mobile phone as an example of the terminal device:
Fig. 9 shows a block diagram of part of the structure of a mobile phone related to the terminal device provided by an embodiment of the present application. Referring to Fig. 9, the mobile phone includes components such as a radio frequency (Radio Frequency, RF for short) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (Wireless Fidelity, WiFi for short) module 970, a processor 980, and a power supply. Those skilled in the art will understand that the mobile phone structure shown in Fig. 9 does not constitute a limitation on the mobile phone, which may include more or fewer components than illustrated, combine certain components, or adopt a different arrangement of components.
Each component of the mobile phone is specifically introduced below with reference to Fig. 9:
The RF circuit 910 can be used for receiving and sending signals during the sending and receiving of messages or during a call; in particular, after receiving downlink information from a base station, it delivers the information to the processor 980 for processing, and in addition it sends uplink data to the base station. In general, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA for short), a duplexer, and the like. In addition, the RF circuit 910 can also communicate with networks and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to the Global System for Mobile communications (Global System of Mobile communication, GSM for short), General Packet Radio Service (General Packet Radio Service, GPRS for short), Code Division Multiple Access (Code Division Multiple Access, CDMA for short), Wideband Code Division Multiple Access (Wideband Code Division Multiple Access, WCDMA for short), Long Term Evolution (Long Term Evolution, LTE for short), e-mail, Short Messaging Service (Short Messaging Service, SMS for short), and the like.
The memory 920 can be used to store software programs and modules, and the processor 980 executes the various function applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, where the program storage area can store an operating system, the application programs required by at least one function (such as a sound playing function, an image playing function, and so on), and the like, and the data storage area can store data created according to the use of the mobile phone (such as audio data, a phone book, and so on), and the like. In addition, the memory 920 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 930 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the mobile phone. Specifically, the input unit 930 may include a touch panel 931 and other input devices 932. The touch panel 931, also referred to as a touch screen, can collect touch operations by the user on or near it (such as operations by the user on the touch panel 931 or near the touch panel 931 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 931 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends them to the processor 980, and can receive and execute commands sent by the processor 980. Furthermore, the touch panel 931 can be realized in multiple types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 931, the input unit 930 can also include other input devices 932. Specifically, the other input devices 932 can include, but are not limited to, one or more of a physical keyboard, function keys (such as a volume control key, a switch key, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 940 can be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 940 may include a display panel 941, which can optionally be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD for short), an organic light-emitting diode (Organic Light-Emitting Diode, OLED for short), and the like. Further, the touch panel 931 can cover the display panel 941; when the touch panel 931 detects a touch operation on or near it, it transmits the operation to the processor 980 to determine the type of the touch event, and the processor 980 then provides a corresponding visual output on the display panel 941 according to the type of the touch event. Although in Fig. 9 the touch panel 931 and the display panel 941 realize the input and output functions of the mobile phone as two independent components, in some embodiments the touch panel 931 and the display panel 941 can be integrated to realize the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 950, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 941 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 941 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally on three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that identify the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration), vibration-identification-related functions (such as a pedometer and tapping), and the like; the mobile phone can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described in detail here.
Audio circuit 960, loudspeaker 961, and microphone 962 can provide an audio interface between the user and the mobile phone. Audio circuit 960 can transmit the electrical signal converted from the received audio data to loudspeaker 961, which converts it into a sound signal for output; on the other hand, microphone 962 converts the collected sound signal into an electrical signal, which audio circuit 960 receives and converts into audio data. After the audio data is output to processor 980 and processed, it is sent, for example, to another mobile phone via RF circuit 910, or output to memory 920 for further processing.
WiFi is a short-range wireless transmission technology. Through WiFi module 970, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Fig. 9 shows WiFi module 970, it is understood that it is not an essential component of the mobile phone and may be omitted as needed within the scope that does not change the essence of the invention.
Processor 980 is the control center of the mobile phone. It connects all parts of the whole mobile phone using various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing software programs and/or modules stored in memory 920 and calling data stored in memory 920, thereby monitoring the mobile phone as a whole. Optionally, processor 980 may include one or more processing units; preferably, processor 980 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into processor 980.
The mobile phone further includes a power supply 990 (such as a battery) that powers all components. Preferably, the power supply may be logically connected to processor 980 through a power management system, so that functions such as charge management, discharge management, and power consumption management are realized through the power management system.
Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like; details are not described herein.
An embodiment of the present application also provides a computer-readable storage medium for storing program code, the program code being used to execute any one embodiment of the video processing method described in the foregoing embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium can be at least one of the following media: various media that can store program code, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disc.
It should be noted that the embodiments in this specification are described in a progressive manner; identical and similar parts among the embodiments may refer to each other, and each embodiment focuses on its differences from the other embodiments. The device and system embodiments, being substantially similar to the method embodiments, are described relatively simply; for relevant parts, refer to the explanation in the method embodiments. The device and system embodiments described above are only illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The above is only a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that can be easily conceived by any person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (14)
1. A video processing method, characterized in that the method comprises:
determining a content type of a video to be processed;
determining, by a neural network model corresponding to the content type, a video frame in the video to be processed that serves as a splitting node frame, the splitting node frame containing an image feature corresponding to the content type;
performing video splitting on the video to be processed according to the splitting node frame to obtain a plurality of video clips.
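The three steps of claim 1 can be sketched in code. This is a minimal illustration only: the patent does not specify a concrete implementation, and all names here (`split_video`, the per-type predicate dictionary, the toy frame labels) are hypothetical stand-ins for the claimed neural network models.

```python
def split_video(frames, classify_content_type, models):
    """Split a frame list into clips at detected splitting node frames.

    classify_content_type: callable returning the video's content type.
    models: dict mapping each content type to a per-frame predicate that
            returns True when the frame shows that type's splitting
            feature (e.g. an advertisement or title picture).
    """
    content_type = classify_content_type(frames)   # step 1: content type
    is_split_node = models[content_type]           # type-specific model
    # Step 2: indices of frames that act as splitting nodes.
    nodes = [i for i, f in enumerate(frames) if is_split_node(f)]
    # Step 3: cut the video at each node, dropping the node frames
    # themselves and discarding empty segments.
    clips, start = [], 0
    for i in nodes:
        if i > start:
            clips.append(frames[start:i])
        start = i + 1
    if start < len(frames):
        clips.append(frames[start:])
    return clips
```

For example, a "news" video interleaved with advertisement frames would be cut at each advertisement frame, yielding the news segments as separate clips.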
2. The method according to claim 1, characterized in that the determining a content type of a video to be processed comprises:
determining, according to a first image neural network model, feature vectors respectively corresponding to a plurality of video frames in the video to be processed, the feature vectors carrying image information contained in the corresponding video frames;
determining the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames.
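The per-frame feature extraction of claim 2 can be sketched as follows. This is an assumption-laden toy: `toy_extract` is a hand-written stand-in for the claimed "first image neural network model" (the description elsewhere mentions convolutional networks), producing a small vector that carries image information from the frame.

```python
def frame_feature_vectors(frames, extract):
    """Apply a per-frame feature extractor (the 'first image neural
    network model' of the claim; here any callable) to every frame."""
    return [extract(f) for f in frames]

def toy_extract(frame):
    # Stand-in for a CNN: a 2-dimensional vector of the frame's mean
    # pixel intensity and intensity variance.
    n = len(frame)
    mean = sum(frame) / n
    var = sum((p - mean) ** 2 for p in frame) / n
    return (mean, var)
```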
3. The method according to claim 2, characterized in that the determining the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames comprises:
forming the feature vectors corresponding to the plurality of video frames into a feature vector sequence according to the temporal order of the plurality of video frames in the video to be processed;
determining the content type of the video to be processed from the feature vector sequence by a second image neural network model.
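The sequencing step of claim 3 can be sketched as follows: arrange the feature vectors by their frames' timing in the video, then hand the sequence to the second model. `toy_sequence_model` is a hypothetical stand-in for the claimed "second image neural network model" (the description mentions recurrent networks); the patent does not prescribe this logic.

```python
def classify_by_sequence(timed_vectors, sequence_model):
    """timed_vectors: list of (timestamp, feature) pairs, possibly
    unordered. The features are arranged into a sequence by their
    timing in the video, then classified by the second model."""
    sequence = [v for _, v in sorted(timed_vectors, key=lambda tv: tv[0])]
    return sequence_model(sequence)

def toy_sequence_model(sequence):
    # Stand-in for a recurrent model: classify by whether the feature
    # values trend upward or downward across the ordered sequence.
    return "rising" if sequence[-1] > sequence[0] else "falling"
```

The point of the sort is that temporal order is part of the signal: the same set of feature vectors in a different order may describe a different kind of video.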
4. The method according to claim 1, characterized in that after the obtaining a plurality of video clips, the method further comprises:
determining, according to a probabilistic model, weight values with which the plurality of video clips respectively belong to highlight segments;
generating a highlight video corresponding to the video to be processed from the video clips whose weight values meet a preset condition.
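The selection step of claim 4 can be sketched as follows, assuming (as one possible preset condition, not specified by the patent) that a clip qualifies when its weight reaches a threshold:

```python
def build_highlight(clips, weights, threshold):
    """Concatenate the clips whose highlight weight meets the preset
    condition (here: weight >= threshold), preserving their order."""
    return [frame
            for clip, w in zip(clips, weights)
            if w >= threshold
            for frame in clip]
```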
5. The method according to claim 4, characterized in that the probabilistic model is a probabilistic model corresponding to the content type and is trained on historical highlight segments of the content type, and the determining, according to a probabilistic model, weight values with which the plurality of video clips respectively belong to highlight segments comprises:
determining, by the probabilistic model, the weight values with which the plurality of video clips respectively belong to highlight segments according to the degree of similarity between the image semantic information carried by the plurality of video clips and the image semantic information carried by the historical highlight segments.
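One way to realize the similarity-based weighting of claim 5 is sketched below: score each clip by its greatest cosine similarity to any historical highlight segment of the same content type. The cosine measure and the max-over-history rule are illustrative assumptions; the claim only requires that the weight reflect the degree of similarity of the image semantic information.

```python
import math

def cosine(u, v):
    # Cosine similarity between two semantic feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def highlight_weights(clip_vectors, history_vectors):
    """Weight each clip by its greatest semantic similarity to any
    historical highlight segment of the same content type."""
    return [max(cosine(c, h) for h in history_vectors)
            for c in clip_vectors]
```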
6. The method according to any one of claims 1-5, characterized in that the image feature corresponding to the content type comprises any of the following:
a broadcast director console picture;
an advertisement picture;
a picture inconsistent with the content type;
an opening title or closing credits picture;
a target face picture.
7. A video processing apparatus, characterized in that the apparatus comprises:
a first determination unit, configured to determine a content type of a video to be processed;
a second determination unit, configured to determine, by a neural network model corresponding to the content type, a video frame in the video to be processed that serves as a splitting node frame, the splitting node frame containing an image feature corresponding to the content type;
a video splitting unit, configured to perform video splitting on the video to be processed according to the splitting node frame to obtain a plurality of video clips.
8. The apparatus according to claim 7, characterized in that the first determination unit is specifically configured to:
determine, according to a first image neural network model, feature vectors respectively corresponding to a plurality of video frames in the video to be processed, the feature vectors carrying image information contained in the corresponding video frames;
determine the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames.
9. The apparatus according to claim 8, characterized in that the determining the content type of the video to be processed according to the feature vectors corresponding to the plurality of video frames comprises:
forming the feature vectors corresponding to the plurality of video frames into a feature vector sequence according to the temporal order of the plurality of video frames in the video to be processed;
determining the content type of the video to be processed from the feature vector sequence by a second image neural network model.
10. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a third determination unit, configured to determine, according to a probabilistic model, weight values with which the plurality of video clips respectively belong to highlight segments;
a highlight video generation unit, configured to generate a highlight video corresponding to the video to be processed from the video clips whose weight values meet a preset condition.
11. The apparatus according to claim 10, characterized in that the probabilistic model is a probabilistic model corresponding to the content type and is trained on historical highlight segments of the content type, and the third determination unit is specifically configured to:
determine, by the probabilistic model, the weight values with which the plurality of video clips respectively belong to highlight segments according to the degree of similarity between the image semantic information carried by the plurality of video clips and the image semantic information carried by the historical highlight segments.
12. The apparatus according to any one of claims 7-11, characterized in that the image feature corresponding to the content type comprises any of the following:
a broadcast director console picture;
an advertisement picture;
a picture inconsistent with the content type;
an opening title or closing credits picture;
a target face picture.
13. A video processing device, characterized in that the device comprises a processor and a memory:
the memory is configured to store program code and transfer the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the video processing method according to any one of claims 1-6.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, the program code being used to execute the video processing method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910122357.XA CN110166828A (en) | 2019-02-19 | 2019-02-19 | A kind of method for processing video frequency and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910122357.XA CN110166828A (en) | 2019-02-19 | 2019-02-19 | A kind of method for processing video frequency and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110166828A true CN110166828A (en) | 2019-08-23 |
Family
ID=67645378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910122357.XA Pending CN110166828A (en) | 2019-02-19 | 2019-02-19 | A kind of method for processing video frequency and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166828A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851625A (en) * | 2019-10-16 | 2020-02-28 | 联想(北京)有限公司 | Video creation method and device, electronic equipment and storage medium |
CN111432140A (en) * | 2020-06-15 | 2020-07-17 | 成都索贝数码科技股份有限公司 | Method for splitting television news into strips by using artificial neural network |
CN111711855A (en) * | 2020-05-27 | 2020-09-25 | 北京奇艺世纪科技有限公司 | Video generation method and device |
CN112291618A (en) * | 2020-10-13 | 2021-01-29 | 北京沃东天骏信息技术有限公司 | Video preview content generating method and device, computer device and storage medium |
CN112423151A (en) * | 2020-11-17 | 2021-02-26 | 北京金山云网络技术有限公司 | Video strip splitting method, system, device, equipment and storage medium |
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
WO2021109846A1 (en) * | 2019-12-06 | 2021-06-10 | 华为技术有限公司 | Bit stream data processing method and apparatus |
CN113539304A (en) * | 2020-04-21 | 2021-10-22 | 华为技术有限公司 | Video strip splitting method and device |
CN113766268A (en) * | 2021-11-08 | 2021-12-07 | 阿里巴巴达摩院(杭州)科技有限公司 | Video processing method and device, electronic equipment and readable medium |
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
WO2023197814A1 (en) * | 2022-04-13 | 2023-10-19 | 华为云计算技术有限公司 | Video processing method and system, and related device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778230A (en) * | 2015-03-31 | 2015-07-15 | 北京奇艺世纪科技有限公司 | Video data segmentation model training method, video data segmenting method, video data segmentation model training device and video data segmenting device |
US20170177943A1 (en) * | 2015-12-21 | 2017-06-22 | Canon Kabushiki Kaisha | Imaging system and method for classifying a concept type in video |
CN107784118A (en) * | 2017-11-14 | 2018-03-09 | 北京林业大学 | A kind of Video Key information extracting system semantic for user interest |
CN108133020A (en) * | 2017-12-25 | 2018-06-08 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN108965920A (en) * | 2018-08-08 | 2018-12-07 | 北京未来媒体科技股份有限公司 | A kind of video content demolition method and device |
CN109121021A (en) * | 2018-09-28 | 2019-01-01 | 北京周同科技有限公司 | A kind of generation method of Video Roundup, device, electronic equipment and storage medium |
CN109151615A (en) * | 2018-11-02 | 2019-01-04 | 湖南双菱电子科技有限公司 | Method for processing video frequency, computer equipment and computer storage medium |
CN109165573A (en) * | 2018-08-03 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting video feature vector |
2019-02-19: CN CN201910122357.XA patent CN110166828A/en — active, Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778230A (en) * | 2015-03-31 | 2015-07-15 | 北京奇艺世纪科技有限公司 | Video data segmentation model training method, video data segmenting method, video data segmentation model training device and video data segmenting device |
US20170177943A1 (en) * | 2015-12-21 | 2017-06-22 | Canon Kabushiki Kaisha | Imaging system and method for classifying a concept type in video |
CN107784118A (en) * | 2017-11-14 | 2018-03-09 | 北京林业大学 | A kind of Video Key information extracting system semantic for user interest |
CN108133020A (en) * | 2017-12-25 | 2018-06-08 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN109165573A (en) * | 2018-08-03 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting video feature vector |
CN108965920A (en) * | 2018-08-08 | 2018-12-07 | 北京未来媒体科技股份有限公司 | A kind of video content demolition method and device |
CN109121021A (en) * | 2018-09-28 | 2019-01-01 | 北京周同科技有限公司 | A kind of generation method of Video Roundup, device, electronic equipment and storage medium |
CN109151615A (en) * | 2018-11-02 | 2019-01-04 | 湖南双菱电子科技有限公司 | Method for processing video frequency, computer equipment and computer storage medium |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851625A (en) * | 2019-10-16 | 2020-02-28 | 联想(北京)有限公司 | Video creation method and device, electronic equipment and storage medium |
WO2021109846A1 (en) * | 2019-12-06 | 2021-06-10 | 华为技术有限公司 | Bit stream data processing method and apparatus |
CN113539304A (en) * | 2020-04-21 | 2021-10-22 | 华为技术有限公司 | Video strip splitting method and device |
CN113539304B (en) * | 2020-04-21 | 2022-09-16 | 华为云计算技术有限公司 | Video strip splitting method and device |
CN111711855A (en) * | 2020-05-27 | 2020-09-25 | 北京奇艺世纪科技有限公司 | Video generation method and device |
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN111432140A (en) * | 2020-06-15 | 2020-07-17 | 成都索贝数码科技股份有限公司 | Method for splitting television news into strips by using artificial neural network |
CN111432140B (en) * | 2020-06-15 | 2020-09-15 | 成都索贝数码科技股份有限公司 | Method for splitting television news into strips by using artificial neural network |
CN112291618A (en) * | 2020-10-13 | 2021-01-29 | 北京沃东天骏信息技术有限公司 | Video preview content generating method and device, computer device and storage medium |
CN112423151A (en) * | 2020-11-17 | 2021-02-26 | 北京金山云网络技术有限公司 | Video strip splitting method, system, device, equipment and storage medium |
CN112565820A (en) * | 2020-12-24 | 2021-03-26 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN112565820B (en) * | 2020-12-24 | 2023-03-28 | 新奥特(北京)视频技术有限公司 | Video news splitting method and device |
CN113766268A (en) * | 2021-11-08 | 2021-12-07 | 阿里巴巴达摩院(杭州)科技有限公司 | Video processing method and device, electronic equipment and readable medium |
WO2023197814A1 (en) * | 2022-04-13 | 2023-10-19 | 华为云计算技术有限公司 | Video processing method and system, and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166828A (en) | A kind of method for processing video frequency and device | |
CN104239535B (en) | A kind of method, server, terminal and system for word figure | |
CN103578474B (en) | A kind of sound control method, device and equipment | |
CN108334539B (en) | Object recommendation method, mobile terminal and computer-readable storage medium | |
CN106126174B (en) | A kind of control method and electronic equipment of scene audio | |
CN110598046A (en) | Artificial intelligence-based identification method and related device for title party | |
CN107908765B (en) | Game resource processing method, mobile terminal and server | |
CN107679156A (en) | A kind of video image identification method and terminal, readable storage medium storing program for executing | |
CN108965977B (en) | Method, device, storage medium, terminal and system for displaying live gift | |
CN108616448A (en) | A kind of the path recommendation method and mobile terminal of Information Sharing | |
CN109756767A (en) | Preview data playback method, device and storage medium | |
CN108241752A (en) | Photo display methods, mobile terminal and computer readable storage medium | |
CN110209245A (en) | Face identification method and Related product | |
CN107908770A (en) | A kind of photo searching method and mobile terminal | |
CN114357278B (en) | Topic recommendation method, device and equipment | |
CN108769787A (en) | A kind of automatic caching method of video, terminal and computer readable storage medium | |
CN110069675A (en) | A kind of search method and mobile terminal | |
CN110276010A (en) | A kind of weight model training method and relevant apparatus | |
CN104281610B (en) | The method and apparatus for filtering microblogging | |
CN109584897A (en) | Vedio noise reduction method, mobile terminal and computer readable storage medium | |
CN109729267A (en) | Filter selection method, mobile terminal and computer readable storage medium | |
CN108897846A (en) | Information search method, equipment and computer readable storage medium | |
CN108848273A (en) | A kind of new information processing method, mobile terminal and storage medium | |
CN110955788A (en) | Information display method and electronic equipment | |
CN108536869A (en) | A kind of method, apparatus and computer readable storage medium of search participle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190823 |