CN110213670A - Video processing method and apparatus, electronic device and storage medium - Google Patents
Video processing method and apparatus, electronic device and storage medium
- Publication number: CN110213670A (application CN201910472453.7A)
- Authority: CN (China)
- Prior art keywords: video, scene, processed, segmentation, segmentation point
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
All classifications fall under H (Electricity), H04 (Electric communication technique), H04N (Pictorial communication, e.g. television) and H04N21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):
- H04N21/233: Processing of audio elementary streams (server side: H04N21/20 servers for content distribution, H04N21/23 processing of content or additional data)
- H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418: Analysing video streams, e.g. detecting features or characteristics
- H04N21/439: Processing of audio elementary streams (client side: H04N21/40 client devices, H04N21/43 processing of content or additional data)
- H04N21/4394: Analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream
- H04N21/44008: Analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/8456: Structuring of content by decomposing it in the time domain, e.g. into time segments (under H04N21/80 content generation or processing by the content creator, H04N21/83 generation or processing of descriptive data, H04N21/845 structuring of content)
Abstract
The present invention provides a video processing method and apparatus, an electronic device, and a storage medium. The video processing method includes: obtaining a video to be processed and dividing it into multiple unit videos; obtaining, for each unit video, a corresponding scene feature vector and audio feature vector; determining scene pre-segmentation points according to the scene feature vectors of every two adjacent unit videos, and determining audio pre-segmentation points according to the audio feature vectors of every two adjacent unit videos; performing scene segmentation on the video to be processed according to the scene pre-segmentation points, and searching the video clips obtained by the scene segmentation for clips whose duration exceeds a set maximum duration threshold, as clips to be further split; and performing audio segmentation on the clips to be further split according to the audio pre-segmentation points, to obtain the finally segmented video clips. The present invention avoids over-long clips, improves the accuracy of the splitting, and better meets user needs.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a video processing method and apparatus, an electronic device, and a storage medium.
Background technique
Film and television works are an art form carried on copies, tape, film, digital storage and similar media, produced to be shown on cinema and television screens and experienced through the combination of sight and hearing; as a comprehensive form of modern art they cover films, TV series, animation and other content. Film and television videos are usually long videos, and in some cases a user is not interested in the full content of an entire long video but rather in a particular segment of it. To meet user needs, a film or television video can therefore be split into multiple video clips for the user to choose from and watch.
In the prior art, such videos are generally split by a method based on scene changes: the video is split according to whether its scene image information changes substantially, and the time points at which the scene image information changes substantially are used as split points.
However, in a film or television video the same scene may last a long time and may in practice contain more than one segment. In that case the above approach treats all the video under the same scene as one video clip, so the resulting clips are too long, the splitting is inaccurate, and user needs cannot be met.
Summary of the invention
The embodiments of the present invention provide a video processing method and apparatus, an electronic device and a storage medium, to solve the problems that split video clips are too long, the splitting is inaccurate, and user needs cannot be met.
In a first aspect, an embodiment of the present invention provides a video processing method, the method comprising:
obtaining a video to be processed and dividing it into multiple unit videos;
obtaining, for each unit video, a corresponding scene feature vector and audio feature vector;
determining scene pre-segmentation points according to the scene feature vectors of every two adjacent unit videos, and determining audio pre-segmentation points according to the audio feature vectors of every two adjacent unit videos;
performing scene segmentation on the video to be processed according to the scene pre-segmentation points, and searching the video clips obtained by the scene segmentation for clips whose duration exceeds a set maximum duration threshold, as clips to be further split;
performing audio segmentation on the clips to be further split according to the audio pre-segmentation points, to obtain the finally segmented video clips.
Optionally, performing audio segmentation on a clip to be split according to the audio pre-segmentation points comprises: finding, among the audio pre-segmentation points, the point closest to the midpoint of the clip to be split; performing audio segmentation on the clip at the found point; judging whether, among the clips obtained by the audio segmentation, there is still a clip whose duration exceeds the maximum duration threshold; and, when there is, taking that clip as a new clip to be split and returning to the step of finding the audio pre-segmentation point closest to the midpoint of the clip to be split.
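A minimal sketch of this recursive splitting, assuming clips are (start, end) times in seconds and the audio pre-segmentation points are a sorted list of candidate times; all names and example values here are illustrative, not taken from the patent:

```python
def audio_split(clip, audio_points, t_max):
    """Split a clip at the audio pre-segmentation point nearest to its
    midpoint until no resulting clip exceeds t_max seconds."""
    start, end = clip
    if end - start <= t_max:
        return [clip]
    inside = [p for p in audio_points if start < p < end]
    if not inside:                    # no candidate point: leave the clip as is
        return [clip]
    mid = (start + end) / 2.0
    p = min(inside, key=lambda t: abs(t - mid))   # point nearest the midpoint
    return (audio_split((start, p), audio_points, t_max)
            + audio_split((p, end), audio_points, t_max))

# Example: a 1000 s scene clip, audio points every 150 s, 360 s maximum:
# audio_split((0.0, 1000.0), [150, 300, 450, 600, 750, 900], 360.0)
# -> [(0.0, 150), (150, 450), (450, 750), (750, 1000.0)]
```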
Optionally, after the scene pre-segmentation points are determined according to the scene feature vectors of every two adjacent unit videos, the method further comprises: finding, among the scene pre-segmentation points, those caused by a transient scene change, and deleting them to obtain the remaining scene pre-segmentation points. Performing scene segmentation on the video to be processed according to the scene pre-segmentation points then comprises: performing scene segmentation on the video according to the remaining scene pre-segmentation points.
Optionally, finding the scene pre-segmentation points caused by a transient scene change comprises: obtaining the duration between every two adjacent scene pre-segmentation points, and finding pairs of adjacent points whose gap is shorter than a set scene-change threshold; for a found pair, obtaining at least one unit video before the earlier point and at least one unit video after the later point; computing, over the obtained unit videos, the similarity of the scene feature vectors of every two adjacent unit videos; computing the average of these similarities; and, when the average similarity exceeds a preset scene similarity threshold, determining at least one of the two found points as a scene pre-segmentation point caused by a transient scene change.
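A rough sketch of this check, assuming 1 s unit videos, cosine similarity as the similarity measure, and example threshold values of our own choosing (the patent leaves the thresholds open):

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def transient_scene_points(points, scene_vecs, gap_thr=3.0, sim_thr=0.9, k=2):
    """points: sorted scene pre-segmentation times (s); scene_vecs[i] is the
    scene feature vector of unit video i. If two points lie close together
    but the content around the pair is still similar, the pair is treated
    as a transient scene change."""
    transient = set()
    for p, q in zip(points, points[1:]):
        if q - p >= gap_thr:                         # only close point pairs
            continue
        before = [int(p) - i - 1 for i in range(k)][::-1]   # k units before p
        after = [int(q) + i for i in range(k)]              # k units after q
        idx = [i for i in before + after if 0 <= i < len(scene_vecs)]
        sims = [cosine(scene_vecs[i], scene_vecs[j]) for i, j in zip(idx, idx[1:])]
        if sims and np.mean(sims) > sim_thr:
            transient.update((p, q))   # delete at least one point of the pair
    return transient
```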
Optionally, after the scene pre-segmentation points are determined according to the scene feature vectors of every two adjacent unit videos, the method further comprises: finding, among the scene pre-segmentation points, those that do not correspond to a genuine scene change, and deleting them to obtain the remaining scene pre-segmentation points. Performing scene segmentation on the video to be processed according to the scene pre-segmentation points then comprises: performing scene segmentation on the video according to the remaining scene pre-segmentation points.
Optionally, finding the scene pre-segmentation points that do not correspond to a genuine scene change comprises: for each scene pre-segmentation point, obtaining at least one unit video before the current point and at least one unit video after it; determining the histogram feature vector of each obtained unit video; computing the similarity of the histogram feature vectors of every two adjacent obtained unit videos; computing the average of these similarities; and, when the average similarity exceeds a preset histogram similarity threshold, determining the current point as a scene pre-segmentation point that does not correspond to a genuine scene change.
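A histogram feature vector of the kind used here can, under the assumption that one representative frame stands in for each unit video, be computed for example with OpenCV (the bin count is an arbitrary example value):

```python
import cv2
import numpy as np

def histogram_vector(frame_bgr, bins=16):
    """Concatenated per-channel colour histogram of a representative frame,
    normalised to unit sum. The check above compares these histograms on
    both sides of a candidate point: if they stay similar, the candidate
    is not a genuine scene change."""
    hists = [cv2.calcHist([frame_bgr], [c], None, [bins], [0, 256])
             for c in range(3)]                      # B, G, R channels
    vec = np.concatenate([h.flatten() for h in hists])
    return vec / (vec.sum() + 1e-9)
```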
Optionally, after the scene pre-segmentation points are determined according to the scene feature vectors of every two adjacent unit videos, the method further comprises: obtaining the duration between every two adjacent scene pre-segmentation points, finding pairs of adjacent points whose gap is shorter than a set minimum duration threshold, and deleting at least one point of each found pair to obtain the remaining scene pre-segmentation points. Performing scene segmentation on the video to be processed according to the scene pre-segmentation points then comprises: performing scene segmentation on the video according to the remaining scene pre-segmentation points.
Optionally, before the video to be processed is divided into multiple unit videos, the method further comprises: detecting the theme-song segments in the video and deleting them from it. Dividing the video into multiple unit videos then comprises: dividing the video with the theme-song segments removed into multiple unit videos.
Optionally, obtaining the scene feature vector and audio feature vector of each unit video comprises: invoking a first process and a second process simultaneously; obtaining the scene feature vector of each unit video with the first process; and obtaining the audio feature vector of each unit video with the second process.
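A minimal sketch of the two-process variant using Python's multiprocessing module; the two extractor bodies are stand-ins, since a real implementation would run the image and audio models described elsewhere in this document:

```python
import multiprocessing as mp

def extract_scene_vectors(unit_paths, out):
    for i, path in enumerate(unit_paths):
        out[i] = [0.0] * 128          # stand-in for an image-model embedding

def extract_audio_vectors(unit_paths, out):
    for i, path in enumerate(unit_paths):
        out[i] = [0.0] * 128          # stand-in for an audio-model embedding

def extract_in_parallel(unit_paths):
    """Run scene and audio feature extraction in two processes at once."""
    with mp.Manager() as mgr:
        scene, audio = mgr.dict(), mgr.dict()
        p1 = mp.Process(target=extract_scene_vectors, args=(unit_paths, scene))
        p2 = mp.Process(target=extract_audio_vectors, args=(unit_paths, audio))
        p1.start(); p2.start()
        p1.join(); p2.join()
        return dict(scene), dict(audio)
```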
In a second aspect, an embodiment of the present invention provides a video processing apparatus, the apparatus comprising:
a division module, configured to obtain a video to be processed and divide it into multiple unit videos;
an obtaining module, configured to obtain, for each unit video, a corresponding scene feature vector and audio feature vector;
a determining module, configured to determine scene pre-segmentation points according to the scene feature vectors of every two adjacent unit videos, and audio pre-segmentation points according to the audio feature vectors of every two adjacent unit videos;
a scene segmentation module, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first searching module, configured to search the video clips obtained by the scene segmentation for clips whose duration exceeds a set maximum duration threshold, as clips to be further split;
an audio segmentation module, configured to perform audio segmentation on the clips to be further split according to the audio pre-segmentation points, to obtain the finally segmented video clips.
Optionally, the audio segmentation module comprises: an audio split-point searching unit, configured to find, among the audio pre-segmentation points, the point closest to the midpoint of a clip to be split; a clip splitting unit, configured to perform audio segmentation on the clip at the found point; and a clip determination unit, configured to judge whether, among the clips obtained by the audio segmentation, there is still a clip whose duration exceeds the maximum duration threshold and, when there is, to take that clip as a new clip to be split and invoke the split-point searching unit again.
Optionally, the apparatus further comprises: a second searching module, configured to find, among the scene pre-segmentation points, those caused by a transient scene change; and a first deleting module, configured to delete them to obtain the remaining scene pre-segmentation points. The scene segmentation module is then specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, the second searching module comprises: a scene split-point searching unit, configured to obtain the duration between every two adjacent scene pre-segmentation points and find pairs of adjacent points whose gap is shorter than the set scene-change threshold; a first video obtaining unit, configured to obtain, for a found pair, at least one unit video before the earlier point and at least one unit video after the later point; a first similarity computing unit, configured to compute, over the obtained unit videos, the similarity of the scene feature vectors of every two adjacent unit videos; a first average computing unit, configured to compute the average of these similarities; and a first split-point determination unit, configured to determine, when the average similarity exceeds the preset scene similarity threshold, at least one of the two found points as a scene pre-segmentation point caused by a transient scene change.
Optionally, the apparatus further comprises: a third searching module, configured to find, among the scene pre-segmentation points, those that do not correspond to a genuine scene change; and a second deleting module, configured to delete them to obtain the remaining scene pre-segmentation points. The scene segmentation module is then specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, the third searching module comprises: a second video obtaining unit, configured to obtain, for each scene pre-segmentation point, at least one unit video before the current point and at least one unit video after it; a histogram determination unit, configured to determine the histogram feature vector of each obtained unit video; a second similarity computing unit, configured to compute the similarity of the histogram feature vectors of every two adjacent obtained unit videos; a second average computing unit, configured to compute the average of these similarities; and a second split-point determination unit, configured to determine, when the average similarity exceeds the preset histogram similarity threshold, the current point as a scene pre-segmentation point that does not correspond to a genuine scene change.
Optionally, the apparatus further comprises: a fourth searching module, configured to obtain the duration between every two adjacent scene pre-segmentation points and find pairs of adjacent points whose gap is shorter than the set minimum duration threshold; and a third deleting module, configured to delete at least one point of each found pair to obtain the remaining scene pre-segmentation points. The scene segmentation module is then specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, the apparatus further comprises: a detection module, configured to detect the theme-song segments in the video to be processed; and a fourth deleting module, configured to delete the theme-song segments from the video. The division module is then specifically configured to divide the video with the theme-song segments removed into multiple unit videos.
Optionally, the obtaining module comprises: an invoking unit, configured to invoke a first process and a second process simultaneously; a scene feature obtaining unit, configured to obtain the scene feature vector of each unit video with the first process; and an audio feature obtaining unit, configured to obtain the audio feature vector of each unit video with the second process.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: a processor; and a memory storing instructions executable by the processor; wherein the processor is configured to perform any of the video processing methods described above.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to perform any of the video processing methods described above.
In the embodiments of the present invention, a video to be processed is obtained and divided into multiple unit videos; the scene feature vector and audio feature vector of each unit video are obtained; scene pre-segmentation points are determined according to the scene feature vectors of every two adjacent unit videos, and audio pre-segmentation points according to their audio feature vectors; scene segmentation is performed on the video according to the scene pre-segmentation points, the clips whose duration exceeds the set maximum duration threshold are selected from the result as clips to be further split, and those clips are split again at audio pre-segmentation points to obtain the finally segmented clips. Thus, after the video has been segmented from the angle of scene changes according to the scene pre-segmentation points, the clips that are still too long are further segmented from the angle of audio changes according to the audio pre-segmentation points. This avoids the over-long clips produced when splitting is based on scene changes alone, improves the accuracy of the splitting, and better meets user needs.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a video processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of another video processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a video processing procedure according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a video processing apparatus according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of another video processing apparatus according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, a flowchart of the steps of a video processing method according to an embodiment of the present invention is shown. The video processing method of this embodiment comprises the following steps.
Step 101: obtain a video to be processed and divide it into multiple unit videos.
A video to be processed is a film or television video that needs to be split. For example, each episode of a TV series can be a video to be processed, a film can be a video to be processed, each episode of an animation can be a video to be processed, and so on.
Splitting the video to be processed means finding split points in it. For analysis, the video is divided into multiple unit videos.
In an optional embodiment, the video can be divided into multiple unit videos in units of a set duration. Any suitable value can be chosen for the set duration based on practical experience; for ease of processing it can, for example, be set to 1 s.
Step 102: obtain, for each unit video, a corresponding scene feature vector and audio feature vector.
When splitting the video to be processed, this embodiment of the invention considers not only the angle of scene changes but also the angle of audio changes. From the angle of scene changes, whether the scene changes can be determined from the scene feature vectors; from the angle of audio changes, whether the audio changes can be determined from the audio feature vectors.
Therefore, the scene feature vector and audio feature vector of each unit video are obtained. For example, a model that recognises image information and outputs a corresponding feature vector can be applied to the image information of a unit video to obtain its scene feature vector, and a model that recognises audio signals and outputs a corresponding feature vector can be applied to the audio signal of a unit video to obtain its audio feature vector. The specific procedure is described in detail in the following embodiments.
Step 103: determine scene pre-segmentation points according to the scene feature vectors of every two adjacent unit videos, and audio pre-segmentation points according to the audio feature vectors of every two adjacent unit videos.
From the difference between the scene feature vectors of two adjacent unit videos it can be learned whether the scene changes between them, and hence whether the boundary between the two unit videos is a scene pre-segmentation point. Likewise, from the difference between the audio feature vectors of two adjacent unit videos it can be learned whether the audio changes between them, and hence whether the boundary is an audio pre-segmentation point.
In this way, multiple scene pre-segmentation points are determined from the scene feature vectors of every two adjacent unit videos, and multiple audio pre-segmentation points from their audio feature vectors.
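A sketch of this pre-segmentation test, applied once to the scene feature vectors and once to the audio feature vectors; the cosine-distance threshold is an assumed example value, since the patent does not fix a specific difference measure:

```python
import numpy as np

def pre_segmentation_points(unit_vectors, dist_thr=0.3, unit_seconds=1.0):
    """Place a candidate split point on the boundary between two adjacent
    unit videos whenever their feature vectors differ enough (here: cosine
    distance above dist_thr)."""
    points = []
    for i in range(len(unit_vectors) - 1):
        a, b = np.asarray(unit_vectors[i]), np.asarray(unit_vectors[i + 1])
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if 1.0 - cos > dist_thr:
            points.append((i + 1) * unit_seconds)   # boundary time in seconds
    return points
```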
Point.Step 104, scene cut is carried out to the video to be processed according to the scene pre-segmentation point, is obtained from scene cut
The video clip that duration is more than the maximum time threshold value of setting is searched in video clip, as video clip to be split.
Scene cut is carried out to video to be processed according to scene pre-segmentation point, obtains multiple video clips.Each piece of video
The corresponding scene of section, therefore video clip may be different in size.If some scene duration in video to be processed
Longer, the duration of the video clip obtained according to the scene cut is longer, and the video clip can actually continue to be divided into
The shorter video clip of duration.
Therefore, in the video clip that scene cut obtains, the maximum time threshold value T that duration is more than setting is searchedmaxVideo
Segment, using the video clip found as video clip to be split.Video clip to be split is to further progress audio point
The video clip cut.For the specific value of maximum time threshold value, those skilled in the art select any suitable based on practical experience
Value, for example can be set to 6min, 7min, 8min, 9min, etc..
Step 105: perform audio segmentation on the clips to be further split according to the audio pre-segmentation points, to obtain the finally segmented video clips.
A clip to be further split is split again, from the angle of audio changes, at audio pre-segmentation points. The specific splitting procedure is described in detail in the following embodiments. When the audio segmentation is finished, the finally segmented video clips are obtained.
In this embodiment of the invention, after the video to be processed has been segmented from the angle of scene changes according to the scene pre-segmentation points, the clips that are still too long are further segmented from the angle of audio changes according to the audio pre-segmentation points. This avoids the over-long clips produced when splitting is based on scene changes alone, improves the accuracy of the splitting, and better meets user needs.
Referring to Fig. 2, a flowchart of the steps of another video processing method according to an embodiment of the present invention is shown. This video processing method comprises the following steps.
Step 201: obtain a video to be processed, detect the theme-song segments in it, and delete them from the video.
Film and television videos generally include theme-song segments; the theme song may comprise an opening theme and an ending theme. Before the video is split, the theme-song segments in it can be deleted, so that the splitting is subsequently performed on the content part of the video.
In this embodiment of the invention, a video processing model for detecting the theme-song segments in a video can first be generated. Generating the video processing model may comprise the following steps A1 to A4.
Step A1: obtain training samples.
To train the model, a large number of sample videos taken from film and television videos can first be obtained from the Internet. The sample videos may include theme-song videos and non-theme-song videos: the theme-song videos may include opening-theme videos and ending-theme videos from film and television videos, and the non-theme-song videos may include speech videos, cheering videos, applause videos and the like. Annotators label each sample video, producing annotation information indicating whether the sample video belongs to the theme-song class; for example, the annotation '1' indicates that the sample video is of the theme-song class, and '0' that it is not. Each obtained sample video together with its annotation is used as one training sample, and a large number of such training samples form the training set. Every training sample is processed in the same way, so the embodiments of the present invention mainly describe the processing of a single training sample.
In this embodiment, sample diversity can be ensured by collecting sample videos from several different types of film and television video, and sample balance by collecting equal numbers of theme-song and non-theme-song videos. For example: 2000 sample videos are obtained from TV-series videos, of which 1000 are theme-song videos and 1000 are non-theme-song videos; 2000 sample videos are obtained from film videos, of which 1000 are theme-song videos and 1000 are non-theme-song videos; and 2000 sample videos are obtained from animation videos, of which 1000 are theme-song videos and 1000 are non-theme-song videos. These 6000 sample videos and their annotations are used as the training set.
Any suitable duration can be chosen for each sample video based on practical experience, for example 3 s, 4 s or 5 s.
Step A2: divide each sample video into multiple unit sample videos.
This embodiment trains a video processing model for detecting the theme-song segments in a video. Considering that the theme-song segments of a video are consistent in their audio, i.e. that the audio within a theme-song segment is of the theme-song class, whether a video belongs to the theme-song class can be determined from audio feature vectors; the video processing model of this embodiment therefore detects the theme-song class mainly on the basis of audio feature vectors.
For analysis, each sample video is divided into multiple unit sample videos. In an optional embodiment, a sample video can be divided into multiple unit sample videos in units of a set duration, whose value can be chosen based on practical experience. For example, if the audio feature vectors are obtained with a neural network model that can process 1 s of audio signal at a time, the set duration can be set to 1 s.
Step A3: obtain, for each unit sample video, the corresponding audio feature vector.
For example, suppose sample video A is 5 s long and is divided, in units of 1 s, into unit sample videos 1, 2, 3, 4 and 5, i.e. five unit sample videos in total. The audio feature vectors of unit sample videos 1, 2, 3, 4 and 5 are then obtained respectively.
In an optional embodiment, obtaining the audio feature vector of one unit sample video may comprise the following steps A31 and A32.
Step A31: generate the spectrogram of the audio signal in the unit sample video. Step A31 can further comprise the following steps A311 to A313.
Step A311: frame the audio signal in the unit sample video to obtain multiple audio signal frames.
The audio signal is extracted from the unit sample video, and framing is applied to it.
In an optional embodiment, the audio signal can be extracted from the unit sample video with the multimedia tool FFmpeg. FFmpeg is an open-source program suite that can record, convert and stream digital audio and video: it provides a complete solution for recording, converting and streaming audio and video, and contains the advanced audio/video codec library libavcodec, much of which was developed from scratch to guarantee high portability and codec quality. FFmpeg offers very powerful functions, including video capture, video format conversion, frame grabbing and watermarking. For example, FFmpeg can extract the audio signal from the unit sample video at a 16 kHz sample rate in the PCM_S16LE (Pulse Code Modulation, 16-bit signed little-endian) coding format, and the extracted audio signal can be saved in a format such as WAV.
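For example, the extraction step can be scripted as follows; the flags match the settings just described (16 kHz, PCM_S16LE, WAV), with mono output as an added assumption:

```python
import subprocess

def extract_audio(video_path, wav_path):
    """Extract the audio track of a unit sample video as 16 kHz mono
    16-bit little-endian PCM in a WAV container, using FFmpeg."""
    subprocess.run(
        ["ffmpeg", "-i", video_path,
         "-vn",                   # drop the video stream
         "-acodec", "pcm_s16le",  # PCM_S16LE coding format
         "-ar", "16000",          # 16 kHz sample rate
         "-ac", "1",              # mono
         wav_path],
        check=True)
```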
An audio signal is non-stationary macroscopically but stationary microscopically: it has short-term stationarity, i.e. it can be regarded as approximately unchanged within 10 to 30 ms. The signal can therefore be divided into short sections for processing; this is framing, and each short section after framing is called an audio signal frame. For example, an overlapping framing method can be used: instead of cutting the frames back-to-back, consecutive frames overlap by a part. The overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5. The frame length can be set according to the actual situation, and the number of frames per second can be set to 33 to 100.
Step A312: apply windowing and a Fourier transform to each audio signal frame, to obtain the initial spectrogram of the audio signal in the unit sample video.
Audio changes continually over long stretches and, without fixed characteristics, cannot be processed directly, so each audio signal frame is windowed, i.e. multiplied by a window function. The purpose of windowing is to remove the signal discontinuities that may occur at the two ends of each frame and make the whole more continuous. The cost of windowing is that the two ends of each frame are attenuated, which is why frames must overlap during framing. In practice, common window functions for windowing audio signal frames include the rectangular window, the Hamming window and the Hanning window; given its frequency-domain characteristics, the Hamming window is preferably used.
Since the characteristics of an audio signal are generally hard to see from its time-domain form, the signal is usually transformed into an energy distribution in the frequency domain for observation; different energy distributions represent different speech characteristics. After windowing, a Fourier transform is therefore applied to each windowed audio signal frame to obtain its energy distribution over the spectrum; from the spectra of all frames, the initial spectrogram of the audio signal in the unit sample video is obtained.
Step A313: apply a Mel transformation to the initial spectrogram to obtain a Mel spectrogram, and use the Mel spectrogram as the spectrogram of the audio signal in the unit sample video.
The initial spectrogram is often rather large; to obtain audio features of a suitable size, the initial spectrogram can be passed through a Mel filter bank and transformed into a Mel spectrogram.
The unit of frequency is hertz (Hz), and the audible range of the human ear is 20 to 20000 Hz, but the ear does not perceive the Hz scale linearly. For example, after adapting to a 1000 Hz tone, if the pitch frequency is raised to 2000 Hz, the ear can perceive only a slight rise in frequency and cannot tell that the frequency has doubled. Converting ordinary frequency to Mel frequency uses the mapping

mel(f) = 2595 * log10(1 + f / 700)

where f is the ordinary frequency and mel(f) is the Mel frequency. (As a check, mel(1000) = 2595 * log10(1 + 1000/700) is approximately 1000.) Under this formula, the ear's perception of frequency becomes a linear relationship: if the Mel frequencies of two pieces of audio differ by a factor of two, the pitch the ear perceives also differs by roughly a factor of two.
According to the sensitivity of the human ear, the frequency axis is divided among multiple Mel filters, giving a Mel filter bank, which may comprise 20 to 40 Mel filters. On the Mel scale, the centre frequencies of the filters are linearly spaced at equal intervals, but on the ordinary frequency scale they are not. The initial spectrogram is filtered by the Mel filter bank to obtain the Mel spectrogram, which is taken as the spectrogram of the audio signal in the unit sample video.
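Steps A311 to A313 together amount to computing a log-Mel spectrogram, which can be sketched with librosa; the 25 ms frame, 10 ms shift and 64 Mel bands are assumed values inside the ranges given above (they match common VGGish-style front ends), not values fixed by the patent:

```python
import numpy as np
import librosa

def mel_spectrogram(wav_path, n_mels=64):
    """Framing, Hamming windowing, Fourier transform and Mel filtering
    of a unit sample video's audio, returning a log-Mel spectrogram
    of shape (n_mels, frames)."""
    y, sr = librosa.load(wav_path, sr=16000)         # 16 kHz mono signal
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr,
        n_fft=400, win_length=400, hop_length=160,   # 25 ms frames, 10 ms shift
        window="hamming",
        n_mels=n_mels)
    return np.log(mel + 1e-6)
```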
Step A32: input the spectrogram of the audio signal in the unit sample video into a preset neural network model, and take the audio feature vector output by the neural network model as the audio feature vector of the unit sample video.
In this embodiment of the invention, a neural network model can be used: the spectrogram of the audio signal in the unit sample video is input into the neural network model, features are extracted inside the model, and the model outputs an audio feature vector, which is the audio feature vector of the unit sample video.
In an optional embodiment, the audio feature vector can be extracted with a VGGish (Visual Geometry Group) model. A VGGish model may comprise convolutional layers, fully connected layers and so on: the convolutional layers can be used to extract features, and the fully connected layers can be used to map the extracted features to a corresponding feature vector. The spectrogram of the audio signal in the unit sample video is therefore input into the VGGish model; the convolutional layers extract the audio features from the spectrogram and pass them to the fully connected layers, which process the audio features and output a 128-dimensional audio feature vector.
In this embodiment, the audio feature vector of each unit sample video can be saved in TFRecord format. Data in TFRecord format are stored in binary, occupy less disk space, and are read faster.
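A sketch of the TFRecord serialisation mentioned above; the field names are illustrative, not defined by the patent:

```python
import tensorflow as tf

def write_sample(path, embeddings, label):
    """Store the per-unit 128-d audio feature vectors of one sample video
    together with its 0/1 theme-song annotation as one TFRecord example."""
    flat = [v for emb in embeddings for v in emb]    # concatenate unit vectors
    example = tf.train.Example(features=tf.train.Features(feature={
        "embeddings": tf.train.Feature(float_list=tf.train.FloatList(value=flat)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))
    with tf.io.TFRecordWriter(path) as writer:
        writer.write(example.SerializeToString())
```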
Step A4: using the audio feature vectors of at least two consecutive unit sample videos as the input and the annotation of the sample video as the target output, train a preset initial model; the trained model is determined as the video processing model.
If one sample video were represented for training by the feature vector of a single unit sample video, that vector might not represent the whole sample video accurately and comprehensively, since a single unit sample video is short. Therefore, in this embodiment a sample video is represented for training by the audio feature vectors of at least two consecutive unit sample videos.
For each sample video, the audio feature vectors of at least two consecutive unit sample videos obtained by dividing the sample video are used as the input, and its annotation as the target output, to train the preset initial model.
Training the preset initial model may comprise steps A41 to A43.
Step A41: randomly select at least two consecutive unit sample videos, concatenate their audio feature vectors, input the result into the initial model, and obtain the predicted probability that the sample video belongs to the theme-song class.
The initial model is a model with a classification function that has not yet been trained. It can analyse the input audio feature vectors and output the predicted probability that the sample video belongs to the theme-song class, but the prediction of an untrained model is usually inaccurate; the initial model must therefore be trained to obtain an accurate video processing model.
From the unit sample videos obtained by dividing the sample video, at least two consecutive ones are selected at random, and their audio feature vectors are input into the initial model, which outputs the predicted probability that the sample video belongs to the theme-song class.
For example, sample video A is divided, in units of 1 s, into unit sample videos 1, 2, 3, 4 and 5, i.e. five unit sample videos. Three consecutive unit sample videos are selected at random from the five; each corresponds to a 128-dimensional audio feature vector, so the feature vectors of the three unit sample videos are concatenated into a 128 * 3 = 384-dimensional audio feature vector and input into the initial model. The initial model outputs the predicted probability that sample video A belongs to the theme-song class.
Step A42: compute the loss value of the sample video from the predicted probability that it belongs to the theme-song class and from its annotation.
The predicted probability that the sample video belongs to the theme-song class is the actual output of the initial model, and the annotation of the sample video is the target output; the loss value of the selected sample video is computed from the actual output and the target output. The loss value expresses how far the predicted probability deviates from the annotation of the selected sample video.
In an optional embodiment, the difference between the annotation of the sample video and the predicted probability that it belongs to the theme-song class can be used as the loss value. For example, if the predicted probability is 0.8 and the annotation is 1, the loss value can be 0.2.
Step A43: when the loss value is smaller than a set loss threshold, determine that training is complete.
The smaller the loss value, the more robust the model. In this embodiment, a loss threshold for judging whether training is complete is preset. If the loss value is below the set loss threshold, the deviation between the predicted probability and the annotation of the sample video is small, and training can be considered complete; if the loss value is greater than or equal to the set loss threshold, the deviation between the predicted probability and the annotation is large, in which case the parameters of the model can be adjusted and training continues with the next training sample.
Any suitable value can be chosen for the loss threshold based on practical experience, for example 0.1, 0.2 or 0.3.
The trained model can be used as the video processing model for detecting the theme-song segments in the video to be processed.
In the embodiment of the present invention, the process of detecting the theme song segments in the video to be processed may include the following steps B1~B5.
Step B1: extract the opening segment and the ending segment from the video to be processed.
The theme songs comprise the opening theme and the ending theme; the opening theme is located at the beginning of the video to be processed, and the ending theme is located at the end of the video to be processed. Therefore, to save processing time, the opening segment and the ending segment can be extracted from the video to be processed, and detection is performed only on the opening segment where the opening theme is located and the ending segment where the ending theme is located.
In an optional embodiment, a segment can be extracted from the beginning of the video to be processed according to a set percentage as the opening segment of the video to be processed, and a segment can be extracted from the end of the video to be processed according to the set percentage as the ending segment of the video to be processed. As for the specific value of the set percentage, those skilled in the art may set any suitable value according to the actual situation; for example, the set percentage can be 10%, 15%, 20%, etc. A minimal sketch of this extraction is given below.
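As an illustration only, the following sketch computes the opening and ending segment boundaries from a set percentage; the function name, the variable names, and the 15% default are assumptions, not taken from the embodiment.

def split_head_tail(duration_s: float, percentage: float = 0.15):
    """Return (start, end) ranges, in seconds, of the opening and
    ending segments of a video of the given duration."""
    head = (0.0, duration_s * percentage)                 # opening segment
    tail = (duration_s * (1 - percentage), duration_s)    # ending segment
    return head, tail

# e.g. a 40-minute episode with a 15% margin on each side
head, tail = split_head_tail(2400)  # head=(0.0, 360.0), tail=(2040.0, 2400.0)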
Step B2: divide the opening segment and the ending segment into multiple unit videos to be processed, respectively.
Similar to step A2 above, based on the audio consistency of theme song segments within the video to be processed, whether a segment belongs to the theme song category can be determined from audio feature vectors.
For a given video to be processed, the opening segment and the ending segment are each divided into multiple unit videos to be processed for analysis. For example, the opening segment and the ending segment can each be divided into multiple unit videos to be processed in units of a set duration. The set duration involved in step B2 can be the same as the set duration involved in step A2 above.
Step B3: for each unit video to be processed, obtain the audio feature vector corresponding to the unit video to be processed.
Obtaining the audio feature vector corresponding to a unit video to be processed may include: generating the spectrogram corresponding to the audio signal in the unit video to be processed; inputting the spectrogram corresponding to the audio signal in the unit video to be processed into a preset neural network model, and taking the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit video to be processed.
Generating the spectrogram corresponding to the audio signal in the unit video to be processed may include: performing framing processing on the audio signal in the unit video to be processed to obtain multiple audio signal frames; performing windowing processing and Fourier transform processing on each audio signal frame to obtain the initial spectrogram corresponding to the audio signal in the unit video to be processed; and performing Mel transform processing on the initial spectrogram to obtain a Mel spectrogram, taking the Mel spectrogram as the spectrogram corresponding to the audio signal in the unit video to be processed.
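The framing, windowing, Fourier transform, and Mel transform steps correspond to the usual log-Mel pipeline; a minimal sketch with librosa follows, assuming the audio track of one unit video has already been decoded to a mono waveform (the 16 kHz rate and the frame parameters are assumptions).

import librosa

# y: mono waveform of one unit video's audio track, sr: sample rate
y, sr = librosa.load("unit_video_audio.wav", sr=16000)

# Windowed STFT (framing + windowing + Fourier transform), then a Mel
# filter bank applied to the power spectrogram.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                     hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel)  # spectrogram fed to the neural network
print(log_mel.shape)                # (n_mels, n_frames)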
Step B3 is similar to step A3 above; refer to the related description of step A3 for details, which the embodiment of the present invention does not repeat here.
Step B4: input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current unit video to be processed, into the pre-generated video processing model, and determine whether the unit video to be processed belongs to the theme song category according to the output of the video processing model.
If the feature vector of a single unit video to be processed were used directly to detect whether that unit video belongs to the theme song category, the result might be unreliable: the duration of a single unit video to be processed is short, and its feature vector alone may not accurately determine whether the unit video really belongs to the theme song category. Therefore, in the embodiment of the present invention, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current one, are used to determine whether the current unit video to be processed belongs to the theme song category.
For a given unit video to be processed, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including that unit video, are input into the video processing model generated above. After analyzing the audio feature vectors, the video processing model outputs the prediction probability that the unit video to be processed belongs to the theme song category. After the output of the video processing model is obtained, it is compared against a set probability threshold: if the prediction probability that the unit video to be processed belongs to the theme song category is greater than or equal to the set probability threshold, it is determined that the unit video to be processed belongs to the theme song category.
As for the specific value of the probability threshold, those skilled in the art may select any suitable value based on practical experience, for example 0.7, 0.8, 0.9, etc.
For example, for unit video 3 to be processed, the 3 consecutive unit videos to be processed that include unit video 3 can be unit videos 1, 2 and 3 to be processed, or unit videos 2, 3 and 4 to be processed, or unit videos 3, 4 and 5 to be processed. Among these, the scheme using unit videos 2, 3 and 4 considers both the audio feature vector before unit video 3 and the audio feature vector after unit video 3; therefore, using the audio feature vectors corresponding to the 3 consecutive unit videos 2, 3 and 4 yields a more accurate result for unit video 3 than the other two schemes.
Taking the 3 consecutive unit videos to be processed that include unit video 3, namely unit videos 2, 3 and 4, as an example: the 128-dimensional audio feature vector corresponding to unit video 2, the 128-dimensional audio feature vector corresponding to unit video 3, and the 128-dimensional audio feature vector corresponding to unit video 4 are concatenated into a 128*3=384-dimensional audio feature vector and input into the video processing model. The video processing model outputs the prediction probability that unit video 3 belongs to the theme song category; if this prediction probability is greater than the set probability threshold, it is determined that unit video 3 belongs to the theme song category. A sketch of this windowed classification is given below.
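The sketch below assumes the trained model is available as a callable that returns a probability; the function name, the 0.8 threshold, and the handling of boundary units by repeating the nearest neighbor are assumptions.

import numpy as np

def classify_units(features, model, threshold=0.8):
    """features: list of 128-dim audio feature vectors, one per unit video.
    Returns one boolean per unit: does it belong to the theme song category?"""
    flags = []
    n = len(features)
    for i in range(n):
        # Centered window [i-1, i, i+1]; boundary units reuse a neighbor.
        prev = features[max(i - 1, 0)]
        nxt = features[min(i + 1, n - 1)]
        window = np.concatenate([prev, features[i], nxt])  # 384-dim
        prob = model(window)  # prediction probability for the middle unit
        flags.append(prob >= threshold)
    return flags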
Step B5: splice the consecutive unit videos among the unit videos to be processed that belong to the theme song category, to obtain the opening theme segment and the ending theme segment in the video to be processed.
After it is determined whether each unit video to be processed belongs to the theme song category, a unit video that belongs to the theme song category can be determined to be part of a theme song segment, and a unit video that does not belong to the theme song category can be determined to be part of a non-theme-song segment. Therefore, if multiple consecutive unit videos to be processed belong to the theme song category, the consecutive unit videos belonging to the theme song category are spliced to obtain the opening theme segment and the ending theme segment in the video to be processed.
The theme songs in the video to be processed comprise the opening theme and the ending theme, so the opening theme segment and the ending theme segment can be determined from the video to be processed. Among the unit videos to be processed that were divided from the opening segment and belong to the theme song category, the consecutive unit videos to be processed are spliced to obtain the opening theme segment in the video to be processed; among the unit videos to be processed that were divided from the ending segment and belong to the theme song category, the consecutive unit videos to be processed are spliced to obtain the ending theme segment in the video to be processed.
When the opening segment and the ending segment of the video to be processed are divided into multiple unit videos to be processed, the start time and end time corresponding to each unit video to be processed can also be recorded. Therefore, after the consecutive unit videos belonging to the theme song category are spliced to obtain the opening theme segment and the ending theme segment in the video to be processed, the start time of the first unit video in the opening theme segment can be taken as the start time of the opening theme segment, and the end time of the last unit video in the opening theme segment as the end time of the opening theme segment; likewise, the start time of the first unit video in the ending theme segment can be taken as the start time of the ending theme segment, and the end time of the last unit video in the ending theme segment as the end time of the ending theme segment. According to the start times and end times of the theme song segments (the opening theme segment and the ending theme segment), the theme song segments are deleted from the video to be processed. A sketch of this merging step is given below.
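The sketch continues the assumed classify_units output above; each unit is assumed to carry the (start, end) times recorded when the segment was divided.

def merge_theme_segments(flags, times):
    """flags: per-unit booleans from the classifier; times: per-unit
    (start_s, end_s) pairs. Returns the (start, end) of each run of
    consecutive units that belong to the theme song category."""
    segments, run_start, prev_end = [], None, None
    for flag, (start, end) in zip(flags, times):
        if flag and run_start is None:
            run_start = start                    # a theme-song run begins
        elif not flag and run_start is not None:
            segments.append((run_start, prev_end))
            run_start = None
        prev_end = end
    if run_start is not None:                    # run reaches the last unit
        segments.append((run_start, prev_end))
    return segments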
Step 202: divide the video to be processed, after the theme song segments are deleted, into multiple unit videos to be processed.
In an optional embodiment, the video to be processed after the theme song segments are deleted can be divided into multiple unit videos to be processed in units of a set duration. The set duration involved in step 202 can be the same as the set duration involved in step 101.
Fig. 3 is a schematic diagram of a video processing procedure according to an embodiment of the present invention. As shown in Fig. 3, the long video in Fig. 3 is the video to be processed, and the long video is divided to obtain multiple unit videos to be processed.
Step 203: call the first process and the second process simultaneously, obtain the scene feature vector corresponding to each unit video to be processed using the first process, and obtain the audio feature vector corresponding to each unit video to be processed using the second process.
In the embodiment of the present invention, if multiple unit videos to be processed were handled within a single process, processing efficiency would be low. Therefore, the first process and the second process can be set to run in parallel: the first process and the second process are called simultaneously, the scene feature vector corresponding to each unit video to be processed is obtained using the first process, and the audio feature vector corresponding to each unit video to be processed is obtained using the second process, thereby improving processing efficiency. The first process and the second process can be stored in a process pool.
As shown in Fig. 3, the process pool in Fig. 3 includes the first process process2 and the second process process3. A sketch of this parallel dispatch is given below.
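multiprocessing.Pool is standard-library Python, while the worker names extract_scene_features and extract_audio_features are assumed placeholders for the two feature extractors.

from multiprocessing import Pool

def extract_scene_features(unit_paths):
    ...  # assumed worker: one 2048-dim scene feature vector per unit video

def extract_audio_features(unit_paths):
    ...  # assumed worker: one 128-dim audio feature vector per unit video

if __name__ == "__main__":
    unit_paths = [f"unit_{i:04d}.mp4" for i in range(120)]
    with Pool(processes=2) as pool:
        scene_job = pool.apply_async(extract_scene_features, (unit_paths,))
        audio_job = pool.apply_async(extract_audio_features, (unit_paths,))
        scene_vecs = scene_job.get()  # the two extractors run concurrently
        audio_vecs = audio_job.get()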
The scene feature vector can be obtained from image information, so one frame image can be extracted from the unit video to be processed. The information of that frame image is input into a neural network model; after feature extraction is performed inside the neural network model, the neural network model outputs a scene feature vector, which is the scene feature vector corresponding to the unit video to be processed.
In an optional embodiment, the multimedia video processing tool FFmpeg can be used to extract the image from the unit video to be processed. For example, the image size can be 255*255, and the extracted image is saved in a format such as jpg.
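A hedged example of such an extraction, invoking the FFmpeg command-line tool from Python; the 1-second timestamp and the file names are assumptions.

import subprocess

# Extract one frame from a unit video, scaled to 255x255, and save it
# as a jpg for the scene feature network.
subprocess.run(
    ["ffmpeg", "-ss", "1", "-i", "unit_0001.mp4",
     "-vframes", "1", "-s", "255x255", "frame_0001.jpg"],
    check=True)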
In an optional embodiment, the neural network model Resnet50 can be used to obtain the scene feature vector corresponding to the image information extracted from the unit video to be processed. The Resnet50 model is a residual network model; in a residual network, the network is not made to fit the original mapping directly but instead fits the residual mapping. The Resnet50 model may include convolutional layers, a fully connected layer, etc., where the convolutional layers can be used to extract features and the fully connected layer can be used to classify the extracted features into the corresponding feature vector. Therefore, the image extracted from the unit video to be processed is input into the Resnet50 model; the convolutional layers extract the scene features in the image and pass them to the fully connected layer, which classifies the scene features, obtains a 2048-dimensional scene feature vector, and outputs this scene feature vector.
As shown in Fig. 3, the first process process2 in Fig. 3 contains a neural network model for obtaining scene feature vectors; specifically, the neural network model can be RGB Resnet50, and the neural network model RGB Resnet50 is used to obtain the 2048-dimensional scene feature vector corresponding to each unit video to be processed.
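One way to obtain such a 2048-dimensional vector is to read the pooled activations just before the final fully connected layer of a pretrained torchvision ResNet-50; taking the pre-classifier features, and the preprocessing shown, are assumptions about the embodiment.

import torch
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()  # expose the 2048-dim pooled features
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("frame_0001.jpg").convert("RGB")
with torch.no_grad():
    scene_vec = resnet(preprocess(img).unsqueeze(0)).squeeze(0)
print(scene_vec.shape)  # torch.Size([2048])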
The audio feature vector can be obtained from the audio signal. For the detailed process of obtaining the audio feature vector corresponding to a unit video to be processed, refer to the related descriptions of step A3 and step B3 above; the embodiment of the present invention does not repeat them here.
As shown in Fig. 3, the second process process3 in Fig. 3 contains a neural network model for obtaining audio feature vectors; specifically, the neural network model can be Audio VGGish, and the neural network model Audio VGGish is used to obtain the 128-dimensional audio feature vector corresponding to each unit video to be processed.
Step 204: determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed.
Scene pre-segmentation points can be determined according to the scene feature vectors corresponding to every two adjacent unit videos to be processed. The process may include: obtaining the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed, and taking the intermediate point of two adjacent unit videos to be processed whose similarity is less than the set scene similarity threshold T_scene as a scene pre-segmentation point. If the similarity of the scene feature vectors corresponding to two adjacent unit videos to be processed is less than the set scene similarity threshold, it can be concluded that a scene change occurs between the two adjacent unit videos to be processed, so their intermediate point can be taken as a scene pre-segmentation point.
Audio pre-segmentation points can be determined according to the audio feature vectors corresponding to every two adjacent unit videos to be processed. The process may include: obtaining the similarity of the audio feature vectors corresponding to every two adjacent unit videos to be processed, and taking the intermediate point of two adjacent unit videos to be processed whose similarity is less than the set audio similarity threshold T_audio as an audio pre-segmentation point. If the similarity of the audio feature vectors corresponding to two adjacent unit videos to be processed is less than the set audio similarity threshold, it can be concluded that an audio change occurs between the two adjacent unit videos to be processed, so their intermediate point can be taken as an audio pre-segmentation point.
As for the specific values of the scene similarity threshold and the audio similarity threshold, those skilled in the art may set any suitable values according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the scene similarity threshold and the audio similarity threshold can be set to 0.1, 0.2, 0.3, etc.
In an optional embodiment, the similarity of two feature vectors can be measured by the cosine distance between the two feature vectors. The cosine distance uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. The larger the cosine distance between two feature vectors, the smaller their similarity. Therefore, if similarity is measured by cosine distance, a scene distance threshold and an audio distance threshold can be set: when the cosine distance between two scene feature vectors is greater than the scene distance threshold, the similarity of the two scene feature vectors is determined to be less than the scene similarity threshold; when the cosine distance between two audio feature vectors is greater than the audio distance threshold, the similarity of the two audio feature vectors is determined to be less than the audio similarity threshold. For example, the scene distance threshold and the audio distance threshold can be set to 0.7, 0.8, 0.9, etc.
Assume the two feature vectors are $x = (x_1, x_2, \ldots, x_N)^T$ and $y = (y_1, y_2, \ldots, y_N)^T$, where $T$ denotes transposition. The cosine distance between the two feature vectors is:

    $d = 1 - \dfrac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\,\sqrt{\sum_{i=1}^{N} y_i^{2}}}$

where $N$ denotes the dimension of the feature vectors and $d$ denotes the cosine distance.
Of course, other measures of the similarity of two feature vectors can also be used, such as the Euclidean distance, the Mahalanobis distance, the Manhattan distance, etc.; the embodiment of the present invention places no restriction on this.
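A minimal numeric sketch of this cosine distance test with NumPy; the 0.8 threshold is one of the example distance values above, and the random vectors stand in for real scene feature vectors.

import numpy as np

def cosine_distance(x, y):
    # d = 1 - cos(theta); a larger distance means a smaller similarity
    cos = float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - cos

a, b = np.random.rand(2048), np.random.rand(2048)
if cosine_distance(a, b) > 0.8:  # example scene distance threshold
    print("scene change: take the midpoint as a scene pre-segmentation point")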
Step 205: find the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points, and delete the scene pre-segmentation points of brief scene changes to obtain the remaining scene pre-segmentation points.
A video to be processed may contain brief scene changes (a few seconds to a dozen or so seconds), for example a cut from the current scene to a flashback scene, or a cut back to the current scene after a brief flashback. According to the similarity between the scene feature vectors corresponding to adjacent unit videos to be processed, the start position and end position of such a brief scene would also be identified as scene pre-segmentation points, even though the plot before and after the brief scene may actually belong to the same scene. Scene pre-segmentation points produced in this way can be called scene pre-segmentation points of brief scene changes. For this situation, in the embodiment of the present invention the scene pre-segmentation points of brief scene changes are found among the scene pre-segmentation points and deleted, which further improves the accuracy of the scene pre-segmentation points and preserves the integrity of the plot after segmentation.
In an optional embodiment, finding the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points may include steps C1~C5.
Step C1: obtain the duration between every two adjacent scene pre-segmentation points, and find two adjacent scene pre-segmentation points whose interval is less than the set scene change threshold.
The duration between two adjacent scene pre-segmentation points caused by a brief scene change is small, so two adjacent scene pre-segmentation points whose interval is less than the set scene change threshold T_short can be found first, after which it is determined whether these two adjacent scene pre-segmentation points were caused by a brief scene change. As for the specific value of the scene change threshold, those skilled in the art may set any suitable value according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the scene change threshold can be set to 7s, 8s, 9s, etc.
Step C2: among the two adjacent scene pre-segmentation points found, obtain at least one unit video to be processed before the former scene pre-segmentation point and at least one unit video to be processed after the latter scene pre-segmentation point.
Step C3: among the obtained unit videos to be processed, calculate the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed.
Step C4: calculate the average of the similarities.
Step C5: when the average of the similarities is greater than the preset scene similarity threshold, determine at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point of a brief scene change.
For example, suppose the duration between two adjacent scene pre-segmentation points 2 and 3 is less than the set scene change threshold. Obtain the 5 unit videos to be processed before scene pre-segmentation point 2 and the 5 unit videos to be processed after scene pre-segmentation point 3, obtain the similarity of the scene feature vectors corresponding to every two adjacent unit videos among the 5 units before point 2 and likewise among the 5 units after point 3, and calculate the average of all the similarities obtained. If the average is greater than the scene similarity threshold, the scene before scene pre-segmentation point 2 and the scene after scene pre-segmentation point 3 are similar, so it can be determined that scene pre-segmentation points 2 and 3 were caused by a brief scene change. In this case, either one or both of scene pre-segmentation points 2 and 3 can be deleted, so that the brief scene between them is combined with the preceding video clip, or with the following video clip, or with both.
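A sketch of the C1~C5 check for one candidate pair of points; the 5-unit windows and example thresholds mirror the description above, while the data layout (per-unit scene vectors indexed so that i is the last unit before the former point and j the first unit after the latter point) is an assumption.

import numpy as np

def cos_sim(x, y):
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def is_brief_change(scene_vecs, i, j, gap_s, t_short=8.0, sim_thr=0.2, k=5):
    """Return True if the two pre-segmentation points bracketing units
    i..j appear to be caused by a brief scene change."""
    if gap_s >= t_short:                        # C1: interval not short enough
        return False
    pairs = list(range(max(i - k + 1, 0), i)) \
          + list(range(j, min(j + k - 1, len(scene_vecs) - 1)))
    if not pairs:
        return False
    sims = [cos_sim(scene_vecs[a], scene_vecs[a + 1]) for a in pairs]  # C3
    return sum(sims) / len(sims) > sim_thr      # C4/C5: similar scenes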
Step 206: find the scene pre-segmentation points not caused by scene changes themselves among the scene pre-segmentation points, and delete the scene pre-segmentation points not caused by scene changes themselves to obtain the remaining scene pre-segmentation points.
External factors in a video to be processed may reduce the similarity of the scene feature vectors corresponding to adjacent unit videos to be processed, so that a scene pre-segmentation point is determined. For example, lighting changes or switches between long shots and close-ups can affect the determination of scene pre-segmentation points even though the scene itself does not change. Such scene pre-segmentation points are caused by external factors rather than by actual scene changes and should not serve as a basis for segmentation; they can be called scene pre-segmentation points not caused by scene changes themselves. For this situation, in the embodiment of the present invention such scene pre-segmentation points are found among the scene pre-segmentation points and deleted, which further improves the accuracy of the scene pre-segmentation points and preserves the integrity of the plot after segmentation.
In an optional embodiment, finding the scene pre-segmentation points not caused by scene changes themselves among the scene pre-segmentation points may include steps D1~D5.
Step D1: for each scene pre-segmentation point, obtain at least one unit video to be processed before and at least one unit video to be processed after the current scene pre-segmentation point.
Step D2: determine the histogram feature vector corresponding to each of the obtained unit videos to be processed.
A histogram feature shows the tonal distribution of an image and reveals the number of pixels at each gray level; from the curve drawn from these values, the exposure of the image can be judged preliminarily, and the histogram is the best feedback on image exposure. The histogram feature vector can be obtained from image information, so one frame image can be extracted from the unit video to be processed, and the histogram feature vector of that frame image is taken as the histogram feature vector corresponding to the unit video to be processed.
In an optional embodiment, a Lab color space model can be used to obtain the histogram feature vector corresponding to the image extracted from the unit video to be processed. The Lab color space model is based on human perception of color; it describes how a color looks rather than the amounts of specific colorants a display device needs to produce the color, so Lab is also regarded as a device-independent color model. The Lab color model consists of lightness (L) and the two color-related elements a and b: a represents the range from magenta to green, and b represents the range from yellow to blue. All colors can be composed by jointly varying these 3 values. The image extracted from the unit video to be processed is input into the Lab color space model, the histogram features of the image are extracted inside the model, and a 4096-dimensional histogram feature vector is output.
Similar to step 203 above, to improve processing efficiency, after the multiple unit videos to be processed are obtained by division, a separate third process can be used to obtain the histogram feature vector corresponding to each unit video to be processed. As shown in Fig. 3, the process pool in Fig. 3 includes a third process process1; process1 contains a color space model for obtaining histogram feature vectors. Specifically, the color space model can be Lab Histogram, and Lab Histogram is used to obtain the 4096-dimensional histogram feature vector corresponding to each unit video to be processed.
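One way to arrive at a 4096-dimensional Lab histogram is to convert the frame to Lab with OpenCV and bin it at 16 bins per channel, since 16*16*16 = 4096; this binning scheme is an assumption, as the embodiment does not spell it out.

import cv2

def lab_histogram(image_path):
    """Return a 4096-dim (16*16*16) normalized Lab color histogram."""
    bgr = cv2.imread(image_path)              # OpenCV loads images as BGR
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab)
    hist = cv2.calcHist([lab], [0, 1, 2], None,
                        [16, 16, 16], [0, 256, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / hist.sum()                  # normalize to sum to 1

vec = lab_histogram("frame_0001.jpg")
print(vec.shape)  # (4096,)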
Step D3: among the obtained unit videos to be processed, calculate the similarity of the histogram feature vectors corresponding to every two adjacent unit videos to be processed.
Step D4: calculate the average of the similarities.
Step D5: when the average of the similarities is greater than the preset histogram similarity threshold, determine the current scene pre-segmentation point as a scene pre-segmentation point not caused by a scene change itself.
As for the specific value of the histogram similarity threshold, those skilled in the art may set any suitable value according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the histogram similarity threshold can be set to 0.2, 0.3, 0.4, etc.
For example, for a scene pre-segmentation point 4, obtain the 5 unit videos to be processed before it and the 5 unit videos to be processed after it, obtain the histogram feature vectors corresponding to these 10 unit videos to be processed, obtain the similarity of the histogram feature vectors corresponding to every two adjacent unit videos among the 10, and calculate the average of all the similarities obtained. If the average is greater than the histogram similarity threshold, the scenes before and after scene pre-segmentation point 4 are similar, so it can be determined that scene pre-segmentation point 4 was not caused by a scene change itself. In this case, scene pre-segmentation point 4 can be deleted, so that the video clips before and after it are combined.
Step 207: obtain the duration between every two adjacent scene pre-segmentation points, find two adjacent scene pre-segmentation points whose interval is less than the set minimum duration threshold, and delete at least one of the two adjacent scene pre-segmentation points found, to obtain the remaining scene pre-segmentation points.
A video to be processed may contain scenes of short duration, for example a scene that lasts only a dozen or so seconds. According to the similarity between the scene feature vectors corresponding to adjacent unit videos to be processed, the start position and end position of such a short scene would also be identified as scene pre-segmentation points, but cutting such a short scene out as a separate video clip is of little value.
For this situation, two adjacent scene pre-segmentation points whose interval is less than the set minimum duration threshold are found in the set of scene pre-segmentation points, and at least one of the two adjacent scene pre-segmentation points found is deleted. As for the specific value of the minimum duration threshold, those skilled in the art may set any suitable value according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the minimum duration threshold can be set to 10s, 15s, 20s, etc.
For example, when the duration between two adjacent scene pre-segmentation points 5 and 6 is less than the set minimum duration threshold, either one or both of scene pre-segmentation points 5 and 6 can be deleted, so that the scene between them is combined with the preceding video clip, or with the following video clip, or with both.
The above steps 205, 206 and 207 are optimizations of the scene pre-segmentation points. In the embodiment of the present invention, at least one of these steps can be selected for execution, and the selected steps can be executed in any order.
Step 208: perform scene segmentation on the video to be processed after the theme song segments are deleted, according to the remaining scene pre-segmentation points, and find, among the video clips obtained by scene segmentation, the video clips whose duration exceeds the set maximum duration threshold, as the video clips to be split.
After scene pre-segmentation points are deleted by at least one of the above steps 205, 206 and 207, scene segmentation is performed on the video to be processed after the theme song segments are deleted, according to each remaining scene pre-segmentation point, to obtain multiple video clips.
Step 209: find, among the audio pre-segmentation points, the audio pre-segmentation point closest to the intermediate point of the video clip to be split, and perform audio segmentation on the video clip to be split according to the audio pre-segmentation point found.
Among the video clips obtained by scene segmentation in step 208, the video clips whose duration exceeds the set maximum duration threshold are found as the video clips to be split. For a video clip to be split, further audio segmentation is performed on it according to the audio pre-segmentation points, from the perspective of audio changes.
The audio pre-segmentation point closest to the intermediate point of the video clip to be split is found among the audio pre-segmentation points. Specifically, for each audio pre-segmentation point, calculate the first distance between the audio pre-segmentation point and the starting point of the video clip to be split and the second distance between the audio pre-segmentation point and the end point of the video clip to be split, then calculate the ratio of the first distance to the second distance; the audio pre-segmentation point whose ratio is closest to 1 is the audio pre-segmentation point closest to the intermediate point of the video clip to be split.
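A sketch of this nearest-to-midpoint selection via the distance ratio; representing the clip as a (start, end) pair in seconds is an assumption.

def nearest_mid_point(audio_points, clip_start, clip_end):
    """audio_points: audio pre-segmentation points (seconds) inside the
    clip. Picks the point whose first-distance/second-distance ratio is
    closest to 1, i.e. the point nearest the clip's intermediate point."""
    def ratio_error(p):
        first = p - clip_start   # distance to the clip's starting point
        second = clip_end - p    # distance to the clip's end point
        return abs(first / second - 1.0)
    return min(audio_points, key=ratio_error)

# A 10-minute clip is split at the audio point nearest its midpoint.
split_at = nearest_mid_point([95.0, 310.0, 545.0], 0.0, 600.0)  # -> 310.0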
As shown in Fig. 3, it can be seen from the curve graph in Fig. 3 (the dotted line represents audio feature vectors and the solid line represents scene feature vectors) that, after segmentation at the scene pre-segmentation points determined from the cosine distance between the scene feature vectors of adjacent unit videos to be processed, clip 2 and clip 3 still form a single video clip, such as the middle video clip shown under the long video in Fig. 3. The duration of this video clip is greater than 7 min, so it is further split into clip 2 and clip 3 at the audio pre-segmentation point determined from the cosine distance between the audio feature vectors of adjacent unit videos to be processed.
Step 210: determine whether, among the video clips obtained by audio segmentation, there is still a video clip whose duration exceeds the maximum duration threshold.
After audio segmentation is performed on the video clip to be split according to the audio pre-segmentation point found, it is determined whether any of the video clips obtained by audio segmentation still has a duration exceeding the maximum duration threshold. If there is such a video clip, it is taken as a video clip to be split, and the procedure returns to step 209 to continue audio segmentation on it according to the audio pre-segmentation points. If no video clip exceeds the maximum duration threshold, segmentation is determined to be finished and step 211 is executed.
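A sketch of the step 209/210 loop, reusing the assumed nearest_mid_point above and representing clips as (start, end) pairs in seconds; the 420 s (7 min) cap echoes the Fig. 3 example and is not mandated by the method.

def split_long_clips(clips, audio_points, max_len=420.0):
    """Repeatedly split any clip longer than max_len at the audio
    pre-segmentation point nearest its midpoint (steps 209 and 210)."""
    done, todo = [], list(clips)
    while todo:
        start, end = todo.pop()
        inside = [p for p in audio_points if start < p < end]
        if end - start <= max_len or not inside:
            done.append((start, end))       # nothing left to split here
        else:
            p = nearest_mid_point(inside, start, end)
            todo += [(start, p), (p, end)]  # re-check both halves (step 210)
    return sorted(done)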
Step 211: segmentation ends, and the video clips resulting from the completed segmentation are obtained.
The video clips resulting from the completed segmentation are the video clips produced by splitting the video to be processed.
In the embodiment of the present invention, the video to be processed is segmented by combining scene pre-segmentation points with audio pre-segmentation points, which avoids the problem that the video clips obtained by segmenting only at scene pre-segmentation points are too long. By filtering the set of scene pre-segmentation points, the influence of brief scene changes is reduced, as is the influence of lighting changes and of switches between long shots and close-ups, preserving the integrity of the plot after segmentation. Processing efficiency is improved by multi-process parallel processing based on a process pool.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of a video processing apparatus according to an embodiment of the present invention is shown.
The video processing apparatus of the embodiment of the present invention comprises the following modules:
a division module 401, configured to obtain a video to be processed and divide the video to be processed into multiple unit videos to be processed;
an obtaining module 402, configured to obtain the scene feature vector and the audio feature vector corresponding to each unit video to be processed, respectively;
a determining module 403, configured to determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and to determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
a scene segmentation module 404, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first searching module 405, configured to find, among the video clips obtained by scene segmentation, the video clips whose duration exceeds the set maximum duration threshold, as the video clips to be split;
an audio segmentation module 406, configured to perform audio segmentation on the video clips to be split according to the audio pre-segmentation points, to obtain the video clips resulting from the completed segmentation.
In the embodiment of the present invention, based on the perspective of scene changes, scene segmentation is first performed on the video to be processed according to the scene pre-segmentation points; then, based on the perspective of audio changes, the longer video clips are further divided by audio segmentation according to the audio pre-segmentation points. This avoids the problem of overly long video clips after splitting based only on scene changes, improves the accuracy of splitting, and better meets user needs.
Referring to Fig. 5, a structural block diagram of another video processing apparatus according to an embodiment of the present invention is shown.
The video processing apparatus of the embodiment of the present invention comprises the following modules:
a division module 501, configured to obtain a video to be processed and divide the video to be processed into multiple unit videos to be processed;
an obtaining module 502, configured to obtain the scene feature vector and the audio feature vector corresponding to each unit video to be processed, respectively;
a determining module 503, configured to determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and to determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
a scene segmentation module 504, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first searching module 505, configured to find, among the video clips obtained by scene segmentation, the video clips whose duration exceeds the set maximum duration threshold, as the video clips to be split;
an audio segmentation module 506, configured to perform audio segmentation on the video clips to be split according to the audio pre-segmentation points, to obtain the video clips resulting from the completed segmentation.
In an optional embodiment, the audio segmentation module 506 includes: an audio segmentation point searching unit, configured to find, among the audio pre-segmentation points, the audio pre-segmentation point closest to the intermediate point of the video clip to be split; a clip segmentation unit, configured to perform audio segmentation on the video clip to be split according to the audio pre-segmentation point found; and a clip determination unit, configured to determine whether, among the video clips obtained by audio segmentation, there is a video clip whose duration exceeds the maximum duration threshold, and, when there is such a video clip, to take the video clip whose duration exceeds the maximum duration threshold as a video clip to be split and call the segmentation point searching unit.
In an optional embodiment, the apparatus further includes: a second searching module 507, configured to find the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points; and a first deleting module 508, configured to delete the scene pre-segmentation points of brief scene changes, to obtain the remaining scene pre-segmentation points; the scene segmentation module 504 is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
In an optional embodiment, the second searching module 507 includes: a scene segmentation point searching unit, configured to obtain the duration between every two adjacent scene pre-segmentation points and find two adjacent scene pre-segmentation points whose interval is less than the set scene change threshold; a first video obtaining unit, configured to obtain, among the two adjacent scene pre-segmentation points found, at least one unit video to be processed before the former scene pre-segmentation point and at least one unit video to be processed after the latter scene pre-segmentation point; a first similarity calculation unit, configured to calculate, among the obtained unit videos to be processed, the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed; a first average calculation unit, configured to calculate the average of the similarities; and a first segmentation point determination unit, configured to determine, when the average of the similarities is greater than the preset scene similarity threshold, at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point of a brief scene change.
In an optional embodiment, the apparatus further includes: a third searching module 509, configured to find the scene pre-segmentation points not caused by scene changes themselves among the scene pre-segmentation points; and a second deleting module 510, configured to delete the scene pre-segmentation points not caused by scene changes themselves, to obtain the remaining scene pre-segmentation points; the scene segmentation module 504 is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
In an optional embodiment, the third searching module 509 includes: a second video obtaining unit, configured to obtain, for each scene pre-segmentation point, at least one unit video to be processed before and at least one unit video to be processed after the current scene pre-segmentation point; a histogram determination unit, configured to determine the histogram feature vector corresponding to each of the obtained unit videos to be processed; a second similarity calculation unit, configured to calculate, among the obtained unit videos to be processed, the similarity of the histogram feature vectors corresponding to every two adjacent unit videos to be processed; a second average calculation unit, configured to calculate the average of the similarities; and a second segmentation point determination unit, configured to determine, when the average of the similarities is greater than the preset histogram similarity threshold, the current scene pre-segmentation point as a scene pre-segmentation point not caused by a scene change itself.
In an optional embodiment, the apparatus further includes: a fourth searching module 511, configured to obtain the duration between every two adjacent scene pre-segmentation points and find two adjacent scene pre-segmentation points whose interval is less than the set minimum duration threshold; and a third deleting module 512, configured to delete at least one of the two adjacent scene pre-segmentation points found, to obtain the remaining scene pre-segmentation points; the scene segmentation module 504 is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
In an optional embodiment, the apparatus further includes: a detection module 513, configured to detect the theme song segments in the video to be processed; and a fourth deleting module 514, configured to delete the theme song segments from the video to be processed; the division module 501 is specifically configured to divide the video to be processed after the theme song segments are deleted into multiple unit videos to be processed.
In an optional embodiment, the obtaining module 502 includes: a calling unit, configured to call the first process and the second process simultaneously; a scene feature obtaining unit, configured to obtain the scene feature vector corresponding to each unit video to be processed using the first process; and an audio feature obtaining unit, configured to obtain the audio feature vector corresponding to each unit video to be processed using the second process.
In the embodiment of the present invention, the video to be processed is segmented by combining scene pre-segmentation points with audio pre-segmentation points, which avoids the problem that the video clips obtained by segmenting only at scene pre-segmentation points are too long. By filtering the set of scene pre-segmentation points, the influence of brief scene changes is reduced, as is the influence of lighting changes and of switches between long shots and close-ups, preserving the integrity of the plot after segmentation. Processing efficiency is improved by multi-process parallel processing based on a process pool.
As for the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple; for related details, refer to the corresponding parts of the method embodiments.
In an embodiment of the present invention, an electronic device for video processing is also provided. The electronic device may include one or more processors and a memory for storing processor-executable instructions, for example an application program. The processor is configured to execute the above video processing method.
In an embodiment of the present invention, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions; the above instructions can be executed by the processor of the electronic device to complete the above video processing method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts between the embodiments can be referred to each other.
It should be understood by those skilled in the art that the embodiments of the present invention can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The embodiments of the present invention are described with reference to the flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing; the instructions executed on the computer or other programmable terminal device thus provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
The video processing method, apparatus, electronic device, and storage medium provided by the present invention have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is merely intended to help understand the method of the present invention and its core concept. At the same time, for those skilled in the art, according to the concept of the present invention, there will be changes in the specific implementation and application scope. In summary, the content of this specification should not be construed as a limitation of the present invention.
Claims (20)
1. A video processing method, characterized in that the method includes:
obtaining a video to be processed, and dividing the video to be processed into multiple unit videos to be processed;
obtaining the scene feature vector and the audio feature vector corresponding to each unit video to be processed, respectively;
determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and determining audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
performing scene segmentation on the video to be processed according to the scene pre-segmentation points, and finding, among the video clips obtained by scene segmentation, the video clips whose duration exceeds a set maximum duration threshold, as video clips to be split;
performing audio segmentation on the video clips to be split according to the audio pre-segmentation points, to obtain the video clips resulting from the completed segmentation.
2. The method according to claim 1, characterized in that performing audio segmentation on the video clip to be split according to the audio pre-segmentation points includes:
finding, among the audio pre-segmentation points, the audio pre-segmentation point closest to the intermediate point of the video clip to be split;
performing audio segmentation on the video clip to be split according to the audio pre-segmentation point found;
determining whether, among the video clips obtained by audio segmentation, there is a video clip whose duration exceeds the maximum duration threshold;
when there is a video clip whose duration exceeds the maximum duration threshold, taking the video clip whose duration exceeds the maximum duration threshold as a video clip to be split, and returning to the step of finding, among the audio pre-segmentation points, the audio pre-segmentation point closest to the intermediate point of the video clip to be split.
3. The method according to claim 1, characterized in that,
after determining the scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, the method further includes:
finding the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points;
deleting the scene pre-segmentation points of brief scene changes, to obtain the remaining scene pre-segmentation points;
and performing scene segmentation on the video to be processed according to the scene pre-segmentation points includes:
performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
4. The method according to claim 3, characterized in that finding the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points includes:
obtaining the duration between every two adjacent scene pre-segmentation points, and finding two adjacent scene pre-segmentation points whose interval is less than a set scene change threshold;
obtaining, among the two adjacent scene pre-segmentation points found, at least one unit video to be processed before the former scene pre-segmentation point and at least one unit video to be processed after the latter scene pre-segmentation point;
calculating, among the obtained unit videos to be processed, the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed;
calculating the average of the similarities;
when the average of the similarities is greater than a preset scene similarity threshold, determining at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point of a brief scene change.
5. The method according to claim 1, characterized in that,
after determining the scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, the method further includes:
finding the scene pre-segmentation points not caused by scene changes themselves among the scene pre-segmentation points;
deleting the scene pre-segmentation points not caused by scene changes themselves, to obtain the remaining scene pre-segmentation points;
and performing scene segmentation on the video to be processed according to the scene pre-segmentation points includes:
performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
6. The method according to claim 5, wherein searching the scene pre-segmentation points for scene pre-segmentation points that do not correspond to an actual scene change comprises:
for each scene pre-segmentation point, obtaining at least one unit video to be processed before the current scene pre-segmentation point and at least one unit video to be processed after it;
determining the histogram feature vector of each obtained unit video to be processed;
calculating the similarity of the histogram feature vectors of every two adjacent unit videos among the obtained unit videos to be processed;
calculating an average value of the similarities;
when the average value of the similarities is greater than a preset histogram similarity threshold, determining the current scene pre-segmentation point as a scene pre-segmentation point that does not correspond to an actual scene change.
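Claim 6 uses color histograms as a cheap second opinion: an apparent boundary whose surrounding histograms stay similar is likely an artifact (for example a lighting change) rather than a real scene change. A sketch with OpenCV hue-saturation histograms; cv2.calcHist and cv2.compareHist are standard OpenCV calls, but representing each unit video by a single frame is a simplifying assumption of this sketch:

```python
import cv2
import numpy as np

def hs_histogram(frame_bgr):
    """Flattened, normalized hue-saturation histogram of one BGR frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def is_false_scene_change(frames_before, frames_after, hist_sim_threshold):
    """True when adjacent histograms around the candidate point remain
    similar on average, i.e. the pre-segmentation point does not
    correspond to an actual scene change (claim 6 sketch)."""
    hists = [hs_histogram(f) for f in list(frames_before) + list(frames_after)]
    sims = [cv2.compareHist(hists[i], hists[i + 1], cv2.HISTCMP_CORREL)
            for i in range(len(hists) - 1)]
    return float(np.mean(sims)) > hist_sim_threshold
```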
7. The method according to claim 1, wherein after determining the scene pre-segmentation points according to the scene feature vectors of every two adjacent unit videos to be processed, the method further comprises:
obtaining the duration between every two adjacent scene pre-segmentation points, and finding two adjacent scene pre-segmentation points whose interval is less than a set minimum duration threshold;
deleting at least one of the two adjacent scene pre-segmentation points found, to obtain the remaining scene pre-segmentation points;
and wherein performing scene segmentation on the video to be processed according to the scene pre-segmentation points comprises:
performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
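Claim 7 guarantees a minimum segment length by dropping one point of any adjacent pair that sits closer together than the minimum duration threshold. The claim leaves open which of the two to delete; the one-pass sketch below keeps the earlier point (a greedy choice, not mandated by the patent):

```python
def enforce_min_duration(points, min_duration):
    """Keep a sorted subset of pre-segmentation points such that no two
    kept points are closer than min_duration seconds (claim 7 sketch)."""
    kept = []
    for p in sorted(points):
        if not kept or p - kept[-1] >= min_duration:
            kept.append(p)
        # else: drop p, merging the would-be short segment into its neighbor
    return kept

# enforce_min_duration([10.0, 12.5, 60.0, 61.0, 130.0], 5.0)
# -> [10.0, 60.0, 130.0]
```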
8. The method according to claim 1, wherein before dividing the video to be processed into multiple unit videos to be processed, the method further comprises:
detecting a theme song segment in the video to be processed, and deleting the theme song segment from the video to be processed;
and wherein dividing the video to be processed into multiple unit videos to be processed comprises:
dividing the video to be processed from which the theme song segment has been deleted into multiple unit videos to be processed.
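Claim 8 strips opening and ending theme music before the video is divided into units; how the theme song segment is detected is left to the description (the audio feature pipeline suggests a fingerprinting or classification approach, but that is an inference). The sketch below covers only the removal step, computing the parts of the timeline kept for unit division once the detected intervals are known:

```python
def remove_intervals(total_duration, theme_intervals):
    """Return the complement of the detected theme song (start, end)
    intervals within [0, total_duration]: the spans of the video that
    remain for unit division (claim 8 sketch)."""
    kept, cursor = [], 0.0
    for start, end in sorted(theme_intervals):
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        kept.append((cursor, total_duration))
    return kept

# a 1500 s episode with a 90 s opening theme:
# remove_intervals(1500.0, [(0.0, 90.0)]) -> [(90.0, 1500.0)]
```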
9. The method according to claim 1, wherein obtaining the scene feature vector and the audio feature vector corresponding to each unit video to be processed respectively comprises:
invoking a first process and a second process simultaneously;
obtaining the scene feature vector corresponding to each unit video to be processed using the first process;
obtaining the audio feature vector corresponding to each unit video to be processed using the second process.
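Running scene and audio feature extraction in two separate processes, as claim 9 specifies, lets the two independent workloads proceed in parallel. A sketch using Python's multiprocessing; the extractor bodies are placeholders for whatever models the patent's description employs:

```python
from multiprocessing import Pool

def extract_scene_features(unit_paths):
    # placeholder: e.g. pool frame embeddings per unit video
    return [f"scene_vec:{p}" for p in unit_paths]

def extract_audio_features(unit_paths):
    # placeholder: e.g. embed the audio track of each unit video
    return [f"audio_vec:{p}" for p in unit_paths]

def extract_all(unit_paths):
    """Invoke the two extractors concurrently in two worker processes
    (the 'first process' and 'second process' of claim 9, sketched)."""
    with Pool(processes=2) as pool:
        scene_job = pool.apply_async(extract_scene_features, (unit_paths,))
        audio_job = pool.apply_async(extract_audio_features, (unit_paths,))
        return scene_job.get(), audio_job.get()

if __name__ == "__main__":
    scene_vecs, audio_vecs = extract_all(["unit_000.mp4", "unit_001.mp4"])
```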
10. A video processing apparatus, comprising:
a division module, configured to obtain a video to be processed and divide the video to be processed into multiple unit videos to be processed;
an acquisition module, configured to obtain the scene feature vector and the audio feature vector corresponding to each unit video to be processed respectively;
a determining module, configured to determine scene pre-segmentation points according to the scene feature vectors of every two adjacent unit videos to be processed, and to determine audio pre-segmentation points according to the audio feature vectors of every two adjacent unit videos to be processed;
a scene segmentation module, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first searching module, configured to search the video clips obtained by the scene segmentation for a video clip whose duration exceeds a set maximum duration threshold, as a video clip to be split;
an audio segmentation module, configured to perform audio segmentation on the video clip to be split according to the audio pre-segmentation points, to obtain the segmented video clips.
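Taken together, the modules of claim 10 form a pipeline: divide, featurize, propose pre-segmentation points, cut by scene, then sub-cut any over-long scene at audio points. A compact sketch of how the pieces compose, reusing split_long_clip from the claim 2 sketch above; the point-proposal step itself is elided here, since it depends on the feature models described elsewhere in the patent:

```python
def segment_video(total_duration, scene_points, audio_points, max_duration):
    """Compose the claim 10 modules over a timeline of total_duration
    seconds (illustrative sketch, timestamps in seconds)."""
    bounds = ([0.0]
              + sorted(p for p in scene_points if 0.0 < p < total_duration)
              + [total_duration])
    clips = list(zip(bounds, bounds[1:]))  # scene segmentation module
    out = []
    for start, end in clips:
        if end - start > max_duration:     # first searching module
            out.extend(split_long_clip(start, end, audio_points, max_duration))
        else:
            out.append((start, end))
    return out
```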
11. The apparatus according to claim 10, wherein the audio segmentation module comprises:
an audio segmentation point searching unit, configured to search the audio pre-segmentation points for the audio pre-segmentation point nearest to the midpoint of the video clip to be split;
a clip segmentation unit, configured to perform audio segmentation on the video clip to be split according to the found audio pre-segmentation point;
a clip determination unit, configured to judge whether, among the video clips obtained by the audio segmentation, there is a video clip whose duration exceeds the maximum duration threshold, and, when there is a video clip whose duration exceeds the maximum duration threshold, to take that video clip as the video clip to be split and invoke the audio segmentation point searching unit.
12. The apparatus according to claim 10, further comprising:
a second searching module, configured to search the scene pre-segmentation points for scene pre-segmentation points caused by brief scene changes;
a first deleting module, configured to delete the scene pre-segmentation points caused by brief scene changes, to obtain the remaining scene pre-segmentation points;
wherein the scene segmentation module is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
13. The apparatus according to claim 12, wherein the second searching module comprises:
a scene segmentation point searching unit, configured to obtain the duration between every two adjacent scene pre-segmentation points and to find two adjacent scene pre-segmentation points whose interval is less than a set scene change threshold;
a first video acquisition unit, configured to obtain, for the two adjacent scene pre-segmentation points found, at least one unit video to be processed before the former scene pre-segmentation point and at least one unit video to be processed after the latter scene pre-segmentation point;
a first similarity calculation unit, configured to calculate the similarity of the scene feature vectors of every two adjacent unit videos among the obtained unit videos to be processed;
a first average calculation unit, configured to calculate an average value of the similarities;
a first segmentation point determination unit, configured to determine, when the average value of the similarities is greater than a preset scene similarity threshold, at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point caused by a brief scene change.
14. The apparatus according to claim 10, further comprising:
a third searching module, configured to search the scene pre-segmentation points for scene pre-segmentation points that do not correspond to an actual scene change;
a second deleting module, configured to delete the scene pre-segmentation points that do not correspond to an actual scene change, to obtain the remaining scene pre-segmentation points;
wherein the scene segmentation module is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
15. The apparatus according to claim 14, wherein the third searching module comprises:
a second video acquisition unit, configured to obtain, for each scene pre-segmentation point, at least one unit video to be processed before the current scene pre-segmentation point and at least one unit video to be processed after it;
a histogram determination unit, configured to determine the histogram feature vector of each obtained unit video to be processed;
a second similarity calculation unit, configured to calculate the similarity of the histogram feature vectors of every two adjacent unit videos among the obtained unit videos to be processed;
a second average calculation unit, configured to calculate an average value of the similarities;
a second segmentation point determination unit, configured to determine, when the average value of the similarities is greater than a preset histogram similarity threshold, the current scene pre-segmentation point as a scene pre-segmentation point that does not correspond to an actual scene change.
16. The apparatus according to claim 10, further comprising:
a fourth searching module, configured to obtain the duration between every two adjacent scene pre-segmentation points and to find two adjacent scene pre-segmentation points whose interval is less than a set minimum duration threshold;
a third deleting module, configured to delete at least one of the two adjacent scene pre-segmentation points found, to obtain the remaining scene pre-segmentation points;
wherein the scene segmentation module is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
17. The apparatus according to claim 10, further comprising:
a detection module, configured to detect a theme song segment in the video to be processed;
a fourth deleting module, configured to delete the theme song segment from the video to be processed;
wherein the division module is specifically configured to divide the video to be processed from which the theme song segment has been deleted into multiple unit videos to be processed.
18. The apparatus according to claim 10, wherein the acquisition module comprises:
an invoking unit, configured to invoke a first process and a second process simultaneously;
a scene feature acquisition unit, configured to obtain the scene feature vector corresponding to each unit video to be processed using the first process;
an audio feature acquisition unit, configured to obtain the audio feature vector corresponding to each unit video to be processed using the second process.
19. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the video processing method according to any one of claims 1-9.
20. A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the video processing method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910472453.7A CN110213670B (en) | 2019-05-31 | 2019-05-31 | Video processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110213670A true CN110213670A (en) | 2019-09-06 |
CN110213670B CN110213670B (en) | 2022-01-07 |
Family
ID=67790245
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910472453.7A Active CN110213670B (en) | 2019-05-31 | 2019-05-31 | Video processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110213670B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1938714A (en) * | 2004-03-23 | 2007-03-28 | 英国电讯有限公司 | Method and system for semantically segmenting scenes of a video sequence |
CN102170528A (en) * | 2011-03-25 | 2011-08-31 | 天脉聚源(北京)传媒科技有限公司 | Segmentation method of news program |
CN102890778A (en) * | 2011-07-21 | 2013-01-23 | 北京新岸线网络技术有限公司 | Content-based video detection method and device |
CN102685398A (en) * | 2011-09-06 | 2012-09-19 | 天脉聚源(北京)传媒科技有限公司 | News video scene generating method |
US20140150043A1 (en) * | 2012-11-23 | 2014-05-29 | Institute For Information Industry | Scene fragment transmitting system, scene fragment transmitting method and recording medium |
CN104519401A (en) * | 2013-09-30 | 2015-04-15 | 华为技术有限公司 | Video division point acquiring method and equipment |
CN106021496A (en) * | 2016-05-19 | 2016-10-12 | 海信集团有限公司 | Video search method and video search device |
CN108376147A (en) * | 2018-01-24 | 2018-08-07 | 北京览科技有限公司 | A kind of method and apparatus for obtaining the evaluation result information of video |
CN108307229A (en) * | 2018-02-02 | 2018-07-20 | 新华智云科技有限公司 | A kind of processing method and equipment of video-audio data |
CN109344780A (en) * | 2018-10-11 | 2019-02-15 | 上海极链网络科技有限公司 | A kind of multi-modal video scene dividing method based on sound and vision |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113259761A (en) * | 2020-02-13 | 2021-08-13 | 华为技术有限公司 | Video processing method, video processing apparatus, and storage medium |
WO2021159896A1 (en) * | 2020-02-13 | 2021-08-19 | 华为技术有限公司 | Video processing method, video processing device, and storage medium |
CN113259761B (en) * | 2020-02-13 | 2022-08-26 | 华为技术有限公司 | Video processing method, video processing apparatus, and storage medium |
CN111400615A (en) * | 2020-03-19 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Resource recommendation method, device, equipment and storage medium |
CN113438500A (en) * | 2020-03-23 | 2021-09-24 | 阿里巴巴集团控股有限公司 | Video processing method and device, electronic equipment and computer storage medium |
CN111641869A (en) * | 2020-06-04 | 2020-09-08 | 虎博网络技术(上海)有限公司 | Video split mirror method, video split mirror device, electronic equipment and computer readable storage medium |
CN111601162A (en) * | 2020-06-08 | 2020-08-28 | 北京世纪好未来教育科技有限公司 | Video segmentation method and device and computer storage medium |
CN111601162B (en) * | 2020-06-08 | 2022-08-02 | 北京世纪好未来教育科技有限公司 | Video segmentation method and device and computer storage medium |
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN113810782B (en) * | 2020-06-12 | 2022-09-27 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device |
CN113992970A (en) * | 2020-07-27 | 2022-01-28 | 阿里巴巴集团控股有限公司 | Video data processing method and device, electronic equipment and computer storage medium |
US11837028B2 (en) | 2020-09-29 | 2023-12-05 | New Oriental Education & Technology Group Inc. | Dance segment recognition method, dance segment recognition apparatus, and storage medium |
CN112100436A (en) * | 2020-09-29 | 2020-12-18 | 新东方教育科技集团有限公司 | Dance segment recognition method, dance segment recognition device and storage medium |
CN112100436B (en) * | 2020-09-29 | 2021-07-06 | 新东方教育科技集团有限公司 | Dance segment recognition method, dance segment recognition device and storage medium |
CN113435328A (en) * | 2021-06-25 | 2021-09-24 | 上海众源网络有限公司 | Video clip processing method and device, electronic equipment and readable storage medium |
CN113435328B (en) * | 2021-06-25 | 2024-05-31 | 上海众源网络有限公司 | Video clip processing method and device, electronic equipment and readable storage medium |
CN113569706B (en) * | 2021-07-23 | 2024-03-01 | 上海明略人工智能(集团)有限公司 | Video scene segmentation point judging method, system, storage medium and electronic equipment |
CN113569704A (en) * | 2021-07-23 | 2021-10-29 | 上海明略人工智能(集团)有限公司 | Division point judgment method, system, storage medium and electronic device |
CN113569704B (en) * | 2021-07-23 | 2023-12-12 | 上海明略人工智能(集团)有限公司 | Segmentation point judging method, system, storage medium and electronic equipment |
CN113569706A (en) * | 2021-07-23 | 2021-10-29 | 上海明略人工智能(集团)有限公司 | Video scene segmentation point judgment method and system, storage medium and electronic equipment |
CN113569703B (en) * | 2021-07-23 | 2024-04-16 | 上海明略人工智能(集团)有限公司 | Real division point judging method, system, storage medium and electronic equipment |
CN113569703A (en) * | 2021-07-23 | 2021-10-29 | 上海明略人工智能(集团)有限公司 | Method and system for judging true segmentation point, storage medium and electronic equipment |
CN114222159A (en) * | 2021-12-01 | 2022-03-22 | 北京奇艺世纪科技有限公司 | Method and system for determining video scene change point and generating video clip |
CN114299074A (en) * | 2021-12-14 | 2022-04-08 | 北京达佳互联信息技术有限公司 | Video segmentation method, device, equipment and storage medium |
CN115086759A (en) * | 2022-05-13 | 2022-09-20 | 北京达佳互联信息技术有限公司 | Video processing method, video processing device, computer equipment and medium |
CN116546264A (en) * | 2023-04-10 | 2023-08-04 | 北京度友信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
CN117499739A (en) * | 2024-01-02 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Frame rate control method, device, computer equipment and storage medium |
CN117499739B (en) * | 2024-01-02 | 2024-06-07 | 腾讯科技(深圳)有限公司 | Frame rate control method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110213670B (en) | 2022-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110213670A (en) | Method for processing video frequency, device, electronic equipment and storage medium | |
US20180374491A1 (en) | Systems and Methods for Recognizing Sound and Music Signals in High Noise and Distortion | |
EP1081960B1 (en) | Signal processing method and video/voice processing device | |
CN101819638B (en) | Establishment method of pornographic detection model and pornographic detection method | |
CN105224581B (en) | The method and apparatus of picture are presented when playing music | |
CN107562760B (en) | Voice data processing method and device | |
CN114297439B (en) | Short video tag determining method, system, device and storage medium | |
WO2023197979A1 (en) | Data processing method and apparatus, and computer device and storage medium | |
US20170140226A1 (en) | Apparatus and method for identifying a still image contained in moving image contents | |
JP2003259302A (en) | Method for automatically producing music video, product including information storage medium for storing information, and program | |
CN103729368B (en) | A kind of robust audio recognition methods based on local spectrum iamge description | |
US20150128788A1 (en) | Method, device and system for automatically adjusting a duration of a song | |
CN110324726B (en) | Model generation method, video processing method, model generation device, video processing device, electronic equipment and storage medium | |
US20130266147A1 (en) | System and method for identification of highly-variable vocalizations | |
CN115359409B (en) | Video splitting method and device, computer equipment and storage medium | |
CN110324657A (en) | Model generation, method for processing video frequency, device, electronic equipment and storage medium | |
CN111510765A (en) | Audio label intelligent labeling method and device based on teaching video | |
CN107066488A (en) | Video display bridge section automatic division method based on movie and television contents semantic analysis | |
KR101634068B1 (en) | Method and device for generating educational contents map | |
Felipe et al. | Acoustic scene classification using spectrograms | |
CN116567351B (en) | Video processing method, device, equipment and medium | |
CN117609548A (en) | Video multi-mode target element extraction and video abstract synthesis method and system based on pre-training model | |
CN110555117B (en) | Data processing method and device and electronic equipment | |
CN110516086B (en) | Method for automatically acquiring movie label based on deep neural network | |
US9445210B1 (en) | Waveform display control of visual characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||